How to Track and Cap AI Spending per Team with Amazon Bedrock

AI platform teams need governance before scaling. Learn how to use Amazon Bedrock inference profiles, AWS Budgets, and a proactive cost control pattern to track, allocate, and cap AI spending per team.

Alexandre Agius

AWS Solutions Architect

7 min read

You’ve built an AI platform. Multiple teams are calling foundation models through Amazon Bedrock. Usage is growing. Then someone asks: “How much is the marketing team spending on AI this month?” — and you don’t have an answer.

This is the governance gap that hits every organization scaling AI adoption. Without visibility into who’s spending what, you can’t do chargebacks, set budgets, or prevent cost surprises.

This post walks through a practical approach to solving this with Amazon Bedrock inference profiles — from basic cost tracking to proactive hard caps per team.

The Problem

When multiple teams share a single Bedrock setup, all model invocations look the same on your AWS bill. You can’t tell:

  • Which team is driving costs
  • Which use case is consuming the most tokens
  • Whether a team has exceeded their allocated budget
  • If there’s an anomalous spike in a specific application

Without this visibility, scaling AI adoption is a financial risk.

The Solution: Inference Profiles as a Governance Layer

Amazon Bedrock Application Inference Profiles are the answer. Think of them as a thin wrapper around a foundation model. Instead of calling the model directly, each team calls through their own inference profile. Same model underneath — but now you have a control point.

Bedrock Inference Profiles - Multi-Team Governance

How It Works

Each team gets their own Application Inference Profile, tagged with cost allocation metadata:

Team: AI Core       → Inference Profile "team-ai-core"       → Claude Sonnet
                      tags: team=core, budget=10000

Team: AI Extended   → Inference Profile "team-ai-extended"   → Claude Sonnet
                      tags: team=extended, budget=5000

Team: Business      → Inference Profile "team-business"      → Claude Haiku
                      tags: team=business, budget=2000

The model behavior is identical. What changes is visibility and control.
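From the calling side, the only change for a team is targeting the profile ARN where a bare model ID would normally go. A minimal sketch using the Converse API (the account ID and profile ARN below are placeholders):

```python
# Placeholder ARN -- substitute your own account ID and profile name.
PROFILE_ARN = (
    "arn:aws:bedrock:eu-west-1:123456789012:"
    "application-inference-profile/team-ai-core"
)

def build_converse_request(profile_arn: str, prompt: str) -> dict:
    """Build a Converse API request that puts the team's inference
    profile ARN in the modelId field -- same model underneath, but
    every invocation is now attributed to the profile's tags."""
    return {
        "modelId": profile_arn,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": 512},
    }

# import boto3
# runtime = boto3.client("bedrock-runtime")
# response = runtime.converse(**build_converse_request(PROFILE_ARN, "Hello"))
```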

What This Gives You

Governance Need      How Inference Profiles Help
Cost tracking        Tag profiles per team/app → costs visible in Cost Explorer and CUR
Cost allocation      Chargeback to business units based on actual usage per profile
Budget alerts        AWS Budgets per tag → alerts when a team overspends
Anomaly detection    AWS Cost Anomaly Detection catches unexpected spikes per profile
Access control       IAM policies restrict who can use which inference profile
Usage monitoring     CloudWatch metrics per profile → dashboards for tokens, requests, latency

Step 1: Create Inference Profiles with Tags

Create an Application Inference Profile for each team using the AWS CLI or SDK:

import boto3

bedrock = boto3.client('bedrock')

response = bedrock.create_inference_profile(
    inferenceProfileName='team-ai-core',
    modelSource={
        'copyFrom': 'arn:aws:bedrock:eu-west-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0'
    },
    tags=[
        {'key': 'team', 'value': 'ai-core'},
        {'key': 'environment', 'value': 'prod'},
        {'key': 'cost-center', 'value': 'AI-001'},
        {'key': 'monthly-budget', 'value': '10000'}
    ]
)

Activate these tags as Cost Allocation Tags in your billing console so they appear in Cost Explorer and the Cost and Usage Report (CUR).
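Activation can also be scripted through the Cost Explorer API instead of the console. A sketch (note that newly activated tags can take up to 24 hours to appear in Cost Explorer):

```python
def tag_activation_payload(tag_keys):
    """Build the payload for ce.update_cost_allocation_tags_status,
    marking each tag key as an active cost allocation tag."""
    return [{"TagKey": key, "Status": "Active"} for key in tag_keys]

# import boto3
# ce = boto3.client("ce")  # Cost Explorer
# ce.update_cost_allocation_tags_status(
#     CostAllocationTagsStatus=tag_activation_payload(
#         ["team", "environment", "cost-center", "monthly-budget"]
#     )
# )
```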

Step 2: Lock Down Access with IAM

Use IAM policies to ensure each team can only invoke their own inference profile:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:eu-west-1:123456789012:inference-profile/team-ai-core"
    }
  ]
}

The AI Core team can only invoke through their profile. They can’t use other models or access other teams’ profiles. This is your access control layer.

Step 3: Set Up Alerting (Native — No Code Required)

With tagged inference profiles, you get native cost alerting:

AWS Budgets:

  • Create a budget filtered by tag team=ai-core
  • Set monthly threshold: $10,000
  • Configure alerts at 50%, 80%, and 100%
  • Notifications via email or SNS
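That budget can be created programmatically as well. A sketch, assuming a hypothetical SNS topic ARN for the alert subscribers (tag filters in AWS Budgets use the `user:` prefix and a `$` separator between key and value):

```python
def team_budget(team: str, limit_usd: str, thresholds=(50, 80, 100)):
    """Build the Budget and notification payloads for
    budgets.create_budget, filtered on the team cost allocation tag."""
    budget = {
        "BudgetName": f"bedrock-{team}",
        "BudgetLimit": {"Amount": limit_usd, "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
        # 'user:' prefix + '$' key/value separator for tag filters
        "CostFilters": {"TagKeyValue": [f"user:team${team}"]},
    }
    notifications = [
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": t,
                "ThresholdType": "PERCENTAGE",
            },
            # Hypothetical SNS topic -- substitute your own.
            "Subscribers": [{
                "SubscriptionType": "SNS",
                "Address": "arn:aws:sns:eu-west-1:123456789012:budget-alerts",
            }],
        }
        for t in thresholds
    ]
    return budget, notifications

# import boto3
# budget, notifications = team_budget("ai-core", "10000")
# boto3.client("budgets").create_budget(
#     AccountId="123456789012",
#     Budget=budget,
#     NotificationsWithSubscribers=notifications,
# )
```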

AWS Cost Anomaly Detection:

  • Monitors spending patterns per cost allocation tag
  • Automatically detects unusual spikes
  • No threshold configuration needed — it learns normal patterns

CloudWatch Alarms:

  • Alert on token usage, request count, or invocation metrics per profile
  • Useful for operational monitoring beyond just cost

These three give you visibility and warnings. But they don’t stop the spending.

Step 4: Proactive Hard Caps (The Missing Piece)

Here’s the reality: Bedrock doesn’t have a native “monthly spend cap per inference profile” toggle. AWS Budgets can trigger account-wide actions, but not per-profile granular blocking.

To actually block requests when a team exceeds their budget, you need a thin proxy layer:

Proactive AI Cost Control Architecture

The Pattern

  1. Application sends request to API Gateway (not directly to Bedrock)
  2. Lambda checks the team’s running total in DynamoDB
  3. Under budget? Forward the request to Bedrock via the inference profile
  4. Over budget? Return an error: “Monthly budget exceeded for team X”
  5. After response, update the DynamoDB counter with tokens consumed
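The budget-check step above can be sketched as a Lambda handler. This is a simplified illustration, not a production implementation: table name, attribute names, and the per-team budget map are all assumptions, and real code would compute the request cost from actual token counts and model pricing.

```python
# Hypothetical monthly limits in USD, keyed by inference profile name.
BUDGETS = {"team-ai-core": 10_000.0, "team-business": 2_000.0}

def under_budget(spent_usd: float, budget_usd: float) -> bool:
    """Allow the request only while the running total is under budget."""
    return spent_usd < budget_usd

def handler_sketch(event, table):
    """Proxy decision: read the team's running total from DynamoDB,
    block with an error if the monthly budget is exhausted."""
    team = event["team"]
    item = table.get_item(Key={"pk": team}).get("Item", {})
    spent = float(item.get("spent_usd", 0))
    if not under_budget(spent, BUDGETS.get(team, 0.0)):
        return {"statusCode": 429,
                "body": f"Monthly budget exceeded for team {team}"}
    # ... forward to Bedrock via the team's inference profile, then
    # atomically add the cost of the tokens consumed, e.g.:
    # table.update_item(
    #     Key={"pk": team},
    #     UpdateExpression="ADD spent_usd :c",
    #     ExpressionAttributeValues={":c": request_cost_usd},
    # )
    return {"statusCode": 200}
```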

Key Components

Layer          AWS Service                    Purpose
Entry point    API Gateway                    Routes requests, adds team context
Budget check   Lambda                         Reads counter, decides allow/block
State          DynamoDB                       Stores running monthly totals per profile
Inference      Bedrock Inference Profile      Tagged model invocation
Metrics        CloudWatch                     Per-profile usage metrics
Alerts         CloudWatch Alarms + SNS        Threshold notifications
Reporting      S3 + Athena + Cost Explorer    Detailed usage analytics

AWS published a reference architecture for this exact pattern: Proactive AI Cost Management System for Amazon Bedrock.

Graduated Response

Rather than a hard block, consider a graduated approach:

Threshold             Action
50% of budget         Notification to team lead
80% of budget         Warning + degrade to cheaper model (e.g., Haiku instead of Sonnet)
100% of budget        Hard block — requests rejected with budget exceeded message
Emergency override    Admin can temporarily raise the limit via DynamoDB update
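The graduated thresholds reduce to a small decision function inside the proxy Lambda. A sketch (the action names are illustrative):

```python
def graduated_action(spent_usd: float, budget_usd: float) -> str:
    """Map the spend ratio to a graduated action: notify at 50%,
    degrade to a cheaper model at 80%, hard-block at 100%."""
    ratio = spent_usd / budget_usd
    if ratio >= 1.0:
        return "block"
    if ratio >= 0.8:
        return "degrade"  # e.g. route to Haiku instead of Sonnet
    if ratio >= 0.5:
        return "notify"
    return "allow"
```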

Step 5: Observability with Model Invocation Logging

Enable Bedrock model invocation logging for full audit trails:

  • S3: Gzipped JSON logs — query with Athena for historical analysis
  • CloudWatch Logs: Real-time streaming — use Logs Insights for live queries

-- Athena: Monthly cost breakdown per team
SELECT
  tag_team,
  model_id,
  COUNT(*) as invocations,
  SUM(input_tokens) as total_input_tokens,
  SUM(output_tokens) as total_output_tokens
FROM bedrock_invocation_logs
WHERE month = '2026-02'
GROUP BY tag_team, model_id
ORDER BY total_input_tokens DESC

Feed this into QuickSight for management dashboards showing cost per team, cost per model, and token consumption trends.

Practical Recommendations

Start simple, iterate:

  1. Week 1: Create inference profiles with tags per team. Activate cost allocation tags.
  2. Week 2: Set up AWS Budgets with SNS alerts at 80% and 100%.
  3. Week 3: Enable model invocation logging to S3. Build an Athena query for monthly reports.
  4. Month 2: If hard caps are needed, implement the Lambda proxy pattern.

Pricing note: Inference profile costs are calculated based on the calling region, not the destination region. This makes budgeting predictable even with cross-region profiles.

Cross-region profiles: System-defined profiles distribute requests across regions for throughput and resilience. If your teams operate across EU regions, these provide automatic failover at no extra cost.
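System-defined cross-region profiles are addressed by prefixing the model ID with a geography code (for example `eu.` for EU-routed profiles); the exact IDs available in your account can be confirmed with `list_inference_profiles`. A sketch of the naming convention:

```python
def cross_region_profile_id(model_id: str, geo: str = "eu") -> str:
    """Prefix a foundation-model ID with a geography code to get the
    system-defined cross-region inference profile ID. Verify the
    resulting ID against bedrock.list_inference_profiles()."""
    return f"{geo}.{model_id}"

# runtime.converse(
#     modelId=cross_region_profile_id(
#         "anthropic.claude-3-5-sonnet-20241022-v2:0"),
#     messages=[...],
# )
```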

What I Learned

  • Inference profiles are free — they’re a metadata layer, not a compute layer. There’s no cost to creating them.
  • The gap is the hard cap — AWS gives you excellent visibility and alerting natively, but stopping requests at a per-profile level requires custom code.
  • Tags are the foundation — if you don’t tag your inference profiles from day one, retroactive cost allocation is painful.
  • Start with alerts, not blocks — most teams self-correct once they have visibility. Hard caps are for compliance requirements, not everyday governance.

What’s Next

  • AWS may add native per-profile spend caps — watch the Bedrock roadmap
  • Explore Bedrock Guardrails for content governance alongside cost governance
  • Combine inference profiles with provisioned throughput for predictable pricing at scale

Have questions about AI cost governance on AWS? Get in touch or explore more posts on the blog.


Alexandre Agius

AWS Solutions Architect

Passionate about AI & Security. Building scalable cloud solutions and helping organizations leverage AWS services to innovate faster. Specialized in Generative AI, serverless architectures, and security best practices.