Key takeaways:
- AI cost tracking should start at the request level.
- A complete AI cost model needs to connect model usage, retrieval, orchestration, cloud infrastructure, on-prem infrastructure, SaaS tools, and shared services into one normalized ledger.
- Shared infrastructure allocation requires usage signals: tokens, requests, compute time, query volume, and workflow steps.
- Cost-per-inference, cost-per-workflow, and cost-to-serve are stronger operating metrics than total AI spend because they connect usage to pricing, margin, and product decisions.
- Start with one production workflow, define the cost schema, allocate shared costs, set guardrails, then expand.
Best For: Finance leaders & FP&A | FinOps Practitioners | IT & Engineering leaders | MSP and cloud partners
From AI Cost Visibility to Cost Tracking Systems
AI cost visibility shows where spend is appearing across models, infrastructure, and tools. If you want a breakdown of those cost drivers, our guide to AI cost visibility covers that ground. This guide focuses on the next step: how to track AI costs at the level of requests, workflows, features, and customers.
The surrounding AI market is expanding quickly. Gartner forecasts worldwide AI spending will reach nearly $1.5 trillion in 2025 and top $2 trillion in 2026, with AI infrastructure and AI-optimized hardware remaining major areas of investment. As AI becomes part of product delivery and shared infrastructure, invoice-level tracking is no longer enough. Teams need to know which workflows, features, customers, and environments are creating the cost.
What Tracking AI Costs Means in Production
Tracking AI costs means building a financial trail for every AI-powered outcome: why the cost occurred, who or what generated it, and whether the spend supports the product economics you expected.
Sundeep Goel, Mavvrik’s CEO, frames the challenge plainly: “Many companies can’t tell you what they’re spending on AI, let alone which teams drove it, which products it’s tied to, or how it’s affecting their margins. That used to be tolerable. It isn’t anymore.”
Getting there requires three connected layers:
| Tracking Layer | What it Answers | Example Fields |
|---|---|---|
| Usage layer | What happened inside the AI system? | provider, model, input tokens, output tokens, embeddings, retrieval count, tool calls, latency, retry count |
| Ownership layer | Who or what should own the cost? | team, cost center, product, feature, customer, tenant, environment, workflow, agent |
| Finance layer | How should the cost be allocated and controlled? | allocation rule, owner, budget, forecast, cost-to-serve, margin impact |
Unlike SaaS tracking, which often starts with seats and renewals, AI cost tracking starts with usage events and expands outward into the infrastructure, data systems, and workflows that support them.
What an AI Cost Tracking System Follows
AI cost tracking starts with the full execution path of a request, from first call to final output.
One workflow can touch a model call, a vector database retrieval, an orchestration layer, GPU or CPU resources, and one or more retries before it produces an output. Each step generates a cost signal. Those signals land in different places, under different line items, with no automatic connection to the workflow that triggered them. Until they are linked, a business cannot determine what it costs to serve a customer, run a feature, or complete an inference.
AI Cost Tracking System: From Usage Signals to Business-Level Cost Insights
| Signal Source | Usage Signals |
|---|---|
| Model usage | Tokens (input / output); requests and retries; errors and timeouts; model routing |
| Data systems | Retrieval queries; embedding volume; vector DB storage; data pipeline jobs |
| Infrastructure | GPU seconds; CPU and memory usage; K8s pod runtime; network / ingress * |
| Workflow and orchestration | Workflow runs; steps / tool calls; execution duration; agent invocations |

* Network / ingress costs are ingested via cloud billing reports (AWS CUR, Azure Cost Management, GCP Billing Export), not as a separate direct connector.

| Processing Stage | What It Does |
|---|---|
| Usage normalization | Standardize units (tokens, GB, seconds, requests) across providers and environments using a consistent schema aligned with FOCUS. |
| Cost rate cards | Apply dynamic rate cards for cloud, models, data services, and networking. Reconcile calculated cost against actual provider billing. |
| Allocation rules | Distribute shared costs using documented signals: per token, per GPU second, per query, per workflow run. Direct, proportional, or overhead. |

| Output Metric | Definition |
|---|---|
| Cost per inference | Fully-loaded cost of one model-backed response or decision |
| Cost per workflow | End-to-end cost for each workflow or agent execution |
| Cost per feature | Attribute cost to specific product features and capabilities |
| Cost per customer / tenant | Allocate cost across customers or tenants fairly and accurately |
| Cost-to-serve | True cost to serve each user, customer, or product line |
Figure 1: AI Cost Tracking System. Usage signals from model, data, infrastructure, and orchestration layers flow through allocation logic to produce cost-per-inference, cost-per-workflow, cost-per-feature, cost-per-customer, and cost-to-serve outputs.
Falling token prices will not remove the need for tracking. Gartner predicts that by 2030, inference on a 1 trillion-parameter LLM will cost GenAI providers over 90% less than in 2025, but it also notes that agentic models can require 5 to 30 times more tokens per task than a standard GenAI chatbot. The unit cost may fall, but workflow complexity can still push total cost higher. That is why tracking has to follow the full execution path, not just the model call.
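A quick back-of-the-envelope check shows how the two trends interact. This is a sketch only: it assumes the 90% reduction in provider inference cost passes through fully to the per-token price, which the forecast does not promise, and the function name is hypothetical.

```python
from fractions import Fraction


def relative_task_cost(price_drop_pct: int, token_multiplier: int) -> Fraction:
    """Cost of one agentic task relative to one chatbot task at the old price.

    price_drop_pct: per-token price reduction in percent (assumed pass-through).
    token_multiplier: how many times more tokens the agentic task consumes.
    """
    remaining_price = Fraction(100 - price_drop_pct, 100)
    return remaining_price * token_multiplier


low = relative_task_cost(90, 5)    # low end of the 5x to 30x token range
high = relative_task_cost(90, 30)  # high end of the range
print(float(low), float(high))     # prints 0.5 3.0
```

Under these assumptions, a task at the low end of the range costs half as much as before, while a task at the high end costs three times as much, even though each token is 90% cheaper.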
Why Tagging Helps and Where It Stops
Providers like Vertex AI and Bedrock support labels and cost allocation tags, and applying them consistently is a sound practice. Tags let you group spend by team or workload, which gives you a starting point for accountability.
The limitation is that tags cannot distribute shared costs across features, tenants, or workflows that pull from the same infrastructure. For that, you need usage-based allocation built on signals like tokens, requests, query volume, workflow runs, or compute time. Tags tell you who the resource belongs to, while allocation tells you how much each consumer spent.
Standardize the Schema Before Building the Dashboard
Provider exports still vary enough that teams need a shared data model before reporting. Pulling those exports together without a consistent schema produces a dashboard full of numbers that look comparable but are not.
Standardization is becoming more important as FinOps expands beyond cloud. The FinOps Foundation’s 2026 State of FinOps report says AI cost management is the top skillset teams need to develop, and 98% of respondents now manage AI spend, up from 31% two years ago. The same report points to growing adoption of FOCUS as practitioners look for consistent cost and usage data across a more complex technology landscape.
AI cost tracking needs the same discipline at the application layer. Provider, model, request, owner, infrastructure, and business outcome data need to line up before dashboards can produce trusted answers.
Minimum Viable AI Cost Schema
| Field Category | Recommended Fields | Why It Matters |
|---|---|---|
| Provider and model | provider, model, model_version, service_tier, region | Separates vendor movement, routing changes, and pricing differences |
| Usage | input_tokens, output_tokens, cached_tokens, embedding_count, request_count, query_count | Explains cost movement without relying on totals alone |
| Workflow context | workflow_id, run_id, session_id, agent_name, step_name, retry_count, request_status | Makes loops, fallbacks, and failure paths visible |
| Business context | product, feature_id, customer_id, tenant_id, plan_tier, environment | Connects usage to pricing, margins, and ownership |
| Infrastructure context | cloud_provider, account, cluster, namespace, workload, gpu_type, compute_seconds | Brings GPU, Kubernetes, cloud, and on-prem costs into the same attribution model |
| Finance context | allocation_rule, owner, budget, forecast_category | Converts usage into showback, chargeback, and cost-to-serve |
Key Insight: Do not start by trying to track every workflow in the company. Start with one production workflow where volume is steady and business value is clear. The schema can expand once the first workflow produces trusted numbers.
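One way to pin the schema down before any dashboard work is to express it as a typed record. The sketch below is illustrative rather than a prescribed format: the class name `AICostEvent`, the choice of which fields are required, and the validation rules are all assumptions layered on the field names from the table above.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AICostEvent:
    """One request-level record in a hypothetical AI cost ledger."""

    # Provider and model
    provider: str
    model: str
    # Usage
    input_tokens: int
    output_tokens: int
    # Workflow context
    workflow_id: str
    run_id: str
    step_name: str
    request_status: str  # "success", "retry", "timeout", or "failure"
    # Business context
    feature_id: str
    tenant_id: str
    environment: str
    # Optional fields, filled in when the data is available
    retry_count: int = 0
    cached_tokens: int = 0
    gpu_type: Optional[str] = None
    compute_seconds: float = 0.0
    allocation_rule: str = "direct"
    owner: Optional[str] = None

    def validate(self) -> None:
        """Reject records that would corrupt downstream attribution."""
        for name in ("workflow_id", "run_id", "feature_id", "tenant_id"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")
        if min(self.input_tokens, self.output_tokens, self.retry_count) < 0:
            raise ValueError("token and retry counts must be non-negative")


event = AICostEvent(
    provider="bedrock", model="example-model", input_tokens=1200, output_tokens=300,
    workflow_id="renewal_risk_scoring_v2", run_id="run-001", step_name="generate_response",
    request_status="success", feature_id="support_summary", tenant_id="tenant-42",
    environment="prod",
)
event.validate()
row = asdict(event)  # plain dict, ready to write to a table or event stream
```

Keeping the first version this small matches the advice above: prove the schema on one workflow, then add fields such as region or forecast_category once the numbers are trusted.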
How to Track AI Costs in Practice
The practical system has five parts. Each part can start narrow, but each should be designed to scale across models, teams, and environments.
Step 1: Route AI Calls Through an Instrumentation Layer
A production AI request should pass through a logging proxy, gateway, middleware layer, or SDK that captures metadata before and after the model call.
At minimum, capture:
| Metadata | Example |
|---|---|
| provider | openai, azure_openai, bedrock, vertex, anthropic |
| model | model name returned by the provider |
| feature_id | support_summary, smart_search, contract_review |
| workflow_id | renewal_risk_scoring_v2 |
| run_id | shared identifier across all steps in one workflow |
| agent_name | research_agent, routing_agent, collections_agent |
| step_name | retrieve_docs, classify_ticket, generate_response |
| tenant_id or customer_id | internal stable ID, not sensitive personal information |
| input_tokens | provider reported value where available |
| output_tokens | provider reported value where available |
| request_status | success, retry, timeout, failure |
| cost_estimate | calculated from current rate table or reconciled later |
Without a logging layer, cost questions stay broad and hard to answer. With one, they get specific enough to act on. Finance can trace a usage spike to the workflow that caused it, product can see which feature is getting more expensive per completed action, and engineering can find which step is retrying and fix it.
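A thin wrapper around the model call is often enough to get started. The following is a minimal sketch, not a production gateway: `tracked_call`, `COST_LOG`, and `fake_model` are hypothetical names, and the `usage` dictionary shape is an assumption, since every provider reports token counts in its own response format.

```python
import time
from typing import Any, Callable, Dict, List

COST_LOG: List[Dict[str, Any]] = []  # stand-in for a real event sink (queue, table, exporter)


def tracked_call(call_model: Callable[[str], Dict[str, Any]], prompt: str,
                 **context: str) -> Dict[str, Any]:
    """Run one model call and log usage plus business context, even on failure."""
    record: Dict[str, Any] = dict(context)  # provider, model, feature_id, run_id, ...
    record.update(input_tokens=None, output_tokens=None)
    started = time.monotonic()
    try:
        response = call_model(prompt)
        usage = response.get("usage", {})  # assumed response shape
        record["input_tokens"] = usage.get("input_tokens")
        record["output_tokens"] = usage.get("output_tokens")
        record["request_status"] = "success"
        return response
    except TimeoutError:
        record["request_status"] = "timeout"
        raise
    except Exception:
        record["request_status"] = "failure"
        raise
    finally:
        # Runs on success and on failure, so failed calls still leave a trail.
        record["latency_ms"] = round((time.monotonic() - started) * 1000, 2)
        COST_LOG.append(record)


def fake_model(prompt: str) -> Dict[str, Any]:
    """Hypothetical provider client used only to make the sketch runnable."""
    return {"text": "ok", "usage": {"input_tokens": len(prompt.split()), "output_tokens": 1}}


tracked_call(
    fake_model, "summarize this support ticket",
    provider="example_provider", model="example-model", feature_id="support_summary",
    workflow_id="renewal_risk_scoring_v2", run_id="run-001", step_name="generate_response",
    tenant_id="tenant-42",
)
```

Logging in the `finally` branch is the important design choice here: timeouts and failures consume budget too, and they are exactly the records that go missing when logging only happens on the success path.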
Step 2: Reconcile Provider Cost with Internal Telemetry
Internal telemetry tells you what happened inside a workflow. Provider billing tells you what that activity costs. Cost tracking requires both, read together against the same execution path.
| Reconciliation Step | What to Check |
|---|---|
| Pull provider usage and cost | Use provider dashboards, APIs, billing exports, or cloud cost reports |
| Join to internal metadata | Match by project, key, time bucket, model, request ID, gateway ID, or trace ID where available |
| Compare calculated cost to billed cost | Identify missing requests, stale price tables, cache discounts, batch discounts, or untagged usage |
| Assign confidence level | High confidence for matched request-level data, lower confidence for allocated shared cost |
| Publish to the cost ledger | Store the final record with cost owner, allocation rule, and source evidence |
Provider totals confirm what you owe, while internal telemetry explains why.
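The comparison step in the table can be sketched as a join on a shared key. The example below assumes a daily time bucket plus model name as the join key and a 2% tolerance; both are illustrative choices, and the function name, status labels, and sample figures are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (time bucket, model)


def reconcile(events: List[dict], billed: Dict[Key, float],
              tolerance: float = 0.02) -> List[dict]:
    """Compare cost calculated from telemetry with billed cost per (day, model)."""
    calculated: Dict[Key, float] = defaultdict(float)
    for event in events:
        calculated[(event["day"], event["model"])] += event["cost_estimate"]

    rows = []
    for key in sorted(set(calculated) | set(billed)):
        calc = calculated.get(key, 0.0)
        bill = billed.get(key, 0.0)
        if key not in billed:
            status = "telemetry_without_bill"    # possible billing lag or wrong join key
        elif key not in calculated:
            status = "billed_without_telemetry"  # missing requests or untagged usage
        elif abs(bill - calc) <= tolerance * bill:
            status = "matched"
        else:
            status = "variance"                  # stale price table, cache or batch discounts
        rows.append({"day": key[0], "model": key[1], "calculated": calc,
                     "billed": bill, "status": status})
    return rows


events = [
    {"day": "2026-01-05", "model": "model-a", "cost_estimate": 1.50},
    {"day": "2026-01-05", "model": "model-a", "cost_estimate": 2.50},
    {"day": "2026-01-05", "model": "model-b", "cost_estimate": 1.00},
]
billed = {
    ("2026-01-05", "model-a"): 4.00,
    ("2026-01-05", "model-b"): 1.40,
    ("2026-01-05", "model-c"): 0.75,
}
report = reconcile(events, billed)
```

In this sample, model-a matches, model-b shows a variance worth investigating, and model-c was billed with no telemetry behind it. The status column can then feed the confidence level assigned before a record is published to the ledger.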
Step 3: Allocate Shared Costs with Explicit Rules
Shared AI infrastructure is where allocation gets tricky. A shared GPU pool may serve multiple models. A vector database may support several product features. A workflow orchestrator may coordinate multiple agents. A data platform may support both AI and non-AI workloads.
For each shared cost, define the allocation signal before reporting begins.
| Shared Cost Type | Recommended Allocation Signal |
|---|---|
| Shared inference endpoint | completed request count, token volume, or compute time |
| Shared GPU cluster | GPU seconds, pod runtime, namespace, workload, or queue time |
| Vector database | query count, storage volume, index size, or tenant-specific collections |
| Embedding pipeline | documents processed, embeddings generated, or re-indexing jobs |
| Orchestration layer | workflow runs, step count, tool calls, or duration |
| Observability and logging | event volume, span count, or proportional allocation across AI workloads |
There is no perfect allocation rule for shared infrastructure, but every shared cost needs a documented rule, an owner, and a review cadence. Otherwise, cost conversations turn into debates about methodology instead of an operating process.
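Proportional allocation is the simplest rule to document. As a minimal sketch, assume a shared vector database with a monthly cost of 1,200 (a made-up figure) that is split across three features by query count; the function name and the numbers are hypothetical.

```python
from typing import Dict


def allocate_shared_cost(total_cost: float,
                         usage_by_consumer: Dict[str, float]) -> Dict[str, float]:
    """Split a shared cost in proportion to each consumer's usage signal."""
    total_usage = sum(usage_by_consumer.values())
    if total_usage <= 0:
        # With no signal, proportional allocation is undefined.
        raise ValueError("no usage recorded; fall back to a documented overhead rule")
    return {
        consumer: total_cost * usage / total_usage
        for consumer, usage in usage_by_consumer.items()
    }


query_counts = {"smart_search": 6000, "support_summary": 3000, "contract_review": 1000}
shares = allocate_shared_cost(1200.0, query_counts)
print(shares)
```

The same function works for any signal in the table above (GPU seconds, workflow runs, span count). What changes between shared cost types is the signal that is passed in, which is why the signal should be written down next to the rule and its owner.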
Step 4: Define the Unit Metric That Connects Cost to Value
Total AI spend is not enough for product or finance decisions. The useful metric is tied to the unit of value your AI system delivers.
| Business model | Strong unit metric |
|---|---|
| AI-enabled SaaS feature | cost per feature use |
| Agent workflow | cost per agent run or cost per completed workflow |
| Customer support AI | cost per resolution |
| Document automation | cost per processed document |
| Payments or transaction workflows | cost per transaction |
| MSP or partner environment | cost-to-serve per tenant or customer |
| Internal developer tools | cost per active developer, task, or accepted output |
Case Study: A B2B fintech company working with Mavvrik connected cloud, SaaS, and AI usage to product economics and identified cost per payment at $0.08. That cost-to-serve model supported pricing, margin management, and infrastructure accountability, contributing to $1.75 million in savings over 20 months.
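A unit metric such as cost per completed workflow can be computed directly from step-level ledger rows. The sketch below makes one deliberate assumption that teams may reasonably choose differently: the cost of failed runs stays in the numerator, because that spend was still required to deliver the completed ones. The function name and sample data are hypothetical.

```python
from typing import List, Set


def cost_per_completed_workflow(step_costs: List[dict], completed_runs: Set[str]) -> float:
    """Total step cost (including failed runs) divided by the number of completed runs."""
    if not completed_runs:
        raise ValueError("no completed runs in this period")
    total_cost = sum(step["cost"] for step in step_costs)
    return total_cost / len(completed_runs)


step_costs = [
    {"run_id": "run-a", "step_name": "retrieve_docs", "cost": 0.02},
    {"run_id": "run-a", "step_name": "generate_response", "cost": 0.03},
    {"run_id": "run-b", "step_name": "retrieve_docs", "cost": 0.01},  # run-b failed
    {"run_id": "run-c", "step_name": "generate_response", "cost": 0.04},
]
metric = cost_per_completed_workflow(step_costs, completed_runs={"run-a", "run-c"})
print(round(metric, 4))
```

Dividing by completed runs rather than all runs means a rising failure rate shows up as a rising unit cost, which is usually the behavior finance and product teams want from the metric.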
Step 5: Build Controls Where Spend Happens
AI cost controls should sit close to the request path. A month-end budget review is too late for a runaway workflow.
Start with five practical controls:
| Control | Trigger | Action |
|---|---|---|
| Cost-per-workflow spike | Cost per completed workflow rises above its rolling baseline | Review recent runs, compare model choice, prompt size, retrieval count, and retry behavior |
| Retry loop detection | Same run_id and step_name repeat within a short window | Stop the workflow, alert the owner, and mark the run for review |
| Token budget pacing | Usage reaches an agreed share of the period budget early | Reroute non-critical workloads, reduce context length, or require approval for high-cost paths |
| Per-tenant anomaly | One tenant’s cost-to-serve rises outside expected range | Check for usage changes, abuse, misconfiguration, or feature adoption shifts |
| Model routing exception | A workflow uses a higher-cost model outside policy | Route to approved model tier or require justification |
The goal is to place financial controls at the point where decisions are made: model selection, retrieval depth, retry policy, context size, and routing.
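Two of these controls reduce to a few lines of logic. The sketch below is illustrative: it keys retry detection on run_id plus step_name (an assumption, since a run_id is shared across all steps of a workflow), and the thresholds of three attempts, 60 seconds, and 1.5x the baseline are placeholder values to be tuned per workflow.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Tuple


def find_retry_loops(events: List[dict], max_attempts: int = 3,
                     window_seconds: float = 60.0) -> List[Tuple[str, str]]:
    """Return (run_id, step_name) pairs that ran more than max_attempts times in the window."""
    by_step = defaultdict(list)
    for event in events:
        by_step[(event["run_id"], event["step_name"])].append(event["timestamp"])

    flagged = []
    for key, stamps in by_step.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window until it spans at most window_seconds.
            while stamps[end] - stamps[start] > window_seconds:
                start += 1
            if end - start + 1 > max_attempts:
                flagged.append(key)
                break
    return flagged


def is_cost_spike(baseline_costs: List[float], latest_cost: float,
                  factor: float = 1.5) -> bool:
    """True when the latest cost per workflow exceeds factor times the rolling mean."""
    if not baseline_costs:
        return False  # no baseline yet, so nothing to compare against
    return latest_cost > factor * mean(baseline_costs)


events = [{"run_id": "run-1", "step_name": "generate_response", "timestamp": t}
          for t in (0, 5, 10, 15, 20)]
events.append({"run_id": "run-2", "step_name": "classify_ticket", "timestamp": 3})
loops = find_retry_loops(events)
spike = is_cost_spike([0.10, 0.10, 0.10], latest_cost=0.20)
```

Running checks like these in or near the gateway, rather than in a monthly report, is what lets a control stop a workflow while it is still running.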
Common Blockers and How to Handle Them
Blocker 1: One API Key Serves Too Many Use Cases
A single shared API key makes setup fast and attribution nearly impossible. When multiple product surfaces route through the same key, you know what was spent but not why, by which feature, or for which customer.
The fix is separation at the right boundary. Moving to project-, key-, workspace-, profile-, or gateway-level separation for major product surfaces gives you a cleaner starting point. From there, carry feature, tenant, and workflow metadata in internal logs. Provider-level separation alone is not enough if several features share the same boundary.
Blocker 2: Shared GPU Costs Do Not Map to Inference
Cloud billing surfaces accelerator spend. On-prem infrastructure requires modeling depreciation, utilization, power, and support costs. Neither translates automatically into cost-per-inference, which is the unit that matters for pricing and margin decisions.
The fix is building a rate card from infrastructure cost. Calculate GPU cost per hour or per second, then allocate by workload runtime, GPU utilization, namespace, queue, model, or serving endpoint. Once infrastructure has a rate, you can attach it to what it produced.
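As a rough sketch of that rate card for an on-prem cluster, the example below folds depreciation, power, and support into a monthly cost and divides by the GPU seconds expected to be in use. Every number is made up for illustration, the 730-hour month is a common approximation, and the function names are hypothetical.

```python
def gpu_rate_per_second(monthly_cost: float, gpu_count: int,
                        expected_utilization: float,
                        hours_per_month: float = 730.0) -> float:
    """Cost of one utilized GPU second, with idle time loaded onto used time."""
    if not 0 < expected_utilization <= 1:
        raise ValueError("expected_utilization must be in (0, 1]")
    utilized_seconds = gpu_count * hours_per_month * 3600 * expected_utilization
    return monthly_cost / utilized_seconds


def infrastructure_cost_per_inference(rate: float, gpu_seconds: float,
                                      request_count: int) -> float:
    """Spread the GPU time consumed by an endpoint across the requests it served."""
    return rate * gpu_seconds / request_count


# Illustrative inputs only: hardware depreciated over 48 months, plus power and support.
depreciation = 240_000 / 48
monthly_cost = depreciation + 600 + 400
rate = gpu_rate_per_second(monthly_cost, gpu_count=8, expected_utilization=0.5)
per_inference = infrastructure_cost_per_inference(rate, gpu_seconds=90_000, request_count=300_000)
```

Dividing by utilized seconds rather than total seconds is a design choice: it pushes the cost of idle capacity onto the workloads that do run. Teams that prefer to report idle capacity as its own overhead line can set the utilization to 1 and allocate the remainder separately.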
Blocker 3: Retry Behavior Hides Inside Normal Usage
Retries generate tokens and requests just like successful calls do. Without run-level identifiers, a broken workflow and a healthy one look identical in the data. A workflow that succeeds after repeated failures can be misread as adoption growth.
The fix is making run_id, step_name, request_status, and retry_count mandatory fields. A successful first attempt and a workflow that succeeds after repeated attempts should not look the same in the ledger.
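With those fields in place, the ledger can separate first-attempt spend from retry spend. The sketch below assumes retry_count is recorded per event as the attempt index (0 for the first attempt); the function name and sample events are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def split_retry_cost(events: List[dict]) -> Dict[str, Dict[str, float]]:
    """Sum cost per workflow, separating first attempts from retries."""
    summary = defaultdict(lambda: {"first_attempt_cost": 0.0, "retry_cost": 0.0})
    for event in events:
        bucket = "retry_cost" if event["retry_count"] > 0 else "first_attempt_cost"
        summary[event["workflow_id"]][bucket] += event["cost"]
    return dict(summary)


events = [
    # Healthy workflow: two runs, each succeeds on the first attempt.
    {"workflow_id": "healthy", "run_id": "h1", "retry_count": 0, "cost": 0.02},
    {"workflow_id": "healthy", "run_id": "h2", "retry_count": 0, "cost": 0.02},
    # Flaky workflow: one run that succeeds only on the third attempt.
    {"workflow_id": "flaky", "run_id": "f1", "retry_count": 0, "cost": 0.02},
    {"workflow_id": "flaky", "run_id": "f1", "retry_count": 1, "cost": 0.02},
    {"workflow_id": "flaky", "run_id": "f1", "retry_count": 2, "cost": 0.02},
]
summary = split_retry_cost(events)
```

In aggregate the two workflows look similar, with five requests between them at the same unit cost. Split this way, the flaky workflow shows two thirds of its spend going to retries, which points to a reliability problem rather than adoption growth.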
Build vs Buy AI Cost Tracking Systems
Building an AI cost tracking system internally can make sense when the environment is narrow, the number of providers is limited, and attribution requirements are simple. The tradeoff changes once the system has to cover LLM APIs, GPUs, Kubernetes, SaaS tools, cloud billing, on-prem infrastructure, and agent workflows.
The question is whether the internal system can stay accurate as providers change pricing, schemas drift, APIs change, and new AI services enter the stack. For teams evaluating whether to build or buy AI cost management, the decision often comes down to time to value, integration coverage, and ongoing maintenance. Internal builds can work, but they have to deliver financial control while AI usage is still scaling, not after.
How Mavvrik Approaches AI Cost Tracking
Mavvrik treats AI cost tracking as full-stack financial attribution across cloud, on-prem, GPU, SaaS, LLM, and agentic workloads. The platform normalizes cost signals across providers and environments, carries business context such as team, service, tenant, user, workflow, and custom tags, and supports cost-to-serve views by product, customer, feature, or workflow.
For agentic systems, Mavvrik’s Agentic Cost Intelligence SDK captures LLM calls, tool actions, timing, token usage, and workflow steps through OpenTelemetry-native events. That makes it possible to tie cost to the step that generated it rather than leaving it inside aggregate spend.
Three ways to continue from here:
- See how Mavvrik supports this in practice. Mavvrik brings these cost signals together into a single system of record, with workload-level tracking and cost-to-serve visibility across your entire infrastructure. You can explore this through a product walkthrough or connect with the team to see how it applies to your environment.
- Get a clear view of your current AI cost structure. Start by understanding where your AI spend is coming from across cloud, GPUs, SaaS tools, and model providers. The goal is to see how costs map to workflows, features, and customers rather than isolated systems.
- Evaluate how your costs connect to business outcomes. Look at whether you can measure cost-to-serve for a product, feature, or customer. If that connection is missing, it becomes difficult to support pricing, forecasting, and margin decisions.
FAQs
How do you track AI costs in 2026?
Track AI costs by capturing request-level usage, joining it with provider billing data, and allocating shared infrastructure costs to a business unit of value. A practical system includes provider usage, model usage, workflow metadata, customer or tenant context, infrastructure cost, and allocation rules in one ledger.
Why are provider billing dashboards not enough for AI cost tracking?
Provider dashboards show what a provider billed, but they usually do not explain which workflow, feature, customer, or tenant generated the cost. They also do not cover related spend from retrieval, data platforms, orchestration, cloud infrastructure, on-prem GPUs, and shared services.
What is cost-per-inference?
Cost-per-inference is the total cost required to produce one model-backed response or decision. A useful calculation includes model cost, retrieval cost, orchestration cost, allocated infrastructure cost, and any shared overhead assigned to the same inference unit.
What metadata should you capture on every AI request?
Capture provider, model, input tokens, output tokens, workflow ID, run ID, step name, status, retry count, feature ID, environment, team, customer, tenant, and cost estimate. Add infrastructure context when the request uses GPUs, Kubernetes, on-prem systems, or shared serving layers.
When should a company build versus buy AI cost tracking?
Building can work when the environment is narrow, provider count is low, and attribution requirements are simple. Buying becomes more practical when the organization needs coverage across cloud, on-prem, GPUs, SaaS, LLM APIs, Kubernetes, agent workflows, multi-tenant reporting, chargeback, and cost-to-serve.

