Key takeaways:

AI cost tracking should start at the request level.
A complete AI cost model needs to connect model usage, retrieval, orchestration, cloud infrastructure, on-prem infrastructure, SaaS tools, and shared services into one normalized ledger.
Shared infrastructure allocation requires usage signals: tokens, requests, compute time, query volume, and workflow steps.
Cost-per-inference, cost-per-workflow, and cost-to-serve are stronger operating metrics than total AI spend because they connect usage to pricing, margin, and product decisions.
Start with one production workflow, define the cost schema, allocate shared costs, set guardrails, then expand.

Best For: Finance leaders & FP&A | FinOps Practitioners | IT & Engineering leaders | MSP and cloud partners

From AI Cost Visibility to Cost Tracking Systems

AI cost visibility shows where spend is appearing across models, infrastructure, and tools. If you want a breakdown of those cost drivers, our guide to AI cost visibility covers that ground. This guide focuses on the next step: how to track AI costs at the level of requests, workflows, features, and customers.

The surrounding AI market is expanding quickly. Gartner forecasts worldwide AI spending will reach nearly $1.5 trillion in 2025 and top $2 trillion in 2026, with AI infrastructure and AI-optimized hardware remaining major areas of investment. As AI becomes part of product delivery and shared infrastructure, invoice-level tracking is no longer enough. Teams need to know which workflows, features, customers, and environments are creating the cost.

What Tracking AI Costs Means in Production

Tracking AI costs means building a financial trail for every AI-powered outcome: why that cost happened, who or what generated it, and whether the spend supports the product economics you expected.

Sundeep Goel, Mavvrik’s CEO, frames the challenge plainly: “Many companies can’t tell you what they’re spending on AI, let alone which teams drove it, which products it’s tied to, or how it’s affecting their margins. That used to be tolerable. It isn’t anymore.”

Getting there requires three connected layers:

Tracking Layer	What it Answers	Example Fields
Usage layer	What happened inside the AI system?	provider, model, input tokens, output tokens, embeddings, retrieval count, tool calls, latency, retry count
Ownership layer	Who or what should own the cost?	team, cost center, product, feature, customer, tenant, environment, workflow, agent
Finance layer	How should the cost be allocated and controlled?	allocation rule, owner, budget, forecast, cost-to-serve, margin impact

Unlike SaaS tracking, which often starts with seats and renewals, AI cost tracking starts with usage events and expands outward into the infrastructure, data systems, and workflows that support them.

What an AI Cost Tracking System Follows

AI cost tracking starts with the full execution path of a request, from first call to final output.
One workflow can touch a model call, a vector database retrieval, an orchestration layer, GPU or CPU resources, and one or more retries before it produces an output. Each step generates a cost signal. Those signals land in different places, under different line items, with no automatic connection to the workflow that triggered them. Until they are linked, a business cannot determine what it costs to serve a customer, run a feature, or complete an inference.
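The linking step described above can be sketched in a few lines. This is a minimal illustration, assuming every system can stamp its cost signal with a shared `run_id`; the source names, field names, and dollar amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical cost signals as they might arrive from separate systems.
# Every field name and dollar amount here is illustrative.
signals = [
    {"source": "llm_gateway", "run_id": "run-001", "cost_usd": 0.0120},
    {"source": "vector_db", "run_id": "run-001", "cost_usd": 0.0004},
    {"source": "orchestrator", "run_id": "run-001", "cost_usd": 0.0010},
    {"source": "gpu_billing", "run_id": "run-001", "cost_usd": 0.0066},
    {"source": "llm_gateway", "run_id": "run-002", "cost_usd": 0.0300},
    {"source": "llm_gateway", "run_id": None, "cost_usd": 0.0050},  # not linked to any run
]

def cost_by_run(records):
    """Sum cost per run_id and keep unlinked spend visible instead of dropping it."""
    totals = defaultdict(float)
    for rec in records:
        key = rec["run_id"] or "UNATTRIBUTED"
        totals[key] += rec["cost_usd"]
    return dict(totals)

totals = cost_by_run(signals)
for run_id, cost in sorted(totals.items()):
    print(f"{run_id}: ${cost:.4f}")
```

Keeping an explicit `UNATTRIBUTED` bucket is a design choice: the size of that bucket is a direct measure of how much spend still lacks a link to a workflow.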

AI Cost Tracking System

From Usage Signals to Business-Level Cost Insights

1. Cost Signals: capture usage across the entire AI stack

Signal Group	Signals	Source
Model Usage	Tokens (input / output), requests and retries, errors and timeouts, model routing	LLM Gateway / Model APIs
Data Systems	Retrieval queries, embedding volume, vector DB storage, data pipeline jobs	Vector DB / Data Platforms
Infrastructure	GPU seconds, CPU and memory usage, K8s pod runtime, network / ingress *	Cloud / K8s / On-prem Billing
Workflow & Orchestration	Workflow runs, steps / tool calls, execution duration, agent invocations	Orchestration / App layer

* Network / ingress costs are ingested via cloud billing reports (AWS CUR, Azure Cost Management, GCP Billing Export), not as a separate direct connector.

2. Cost Allocation Engine: normalize, price, and allocate shared costs

Component	What It Does
Usage Normalization	Standardize units (tokens, GB, seconds, requests) across providers and environments using a consistent schema aligned with FOCUS.
Cost Rate Cards	Apply dynamic rate cards for cloud, models, data services, and networking. Reconcile calculated cost against actual provider billing.
Allocation Rules	Distribute shared costs using documented signals: per token, per GPU second, per query, per workflow run. Allocation can be direct, proportional, or overhead.

3. Business Outputs: accurate, actionable cost metrics for every level of the business

Output	Definition
Cost per Inference	Fully loaded cost of one model-backed response or decision
Cost per Workflow	End-to-end cost for each workflow or agent execution
Cost per Feature	Cost attributed to specific product features and capabilities
Cost per Customer / Tenant	Cost allocated across customers or tenants fairly and accurately
Cost-to-Serve	True cost to serve each user, customer, or product line

Feedback loop: optimize usage, pricing, and architecture continuously.

Figure 1: AI Cost Tracking System. Usage signals from model, data, infrastructure, and orchestration layers flow through allocation logic to produce cost-per-inference, cost-per-workflow, cost-per-feature, cost-per-customer, and cost-to-serve outputs.

Falling token prices will not remove the need for tracking. Gartner predicts that by 2030, inference on a 1 trillion-parameter LLM will cost GenAI providers over 90% less than in 2025, but it also notes that agentic models can require 5 to 30 times more tokens per task than a standard GenAI chatbot. The unit cost may fall, but workflow complexity can still push total cost higher. That is why tracking has to follow the full execution path, not just the model call.
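The arithmetic behind that point is simple enough to check directly. The sketch below assumes a price drop of exactly 90% (the forecast says "over 90%") and assumes task cost scales linearly with token count; the function name is illustrative.

```python
def relative_task_cost(price_drop, token_multiplier):
    """Cost of one task relative to the baseline, where 1.0 means unchanged."""
    return (1.0 - price_drop) * token_multiplier

# Compare a chatbot-style task (1x tokens) with agentic tasks at 5x, 10x, and 30x tokens.
for multiplier in (1, 5, 10, 30):
    ratio = relative_task_cost(price_drop=0.90, token_multiplier=multiplier)
    print(f"{multiplier:>2}x tokens per task -> {ratio:.1f}x baseline cost")
```

Under these assumptions, a task that uses 5 times the tokens costs about half the baseline, the break-even sits at 10 times, and a task that uses 30 times the tokens costs about three times the baseline despite the lower unit price.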

Why Tagging Helps and Where It Stops

Providers like Vertex AI and Bedrock support labels and cost allocation tags, and applying them consistently is a sound practice. Tags let you group spend by team or workload, which gives you a starting point for accountability.

The limitation is that tags cannot distribute shared costs across features, tenants, or workflows that pull from the same infrastructure. For that, you need usage-based allocation built on signals like tokens, requests, query volume, workflow runs, or compute time. Tags tell you who the resource belongs to, while allocation tells you how much each consumer spent.
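To make the distinction concrete, here is a minimal sketch, assuming one shared inference endpoint that is tagged to a single team and consumed by three features. The feature names come from the examples in this guide; the tag value, dollar amount, token counts, and function name are hypothetical.

```python
def allocate_by_usage(shared_cost, usage_by_consumer):
    """Split one shared cost across consumers in proportion to a usage signal."""
    total = sum(usage_by_consumer.values())
    if total <= 0:
        raise ValueError("no usage recorded; cannot allocate proportionally")
    return {name: shared_cost * used / total for name, used in usage_by_consumer.items()}

# Tag view: the whole endpoint belongs to one team (hypothetical figures).
tag_view = {"team:ml-platform": 1200.00}

# Allocation view: the same $1,200 split by tokens consumed per feature.
tokens_by_feature = {
    "support_summary": 6_000_000,
    "smart_search": 3_000_000,
    "contract_review": 1_000_000,
}
allocation_view = allocate_by_usage(1200.00, tokens_by_feature)

for feature, cost in allocation_view.items():
    print(f"{feature}: ${cost:,.2f}")
```

Both views add up to the same total. Only the second one says which feature consumed it.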

Standardize the Schema Before Building the Dashboard

Provider exports still vary enough that teams need a shared data model before reporting. Pulling those exports together without a consistent schema produces a dashboard full of numbers that look comparable but are not.

Standardization is becoming more important as FinOps expands beyond cloud. The FinOps Foundation’s 2026 State of FinOps report says AI cost management is the top skillset teams need to develop, and 98% of respondents now manage AI spend, up from 31% two years ago. The same report points to growing adoption of FOCUS as practitioners look for consistent cost and usage data across a more complex technology landscape.

AI cost tracking needs the same discipline at the application layer. Provider, model, request, owner, infrastructure, and business outcome data need to line up before dashboards can produce trusted answers.
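A shared data model usually starts with a field mapping per provider. The sketch below is illustrative only: `provider_a` and `provider_b` are placeholders, and their raw field names are invented to show the mismatch, so real exports should be checked against each provider's current documentation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NormalizedUsage:
    provider: str
    model: str
    input_tokens: int
    output_tokens: int

# Hypothetical mappings: raw export field -> normalized field.
FIELD_MAPS = {
    "provider_a": {
        "model": "model",
        "prompt_tokens": "input_tokens",
        "completion_tokens": "output_tokens",
    },
    "provider_b": {
        "model_id": "model",
        "inputTokenCount": "input_tokens",
        "outputTokenCount": "output_tokens",
    },
}

def normalize(provider, raw):
    """Translate one raw usage row into the shared schema."""
    mapping = FIELD_MAPS[provider]  # a KeyError for an unmapped provider is intentional
    fields = {target: raw[source] for source, target in mapping.items()}
    return NormalizedUsage(provider=provider, **fields)

rows = [
    normalize("provider_a", {"model": "model-x", "prompt_tokens": 1200, "completion_tokens": 300}),
    normalize("provider_b", {"model_id": "model-y", "inputTokenCount": 800, "outputTokenCount": 150}),
]
total_input = sum(row.input_tokens for row in rows)
print(total_input)
```

Failing loudly on an unmapped provider or a missing field is deliberate here: a silent default of zero would produce exactly the comparable-looking but wrong numbers the schema is meant to prevent.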

Minimum Viable AI Cost Schema

Field Category	Recommended Fields	Why It Matters
Provider and model	provider, model, model_version, service_tier, region	Separates vendor movement, routing changes, and pricing differences
Usage	input_tokens, output_tokens, cached_tokens, embedding_count, request_count, query_count	Explains cost movement without relying on totals alone
Workflow context	workflow_id, run_id, session_id, agent_name, step_name, retry_count, request_status	Makes loops, fallbacks, and failure paths visible
Business context	product, feature_id, customer_id, tenant_id, plan_tier, environment	Connects usage to pricing, margins, and ownership
Infrastructure context	cloud_provider, account, cluster, namespace, workload, GPU_type, compute_seconds	Brings GPU, Kubernetes, cloud, and on-prem costs into the same attribution model
Finance context	allocation_rule, owner, budget, forecast_category	Converts usage into showback, chargeback, and cost-to-serve
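One way to pin the schema down is to express it as a typed record. The sketch below covers a subset of the fields in the table; the class name, the defaults, and the choice of which fields count as required for attribution are assumptions for illustration, not a prescribed standard.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class AICostRecord:
    # Provider and model
    provider: str
    model: str
    # Usage
    input_tokens: int
    output_tokens: int
    # Workflow context
    workflow_id: str
    run_id: str
    step_name: str
    request_status: str
    retry_count: int = 0
    # Business context
    feature_id: Optional[str] = None
    tenant_id: Optional[str] = None
    environment: str = "production"
    # Infrastructure context
    compute_seconds: float = 0.0
    # Finance context
    allocation_rule: Optional[str] = None
    owner: Optional[str] = None

# Assumed policy: a record cannot be attributed without these fields.
REQUIRED_FOR_ATTRIBUTION = ("feature_id", "tenant_id", "owner")

def attribution_gaps(record):
    """Return the names of attribution fields that are missing or empty."""
    values = asdict(record)
    return [name for name in REQUIRED_FOR_ATTRIBUTION if not values[name]]

record = AICostRecord(
    provider="example_provider", model="example-model",
    input_tokens=1200, output_tokens=300,
    workflow_id="renewal_risk_scoring_v2", run_id="run-001",
    step_name="generate_response", request_status="success",
    feature_id="support_summary", owner="support-platform",
)
print(attribution_gaps(record))
```

Reporting gaps per record, rather than rejecting the record, keeps the spend in the ledger while showing how much of it is still unattributable.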

Key Insight: Do not start by trying to track every workflow in the company. Start with one production workflow where volume is steady and business value is clear. The schema can expand once the first workflow produces trusted numbers.

How to Track AI Costs in Practice

The practical system has five parts. Each part can start narrow, but each should be designed to scale across models, teams, and environments.

Step 1: Route AI Calls Through an Instrumentation Layer

A production AI request should pass through a logging proxy, gateway, middleware layer, or SDK that captures metadata before and after the model call.

At minimum, capture:

Metadata	Example
provider	openai, azure_openai, bedrock, vertex, anthropic
model	model name returned by the provider
feature_id	support_summary, smart_search, contract_review
workflow_id	renewal_risk_scoring_v2
run_id	shared identifier across all steps in one workflow
agent_name	research_agent, routing_agent, collections_agent
step_name	retrieve_docs, classify_ticket, generate_response
tenant_id or customer_id	internal stable ID, not sensitive personal information
input_tokens	provider-reported value where available
output_tokens	provider-reported value where available
status	success, retry, timeout, failure
cost_estimate	calculated from current rate table or reconciled later

Without a logging layer, cost questions stay broad and hard to answer. With one, they get specific enough to act on. Finance can trace a usage spike to the workflow that caused it, product can see which feature is getting more expensive per completed action, and engineering can find which step is retrying and fix it.
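A minimal sketch of such a layer is a wrapper around the model call. Everything here is an assumption for illustration: `fake_model_call` stands in for a real provider SDK, `LEDGER` stands in for a durable log sink, and the response shape is invented.

```python
import time
import uuid

LEDGER = []  # stand-in for a durable log sink or queue

def fake_model_call(prompt):
    """Hypothetical stand-in for a provider SDK call."""
    return {"model": "example-model", "text": "ok",
            "input_tokens": len(prompt.split()), "output_tokens": 2}

def tracked_call(call_model, prompt, **context):
    """Call the model and log one ledger record, whether the call succeeds or not."""
    record = {"request_id": str(uuid.uuid4()), "status": "failure", **context}
    started = time.perf_counter()
    try:
        response = call_model(prompt)
        record.update(
            model=response["model"],
            input_tokens=response["input_tokens"],
            output_tokens=response["output_tokens"],
            status="success",
        )
        return response
    except TimeoutError:
        record["status"] = "timeout"
        raise
    finally:
        # Runs on success and on failure, so failed calls still reach the ledger.
        record["latency_ms"] = round((time.perf_counter() - started) * 1000, 3)
        LEDGER.append(record)

tracked_call(
    fake_model_call, "summarize this ticket",
    provider="example_provider", feature_id="support_summary",
    workflow_id="renewal_risk_scoring_v2", run_id="run-001",
    step_name="generate_response", tenant_id="tenant-42",
)
print(LEDGER[-1]["status"], LEDGER[-1]["input_tokens"])
```

Writing the record in a `finally` block is the important design choice: timeouts and failures consume budget too, and they are the records most likely to be lost if logging only happens on the success path.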

Step 2: Reconcile Provider Cost with Internal Telemetry

Internal telemetry tells you what happened inside a workflow. Provider billing tells you what that activity costs. Cost tracking requires both, read together against the same execution path.

Reconciliation Step	What to Check
Pull provider usage and cost	Use provider dashboards, APIs, billing exports, or cloud cost reports
Join to internal metadata	Match by project, key, time bucket, model, request ID, gateway ID, or trace ID where available
Compare calculated cost to billed cost	Identify missing requests, stale price tables, cache discounts, batch discounts, or untagged usage
Assign confidence level	High confidence for matched request-level data, lower confidence for allocated shared cost
Publish to the cost ledger	Store the final record with cost owner, allocation rule, and source evidence

Provider totals confirm what you owe, while internal telemetry explains why.
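The comparison step in the table can be sketched as a join on a time bucket and model. The rate table, dates, token counts, billed amounts, and the 2% tolerance below are all hypothetical; a real reconciliation would read them from provider exports and the internal ledger.

```python
from collections import defaultdict

# Hypothetical rate table: USD per 1,000 tokens.
RATES = {"example-model": {"input": 0.002, "output": 0.006}}

telemetry = [
    {"day": "2026-01-05", "model": "example-model", "input_tokens": 400_000, "output_tokens": 100_000},
    {"day": "2026-01-06", "model": "example-model", "input_tokens": 500_000, "output_tokens": 150_000},
]
billed = {
    ("2026-01-05", "example-model"): 1.40,
    ("2026-01-06", "example-model"): 2.60,
}

def reconcile(telemetry, billed, tolerance=0.02):
    """Compare cost calculated from telemetry with billed cost per (day, model)."""
    calculated = defaultdict(float)
    for row in telemetry:
        rate = RATES[row["model"]]
        calculated[(row["day"], row["model"])] += (
            row["input_tokens"] / 1000 * rate["input"]
            + row["output_tokens"] / 1000 * rate["output"]
        )
    report = []
    for key in sorted(set(calculated) | set(billed)):
        calc, bill = calculated.get(key, 0.0), billed.get(key, 0.0)
        gap = bill - calc
        matched = abs(gap) <= tolerance * max(bill, calc)
        report.append({
            "key": key,
            "calculated": round(calc, 4),
            "billed": bill,
            "gap": round(gap, 4),
            "confidence": "high" if matched else "review",
        })
    return report

report = reconcile(telemetry, billed)
for line in report:
    print(line)
```

In this sample data the first day matches and the second day shows billed cost above calculated cost, which is the pattern that missing requests, untagged usage, or a stale price table would produce.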

Step 3: Allocate Shared Costs with Explicit Rules

Shared AI infrastructure is where allocation gets tricky. A shared GPU pool may serve multiple models. A vector database may support several product features. A workflow orchestrator may coordinate multiple agents. A data platform may support both AI and non-AI workloads.

For each shared cost, define the allocation signal before reporting begins.

Shared Cost Type	Recommended Allocation Signal
Shared inference endpoint	completed request count, token volume, or compute time
Shared GPU cluster	GPU seconds, pod runtime, namespace, workload, or queue time
Vector database	query count, storage volume, index size, or tenant-specific collections
Embedding pipeline	documents processed, embeddings generated, or re-indexing jobs
Orchestration layer	workflow runs, step count, tool calls, or duration
Observability and logging	event volume, span count, or proportional allocation across AI workloads

There is no perfect allocation rule for shared infrastructure. But you need a documented rule, an owner, and a review cadence. Otherwise, cost conversations turn into a debate about methodology instead of an operating process.
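One way to keep the rule, the owner, and the review cadence documented is to store them next to the allocation logic and stamp the rule on every allocated row. This is a sketch under assumed names: the rule registry, team names, cadences, and figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AllocationRule:
    shared_cost: str
    signal: str          # usage field used as the allocation key
    owner: str
    review_cadence: str

# Hypothetical registry: one documented rule per shared cost.
RULES = {
    "vector_db": AllocationRule("vector_db", "query_count", "platform-team", "quarterly"),
    "gpu_cluster": AllocationRule("gpu_cluster", "gpu_seconds", "ml-infra", "monthly"),
}

def allocate(shared_cost_name, period_cost, usage_rows):
    """Allocate a shared cost by its registered signal and record the rule used."""
    rule = RULES[shared_cost_name]
    total = sum(row[rule.signal] for row in usage_rows)
    if total == 0:
        raise ValueError(f"no {rule.signal} recorded for {shared_cost_name}")
    return [
        {
            "consumer": row["consumer"],
            "allocated_usd": round(period_cost * row[rule.signal] / total, 2),
            "allocation_rule": f"{rule.shared_cost}:{rule.signal}",
            "owner": rule.owner,
        }
        for row in usage_rows
    ]

usage = [
    {"consumer": "smart_search", "query_count": 45_000},
    {"consumer": "support_summary", "query_count": 30_000},
    {"consumer": "contract_review", "query_count": 15_000},
]
ledger_rows = allocate("vector_db", 900.00, usage)
for row in ledger_rows:
    print(row)
```

Because each output row names the rule that produced it, a disputed number can be traced to a documented method and its owner instead of reopening the methodology debate.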

Step 4: Define the Unit Metric That Connects Cost to Value

Total AI spend is not enough for product or finance decisions. The useful metric is tied to the unit of value your AI system delivers.

Business model	Strong unit metric
AI-enabled SaaS feature	cost per feature use
Agent workflow	cost per agent run or cost per completed workflow
Customer support AI	cost per resolution
Document automation	cost per processed document
Payments or transaction workflows	cost per transaction
MSP or partner environment	cost-to-serve per tenant or customer
Internal developer tools	cost per active developer, task, or accepted output
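The choice of denominator matters more than it first appears. The sketch below uses hypothetical run data to contrast cost per run with cost per completed workflow; the field names and amounts are illustrative.

```python
# Hypothetical runs for one workflow over a reporting period.
runs = [
    {"run_id": "r1", "cost_usd": 0.04, "completed": True},
    {"run_id": "r2", "cost_usd": 0.06, "completed": True},
    {"run_id": "r3", "cost_usd": 0.05, "completed": False},  # abandoned run still costs money
    {"run_id": "r4", "cost_usd": 0.05, "completed": True},
]

def cost_per_run(runs):
    """Average cost across every run, successful or not."""
    return sum(r["cost_usd"] for r in runs) / len(runs)

def cost_per_completed(runs):
    """Total cost divided by completed runs only; None when nothing completed."""
    completed = sum(1 for r in runs if r["completed"])
    if completed == 0:
        return None
    return sum(r["cost_usd"] for r in runs) / completed

print(f"cost per run:       ${cost_per_run(runs):.4f}")
print(f"cost per completion: ${cost_per_completed(runs):.4f}")
```

Dividing total cost by completed runs loads the cost of abandoned or failed runs onto the outcomes that delivered value, which is usually the number pricing and margin discussions need.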

Case Study: A B2B fintech company working with Mavvrik connected cloud, SaaS, and AI usage to product economics and identified cost per payment at $0.08. That cost-to-serve model supported pricing, margin management, and infrastructure accountability, contributing to $1.75 million in savings over 20 months.

Step 5: Build Controls Where Spend Happens

AI cost controls should sit close to the request path. A month-end budget review is too late for a runaway workflow.

Start with five practical controls:

Control	Trigger	Action
Cost-per-workflow spike	Cost per completed workflow rises above its rolling baseline	Review recent runs, compare model choice, prompt size, retrieval count, and retry behavior
Retry loop detection	Same run_id appears repeatedly within a short window	Stop the workflow, alert the owner, and mark the run for review
Token budget pacing	Usage reaches an agreed share of the period budget early	Reroute non-critical workloads, reduce context length, or require approval for high-cost paths
Per-tenant anomaly	One tenant’s cost-to-serve rises outside expected range	Check for usage changes, abuse, misconfiguration, or feature adoption shifts
Model routing exception	A workflow uses a higher-cost model outside policy	Route to approved model tier or require justification

The goal is to place financial controls at the point where decisions are made: model selection, retrieval depth, retry policy, context size, and routing.
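Two of the controls in the table, the cost-per-workflow spike and token budget pacing, reduce to small checks that can run close to the request path. The window size, the 1.5x threshold, and the 20-point pacing lead below are assumed values that each team would tune; the function names are illustrative.

```python
from statistics import mean

def spike_alert(history, current, window=7, threshold=1.5):
    """True when current cost per workflow exceeds threshold x the rolling baseline."""
    if not history:
        return False  # no baseline yet, so nothing to compare against
    baseline = mean(history[-window:])
    return current > threshold * baseline

def pacing_alert(spent, budget, period_elapsed, max_lead=0.20):
    """True when the share of budget spent runs ahead of the share of the period elapsed."""
    return (spent / budget) - period_elapsed > max_lead

# Hypothetical daily cost per completed workflow, in USD.
daily_cost = [0.10, 0.11, 0.09, 0.10, 0.10, 0.11, 0.09]

print(spike_alert(daily_cost, current=0.18))
print(spike_alert(daily_cost, current=0.12))
print(pacing_alert(spent=6000, budget=10000, period_elapsed=0.30))
print(pacing_alert(spent=3500, budget=10000, period_elapsed=0.30))
```

A rolling baseline is used instead of a fixed ceiling so that the alert follows gradual, expected changes in cost and fires only on sharp departures from recent behavior.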

Common Blockers and How to Handle Them

Blocker 1: One API Key Serves Too Many Use Cases

A single shared API key makes setup fast and attribution nearly impossible. When multiple product surfaces route through the same key, you know what was spent but not why, by which feature, or for which customer.

The fix is separation at the right boundary. Moving to project-, key-, workspace-, profile-, or gateway-level separation for major product surfaces gives you a cleaner starting point. From there, carry feature, tenant, and workflow metadata in internal logs. Provider-level separation alone is not enough if several features share the same boundary.

Blocker 2: Shared GPU Costs Do Not Map to Inference

Cloud billing surfaces accelerator spend. On-prem infrastructure requires modeling depreciation, utilization, power, and support costs. Neither translates automatically into cost-per-inference, which is the unit that matters for pricing and margin decisions.

The fix is building a rate card from infrastructure cost. Calculate GPU cost per hour or per second, then allocate by workload runtime, GPU utilization, namespace, queue, model, or serving endpoint. Once infrastructure has a rate, you can attach it to what it produced.
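For on-prem hardware, the rate card calculation can be sketched as follows. Straight-line depreciation is an assumption, as are all of the figures: purchase price, useful life, power and support cost, GPU count, utilization, and the workload numbers.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def gpu_cost_per_second(purchase_price, useful_life_years,
                        annual_power_and_support, gpu_count, utilization):
    """Fully loaded cost of one utilized GPU second for an on-prem pool."""
    annual_depreciation = purchase_price / useful_life_years  # straight-line, by assumption
    annual_cost = annual_depreciation + annual_power_and_support
    utilized_seconds = gpu_count * SECONDS_PER_YEAR * utilization
    return annual_cost / utilized_seconds

# Hypothetical pool: 8 GPUs bought for $240,000, 4-year life, 50% utilized.
rate = gpu_cost_per_second(
    purchase_price=240_000, useful_life_years=4,
    annual_power_and_support=24_000, gpu_count=8, utilization=0.5,
)

# Attach the rate to what one serving endpoint produced (hypothetical workload).
gpu_seconds_used = 1_800
inferences_served = 12_000
cost_per_inference = rate * gpu_seconds_used / inferences_served

print(f"rate per GPU hour:  ${rate * 3600:.4f}")
print(f"cost per inference: ${cost_per_inference:.6f}")
```

Dividing by utilized seconds rather than available seconds spreads idle capacity across the workloads that actually ran. That is one defensible choice; the alternative is to price at full capacity and report idle cost as a separate overhead line.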

Blocker 3: Retry Behavior Hides Inside Normal Usage

Retries generate tokens and requests just like successful calls do. Without run-level identifiers, a broken workflow and a healthy one look identical in the data. A workflow that succeeds after repeated failures can be misread as adoption growth.

The fix is making run_id, step_name, request_status, and retry_count mandatory fields. A successful first attempt and a workflow that succeeds after repeated attempts should not look the same in the ledger.
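With those fields in place, separating healthy runs from retried ones is a short aggregation. The ledger rows, costs, and the `wasted_cost` label below are hypothetical and only illustrate the idea.

```python
# Hypothetical ledger rows: run "a" succeeds first time, run "b" succeeds on the third attempt.
ledger = [
    {"run_id": "a", "step_name": "generate_response", "request_status": "success", "retry_count": 0, "cost_usd": 0.010},
    {"run_id": "b", "step_name": "generate_response", "request_status": "timeout", "retry_count": 0, "cost_usd": 0.010},
    {"run_id": "b", "step_name": "generate_response", "request_status": "failure", "retry_count": 1, "cost_usd": 0.010},
    {"run_id": "b", "step_name": "generate_response", "request_status": "success", "retry_count": 2, "cost_usd": 0.010},
]

def summarize_runs(rows):
    """Per run: attempts, total cost, cost of attempts that did not succeed."""
    runs = {}
    for row in rows:
        run = runs.setdefault(
            row["run_id"],
            {"attempts": 0, "total_cost": 0.0, "wasted_cost": 0.0, "succeeded": False},
        )
        run["attempts"] += 1
        run["total_cost"] += row["cost_usd"]
        if row["request_status"] == "success":
            run["succeeded"] = True
        else:
            run["wasted_cost"] += row["cost_usd"]
    return runs

summary = summarize_runs(ledger)
for run_id, stats in summary.items():
    print(run_id, stats)
```

Both runs end in success, but the second one cost three times as much, and two thirds of that cost bought nothing. Counted only as requests and tokens, the two runs would have looked like four healthy calls.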

Build vs Buy AI Cost Tracking Systems

Building an AI cost tracking system internally can make sense when the environment is narrow, the number of providers is limited, and attribution requirements are simple. The tradeoff changes once the system has to cover LLM APIs, GPUs, Kubernetes, SaaS tools, cloud billing, on-prem infrastructure, and agent workflows.

The question is whether the internal system can stay accurate as providers change pricing, schemas drift, APIs change, and new AI services enter the stack. For teams weighing build versus buy for AI cost management, the decision often comes down to time to value, integration coverage, and ongoing maintenance. Internal builds can work, but only if they deliver financial control while AI usage is scaling, not after it has scaled.

How Mavvrik Approaches AI Cost Tracking

Mavvrik treats AI cost tracking as full-stack financial attribution across cloud, on-prem, GPU, SaaS, LLM, and agentic workloads. The platform normalizes cost signals across providers and environments, carries business context such as team, service, tenant, user, workflow, and custom tags, and supports cost-to-serve views by product, customer, feature, or workflow.

For agentic systems, Mavvrik’s Agentic Cost Intelligence SDK captures LLM calls, tool actions, timing, token usage, and workflow steps through OpenTelemetry-native events. That makes it possible to tie cost to the step that generated it rather than leaving it inside aggregate spend.

Three ways to continue from here:

See how Mavvrik supports this in practice. Mavvrik brings these cost signals together into a single system of record, with workload-level tracking and cost-to-serve visibility across your entire infrastructure. You can explore this through a product walkthrough or connect with the team to see how it applies to your environment.
Get a clear view of your current AI cost structure. Start by understanding where your AI spend is coming from across cloud, GPUs, SaaS tools, and model providers. The goal is to see how costs map to workflows, features, and customers rather than isolated systems.
Evaluate how your costs connect to business outcomes. Look at whether you can measure cost-to-serve for a product, feature, or customer. If that connection is missing, it becomes difficult to support pricing, forecasting, and margin decisions.

FAQs

How do you track AI costs in 2026?

Track AI costs by capturing request-level usage, joining it with provider billing data, and allocating shared infrastructure costs to a business unit of value. A practical system includes provider usage, model usage, workflow metadata, customer or tenant context, infrastructure cost, and allocation rules in one ledger.

Why are provider billing dashboards not enough for AI cost tracking?

Provider dashboards show what a provider billed, but they usually do not explain which workflow, feature, customer, or tenant generated the cost. They also do not cover related spend from retrieval, data platforms, orchestration, cloud infrastructure, on-prem GPUs, and shared services.

What is cost-per-inference?

Cost-per-inference is the total cost required to produce one model-backed response or decision. A useful calculation includes model cost, retrieval cost, orchestration cost, allocated infrastructure cost, and any shared overhead assigned to the same inference unit.

What metadata should you capture on every AI request?

Capture provider, model, input tokens, output tokens, workflow ID, run ID, step name, status, retry count, feature ID, environment, team, customer, tenant, and cost estimate. Add infrastructure context when the request uses GPUs, Kubernetes, on-prem systems, or shared serving layers.

When should a company build versus buy AI cost tracking?

Building can work when the environment is narrow, provider count is low, and attribution requirements are simple. Buying becomes more practical when the organization needs coverage across cloud, on-prem, GPUs, SaaS, LLM APIs, Kubernetes, agent workflows, multi-tenant reporting, chargeback, and cost-to-serve.

Alexa Abbruscato

Certified FinOps Practitioner & Content Strategist

Alexa is a certified FinOps Practitioner and FOCUS Analyst who translates complex concepts into language that resonates across engineering, finance, and procurement.


How to Track AI Costs in 2026: From Usage Logs to Cost-to-Serve
