Token Usage & Cost Tracking

AxonFlow tracks token and cost information so teams can understand how governed AI workloads behave in practice, not just in notebooks and demos.

For most engineering leaders, token visibility stops being a nice-to-have as soon as one application turns into a shared internal platform. Cost, usage, and workflow shape become operating concerns, not just experimentation data.

Where Token Tracking Shows Up

In the current community stack, the most visible token and cost metrics are exposed for gateway-mode audit flows and displayed in the bundled Grafana dashboard.

Important metrics:

| Metric | Description |
| --- | --- |
| axonflow_gateway_llm_tokens_total | Total tokens reported through gateway audit flows |
| axonflow_gateway_llm_cost_usd_total | Estimated USD cost reported through gateway audit flows |
| axonflow_orchestrator_llm_calls_total | Orchestrator-side provider call volume |

Prometheus Access

Use the native Prometheus endpoint:

curl -s http://localhost:8080/prometheus | grep axonflow_gateway_llm
curl -s http://localhost:8081/prometheus | grep axonflow_orchestrator_llm_calls_total

Prometheus should scrape /prometheus, not /metrics.
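If you manage Prometheus directly, a scrape config along these lines works. The job names are illustrative; the ports and metrics path come from the endpoints above.

```yaml
# Example scrape config; job names are placeholders, not AxonFlow defaults.
scrape_configs:
  - job_name: axonflow-gateway
    metrics_path: /prometheus      # note: not the default /metrics
    static_configs:
      - targets: ["localhost:8080"]
  - job_name: axonflow-orchestrator
    metrics_path: /prometheus
    static_configs:
      - targets: ["localhost:8081"]
```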

How Token Tracking Works

Token tracking happens at the agent gateway level during audit flows. When an LLM call passes through the gateway, the response from the provider includes token usage data (prompt tokens, completion tokens, total tokens). The agent extracts these values and does two things with them:

  1. Increments Prometheus counters so token and cost data is available for real-time dashboards and alerting.
  2. Persists the values in the audit record so token data is available for historical analysis and cost attribution.
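The two steps above can be sketched roughly as follows. The response shape and field names are assumptions modeled on an OpenAI-style `usage` object, not AxonFlow's actual internals, and the dict stands in for a real Prometheus counter.

```python
# Sketch of the gateway-side accounting described above (hypothetical shapes).
counters = {"tokens_total": 0}  # stand-in for a Prometheus counter

def record_usage(provider_response: dict, audit_record: dict) -> dict:
    """Extract token usage from a provider response, then (1) increment
    runtime counters and (2) persist the values on the audit record."""
    usage = provider_response.get("usage", {})
    prompt = usage.get("prompt_tokens", 0)
    completion = usage.get("completion_tokens", 0)
    total = usage.get("total_tokens", prompt + completion)

    # 1. Increment the counter so dashboards/alerts see it in real time.
    counters["tokens_total"] += total

    # 2. Persist the values for historical analysis and cost attribution.
    audit_record.update(
        prompt_tokens=prompt,
        completion_tokens=completion,
        total_tokens=total,
    )
    return audit_record

record = record_usage(
    {"usage": {"prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150}},
    {"request_id": "req-1"},
)
```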

Cost estimation uses per-model pricing tables configured in the platform. The estimated_cost_usd field is computed from token counts and the model's configured cost-per-1k-tokens rates. Enterprise users can set custom per-provider cost rates through the portal's LLM provider configuration, where cost_per_1k_input_tokens and cost_per_1k_output_tokens are configurable per provider.
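The arithmetic behind estimated_cost_usd is simple to sketch. The rate values below are made-up examples for illustration, not real provider pricing or AxonFlow defaults.

```python
# Illustrative cost estimation from token counts and per-1k-token rates.
def estimate_cost_usd(prompt_tokens: int, completion_tokens: int,
                      cost_per_1k_input_tokens: float,
                      cost_per_1k_output_tokens: float) -> float:
    # Input and output tokens are priced separately, per 1,000 tokens.
    return ((prompt_tokens / 1000.0) * cost_per_1k_input_tokens
            + (completion_tokens / 1000.0) * cost_per_1k_output_tokens)

# 1200 prompt tokens at $0.01/1k plus 300 completion tokens at $0.03/1k:
# 0.012 + 0.009 = 0.021 (example rates only)
cost = estimate_cost_usd(1200, 300,
                         cost_per_1k_input_tokens=0.01,
                         cost_per_1k_output_tokens=0.03)
```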

Because the cost fields are estimates, teams should treat them as operational guidance and budget signals rather than as a replacement for the provider invoice. The most useful pattern is to compare AxonFlow runtime estimates with monthly provider billing and then tune routing, budgets, or approval thresholds from there.

Useful Queries

Token growth over time

sum(increase(axonflow_gateway_llm_tokens_total[1h]))

Cost over time

sum(increase(axonflow_gateway_llm_cost_usd_total[24h]))

Cost by provider and model

sum by (provider, model) (increase(axonflow_gateway_llm_cost_usd_total[24h]))

Request rate by provider

sum by (provider) (rate(axonflow_orchestrator_llm_calls_total[5m]))

Average tokens per request

sum(rate(axonflow_gateway_llm_tokens_total[5m])) / sum(rate(axonflow_orchestrator_llm_calls_total[5m]))

Audit Storage

Gateway audit flows also persist token fields such as:

  • prompt_tokens
  • completion_tokens
  • total_tokens
  • estimated_cost_usd
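Once those fields land in audit storage, cost attribution is a straightforward aggregation. The record shape below is an assumption for illustration; only the field names listed above come from the docs.

```python
from collections import defaultdict

# Hypothetical audit records carrying the persisted token/cost fields.
audit_records = [
    {"provider": "openai", "model": "gpt-4o", "total_tokens": 900,
     "estimated_cost_usd": 0.020},
    {"provider": "openai", "model": "gpt-4o", "total_tokens": 300,
     "estimated_cost_usd": 0.010},
    {"provider": "anthropic", "model": "claude", "total_tokens": 500,
     "estimated_cost_usd": 0.015},
]

def cost_by_provider_model(records):
    """Sum estimated cost per (provider, model) pair for after-the-fact attribution."""
    totals = defaultdict(float)
    for r in records:
        totals[(r["provider"], r["model"])] += r["estimated_cost_usd"]
    return dict(totals)

breakdown = cost_by_provider_model(audit_records)
```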

That makes AxonFlow useful not just for runtime control, but for explaining usage and spend after the fact.

What to Use This For

Senior engineering teams usually care about token tracking for three reasons:

  1. Capacity and cost planning
  2. Policy tuning by workload shape
  3. Proving that routed or governed AI traffic is visible enough for production operations

Per-Step Cost Tracking in Workflows

When using workflow orchestration (MAP or WCP), token and cost tracking applies at each step. Every step that makes an LLM call records its own prompt_tokens, completion_tokens, and estimated_cost_usd in the step audit record. This means you can analyze cost not just at the workflow level, but at the individual step level, which is important for understanding where token spend concentrates in multi-step workflows.
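A quick way to see where spend concentrates, assuming each step audit record carries the fields described above (the record shape is illustrative):

```python
# Hypothetical per-step audit records from one workflow run.
steps = [
    {"step": "classify", "estimated_cost_usd": 0.002},
    {"step": "retrieve", "estimated_cost_usd": 0.001},
    {"step": "generate", "estimated_cost_usd": 0.027},
]

# Share of total workflow cost per step, and the most expensive step.
total = sum(s["estimated_cost_usd"] for s in steps)
shares = {s["step"]: s["estimated_cost_usd"] / total for s in steps}
hottest_step = max(shares, key=shares.get)
```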

The orchestrator also exposes axonflow_orchestrator_llm_calls_total as a counter, which lets you correlate provider call volume with per-provider cost data from the gateway metrics.

Community vs Higher Tiers

Community is enough to validate instrumentation and workflow economics. As usage scales across teams and business units, Evaluation and Enterprise become more compelling because the governance, retention, and broader operational story matter more. Enterprise adds per-provider cost rate configuration through the portal, tenant-scoped cost breakdowns, and budget enforcement capabilities.

Next Steps