Token Usage & Cost Tracking
AxonFlow tracks token and cost information so teams can understand how governed AI workloads behave in practice, not just in notebooks and demos.
For most engineering leaders, token visibility stops being a nice-to-have as soon as one application turns into a shared internal platform. Cost, usage, and workflow shape become operating concerns, not just experimentation data.
Where Token Tracking Shows Up
In the current community stack, the most visible token and cost metrics are exposed for gateway-mode audit flows and displayed in the bundled Grafana dashboard.
Important metrics:
| Metric | Description |
|---|---|
| `axonflow_gateway_llm_tokens_total` | Total tokens reported through gateway audit flows |
| `axonflow_gateway_llm_cost_usd_total` | Estimated USD cost reported through gateway audit flows |
| `axonflow_orchestrator_llm_calls_total` | Orchestrator-side provider call volume |
Prometheus Access
Use the native Prometheus endpoint:
curl -s http://localhost:8080/prometheus | grep axonflow_gateway_llm
curl -s http://localhost:8081/prometheus | grep axonflow_orchestrator_llm_calls_total
Prometheus should scrape /prometheus, not /metrics.
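A minimal scrape configuration might look like the following. The job names are illustrative, and the targets assume the default agent (8080) and orchestrator (8081) ports used in the curl examples above:

```yaml
scrape_configs:
  - job_name: axonflow-gateway
    metrics_path: /prometheus      # AxonFlow exposes /prometheus, not /metrics
    static_configs:
      - targets: ['localhost:8080']
  - job_name: axonflow-orchestrator
    metrics_path: /prometheus
    static_configs:
      - targets: ['localhost:8081']
```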
How Token Tracking Works
Token tracking happens at the agent gateway level during audit flows. When an LLM call passes through the gateway, the response from the provider includes token usage data (prompt tokens, completion tokens, total tokens). The agent extracts these values and does two things with them:
- Increments Prometheus counters so token and cost data is available for real-time dashboards and alerting.
- Persists the values in the audit record so token data is available for historical analysis and cost attribution.
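The two steps above can be sketched in a few lines of Python. This is an illustrative model of the behavior, not AxonFlow's actual internals: the in-memory counters stand in for the Prometheus counters, and the list stands in for the audit store.

```python
from collections import defaultdict

# Stand-ins for the Prometheus counters and the audit table.
token_counter = defaultdict(float)   # keyed by (provider, model)
cost_counter = defaultdict(float)
audit_records = []

def record_usage(provider, model, usage, cost_usd):
    """Do the two things the gateway does with provider-reported usage."""
    key = (provider, model)
    # 1. Increment counters for real-time dashboards and alerting.
    token_counter[key] += usage["total_tokens"]
    cost_counter[key] += cost_usd
    # 2. Persist the values in the audit record for historical analysis
    #    and cost attribution.
    audit_records.append({
        "provider": provider,
        "model": model,
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
        "estimated_cost_usd": cost_usd,
    })

record_usage(
    "openai", "gpt-4o",
    {"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200},
    cost_usd=0.0016,
)
```

The split matters operationally: counters are cheap and aggregate-only, while the audit record keeps the per-call detail needed for attribution.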
Cost estimation uses per-model pricing tables configured in the platform. The estimated_cost_usd field is computed from token counts and the model's configured cost-per-1k-tokens rates. Enterprise users can set custom per-provider cost rates through the portal's LLM provider configuration, where cost_per_1k_input_tokens and cost_per_1k_output_tokens are configurable per provider.
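Under that model, the estimate reduces to a simple rate calculation. The rates in the example are made-up numbers for illustration, not AxonFlow defaults:

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      cost_per_1k_input_tokens, cost_per_1k_output_tokens):
    # estimated_cost_usd = input tokens and output tokens, each priced
    # at the model's configured per-1k rate.
    return ((prompt_tokens / 1000.0) * cost_per_1k_input_tokens
            + (completion_tokens / 1000.0) * cost_per_1k_output_tokens)

# 1,500 prompt tokens and 500 completion tokens at $0.01 / 1k input
# and $0.03 / 1k output (illustrative rates):
cost = estimate_cost_usd(1500, 500, 0.01, 0.03)
# 1.5 * 0.01 + 0.5 * 0.03 = 0.03
```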
Because the cost fields are estimates, teams should treat them as operational guidance and budget signals rather than as a replacement for the provider invoice. The most useful pattern is to compare AxonFlow runtime estimates with monthly provider billing and then tune routing, budgets, or approval thresholds from there.
Useful Queries
Token growth over time
sum(increase(axonflow_gateway_llm_tokens_total[1h]))
Cost over time
sum(increase(axonflow_gateway_llm_cost_usd_total[24h]))
Cost by provider and model
sum by (provider, model) (increase(axonflow_gateway_llm_cost_usd_total[24h]))
Request rate by provider
sum by (provider) (rate(axonflow_orchestrator_llm_calls_total[5m]))
Average tokens per request
sum(rate(axonflow_gateway_llm_tokens_total[5m])) / sum(rate(axonflow_orchestrator_llm_calls_total[5m]))
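If any of these queries back a standing dashboard panel or alert, it can be worth precomputing them as Prometheus recording rules. The rule names below are illustrative; only the expressions come from the queries above:

```yaml
groups:
  - name: axonflow-llm-cost
    rules:
      - record: axonflow:llm_tokens_per_request:5m
        expr: >
          sum(rate(axonflow_gateway_llm_tokens_total[5m]))
          / sum(rate(axonflow_orchestrator_llm_calls_total[5m]))
      - record: axonflow:llm_cost_usd:24h
        expr: sum(increase(axonflow_gateway_llm_cost_usd_total[24h]))
```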
Audit Storage
Gateway audit flows also persist token fields such as:
- prompt_tokens
- completion_tokens
- total_tokens
- estimated_cost_usd
That makes AxonFlow useful not just for runtime control, but for explaining usage and spend after the fact.
What to Use This For
Senior engineering teams usually care about token tracking for three reasons:
- capacity and cost planning
- policy tuning by workload shape
- proving that routed or governed AI traffic is visible enough for production operations
Per-Step Cost Tracking in Workflows
When using workflow orchestration (MAP or WCP), token and cost tracking applies at each step. Every step that makes an LLM call records its own prompt_tokens, completion_tokens, and estimated_cost_usd in the step audit record. This means you can analyze cost not just at the workflow level, but at the individual step level, which is important for understanding where token spend concentrates in multi-step workflows.
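Finding where spend concentrates is then a straightforward aggregation over per-step audit records. The record shape here is a simplified assumption based on the fields listed above, with a hypothetical `step` key identifying the workflow step:

```python
from collections import defaultdict

def cost_by_step(step_audit_records):
    """Sum estimated cost per workflow step, most expensive first."""
    totals = defaultdict(float)
    for rec in step_audit_records:
        totals[rec["step"]] += rec["estimated_cost_usd"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative per-step records from two runs of a two-step workflow.
steps = [
    {"step": "summarize", "estimated_cost_usd": 0.012},
    {"step": "classify",  "estimated_cost_usd": 0.001},
    {"step": "summarize", "estimated_cost_usd": 0.010},
]
ranked = cost_by_step(steps)  # "summarize" ranks first
```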
The orchestrator also exposes axonflow_orchestrator_llm_calls_total as a running counter of provider calls, which lets you correlate call volume with the per-provider cost data from the gateway metrics.
Community vs Higher Tiers
Community is enough to validate instrumentation and workflow economics. As usage scales across teams and business units, Evaluation and Enterprise become more compelling because the governance, retention, and broader operational story matter more. Enterprise adds per-provider cost rate configuration through the portal, tenant-scoped cost breakdowns, and budget enforcement capabilities.
