Token Usage & Cost Tracking
AxonFlow automatically tracks token usage for all LLM requests processed through the platform. This guide covers how to access and analyze token metrics for cost management and optimization.
Overview
Every LLM request routed through AxonFlow captures:
| Metric | Description |
|---|---|
| `prompt_tokens` | Number of tokens in the request (input) |
| `completion_tokens` | Number of tokens in the response (output) |
| `total_tokens` | Sum of prompt and completion tokens |
This data is stored in the audit trail and exposed via Prometheus metrics for monitoring and alerting.
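As a sketch of how the stored fields relate, the snippet below derives an estimated cost from the two token counts. The per-1K-token prices here are illustrative placeholders, not real provider rates, and the pricing table is a hypothetical stand-in for AxonFlow's internal provider pricing.

```python
# Illustrative cost estimation from token counts.
# Prices are placeholders per 1K tokens, NOT real provider rates.
PRICING = {
    # model: (prompt_price_per_1k, completion_price_per_1k)
    "gpt-4": (0.03, 0.06),
    "claude-3-sonnet": (0.003, 0.015),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    in_price, out_price = PRICING[model]
    return (prompt_tokens / 1000) * in_price + (completion_tokens / 1000) * out_price
```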
Prometheus Metrics
AxonFlow exposes token and cost metrics at the /metrics endpoint on the Orchestrator (port 8081).
Available Metrics
| Metric | Type | Description |
|---|---|---|
| `axonflow_orchestrator_provider_total_tokens` | Counter | Total tokens used across all requests |
| `axonflow_orchestrator_provider_total_cost` | Counter | Estimated cost based on provider pricing |
| `axonflow_orchestrator_provider_prompt_tokens` | Counter | Total prompt (input) tokens |
| `axonflow_orchestrator_provider_completion_tokens` | Counter | Total completion (output) tokens |
Labels
Metrics include labels for filtering:
- `provider` - LLM provider (openai, anthropic, bedrock, ollama)
- `model` - Specific model (gpt-4, claude-3-sonnet, etc.)
- `tenant_id` - Tenant identifier (multi-tenant deployments)
Example Queries
Total tokens in the last 24 hours:
```promql
sum(increase(axonflow_orchestrator_provider_total_tokens[24h]))
```
Tokens by provider:
```promql
sum by (provider) (increase(axonflow_orchestrator_provider_total_tokens[24h]))
```
Cost by model (last 7 days):
```promql
sum by (model) (increase(axonflow_orchestrator_provider_total_cost[7d]))
```
Token rate (tokens per minute):
```promql
rate(axonflow_orchestrator_provider_total_tokens[5m]) * 60
```
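The queries above follow a common pattern (a windowed `increase`, optionally grouped by a label), so they can be generated programmatically. This is a small hedged helper, not part of AxonFlow itself; the metric name is the one documented on this page.

```python
# Helper to build the PromQL queries shown above.
# The metric name follows this page; adjust if your deployment differs.
METRIC_TOKENS = "axonflow_orchestrator_provider_total_tokens"

def tokens_query(label=None, window="24h"):
    """Return a PromQL expression for token usage over `window`,
    optionally grouped by a label such as 'provider' or 'model'."""
    inner = f"increase({METRIC_TOKENS}[{window}])"
    if label:
        return f"sum by ({label}) ({inner})"
    return f"sum({inner})"
```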
Database Storage
Token usage is persisted in the gateway_mode_audit table for long-term analysis and compliance.
Schema
```sql
CREATE TABLE gateway_mode_audit (
    id UUID PRIMARY KEY,
    tenant_id VARCHAR(255) NOT NULL,
    user_id VARCHAR(255),
    provider VARCHAR(100),
    model VARCHAR(100),
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    total_tokens INTEGER,
    estimated_cost DECIMAL(10, 6),
    created_at TIMESTAMP DEFAULT NOW()
);
```
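To experiment with this schema locally, a minimal sketch using SQLite (which lacks native UUID and DECIMAL types, so `id` becomes TEXT and `estimated_cost` becomes REAL) might look like this. The inserted values are made up for illustration.

```python
import sqlite3
import uuid

# Illustrative stand-in for the audit table; the production schema
# above uses PostgreSQL types (UUID, DECIMAL), adapted here for SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gateway_mode_audit (
        id TEXT PRIMARY KEY,
        tenant_id TEXT NOT NULL,
        provider TEXT,
        model TEXT,
        prompt_tokens INTEGER,
        completion_tokens INTEGER,
        total_tokens INTEGER,
        estimated_cost REAL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO gateway_mode_audit "
    "(id, tenant_id, provider, model, prompt_tokens, completion_tokens, "
    " total_tokens, estimated_cost) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (str(uuid.uuid4()), "tenant-a", "openai", "gpt-4", 120, 80, 200, 0.0084),
)
row = conn.execute(
    "SELECT total_tokens, estimated_cost FROM gateway_mode_audit"
).fetchone()
```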
Example Queries
Daily token usage:
```sql
SELECT
    DATE(created_at) AS date,
    SUM(total_tokens) AS tokens,
    SUM(estimated_cost) AS cost
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date;
```
Usage by tenant:
```sql
SELECT
    tenant_id,
    SUM(total_tokens) AS tokens,
    SUM(estimated_cost) AS cost
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY tenant_id
ORDER BY cost DESC;
```
Most expensive models:
```sql
SELECT
    provider,
    model,
    SUM(total_tokens) AS tokens,
    SUM(estimated_cost) AS cost,
    AVG(total_tokens) AS avg_tokens_per_request
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY provider, model
ORDER BY cost DESC;
```
Grafana Dashboards
Community Edition
You can create your own Grafana dashboard using the Prometheus metrics above. Here's a sample panel configuration:
Token Usage Over Time:
```json
{
  "title": "Token Usage (24h)",
  "type": "timeseries",
  "targets": [
    {
      "expr": "sum(rate(axonflow_orchestrator_provider_total_tokens[5m])) * 60",
      "legendFormat": "Tokens/min"
    }
  ]
}
```
Cost by Provider (Pie Chart):
```json
{
  "title": "Cost by Provider",
  "type": "piechart",
  "targets": [
    {
      "expr": "sum by (provider) (increase(axonflow_orchestrator_provider_total_cost[7d]))",
      "legendFormat": "{{provider}}"
    }
  ]
}
```
Setting Up Prometheus Scraping
Add the Orchestrator to your Prometheus configuration:
```yaml
scrape_configs:
  - job_name: 'axonflow-orchestrator'
    static_configs:
      - targets: ['YOUR_ORCHESTRATOR_HOST:8081']
    metrics_path: /metrics
    scrape_interval: 15s
```
Alerting
Cost Threshold Alert
Alert when daily spend exceeds a threshold:
```yaml
groups:
  - name: axonflow-cost
    rules:
      - alert: HighLLMSpend
        expr: sum(increase(axonflow_orchestrator_provider_total_cost[24h])) > 100
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "LLM spend exceeds $100 in 24 hours"
          description: "Current 24h spend: ${{ $value | printf \"%.2f\" }}"
```
Token Spike Alert
Alert on unusual token usage:
```yaml
- alert: TokenUsageSpike
  expr: sum(rate(axonflow_orchestrator_provider_total_tokens[5m])) > 1000
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Token usage spike detected"
    description: "Token rate: {{ $value | printf \"%.0f\" }} tokens/second"
```
Best Practices
1. Set Up Cost Budgets
Establish monthly token/cost budgets and configure alerts at 50%, 75%, and 90% thresholds.
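The threshold scheme above can be sketched as a small check that maps month-to-date spend onto the suggested alert levels. The function name and values are illustrative, not part of AxonFlow.

```python
# Sketch: which of the suggested budget thresholds (50%, 75%, 90%)
# has the current spend crossed? Budget and spend are illustrative.
THRESHOLDS = (0.50, 0.75, 0.90)

def crossed_thresholds(spend, budget):
    """Return the budget fractions that the current spend has reached."""
    used = spend / budget
    return [t for t in THRESHOLDS if used >= t]
```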
2. Monitor by Use Case
Add custom labels to requests to track cost by feature or use case:
- Customer support chatbot
- Document analysis
- Code generation
3. Optimize Prompt Length
Monitor `prompt_tokens` separately: prompt engineering can significantly reduce costs without impacting quality.
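A quick way to sanity-check prompt sizes before sending them is a character-based heuristic; roughly 4 characters per token is a common rule of thumb for English text. This is only an approximation: exact counts require the provider's own tokenizer (for example, tiktoken for OpenAI models).

```python
# Rough prompt-size heuristic: ~4 characters per token for English text.
# Approximation only; exact counts need the provider's tokenizer.
def approx_tokens(text):
    return max(1, len(text) // 4)
```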
4. Review Model Selection
Regularly review cost-per-request by model. Sometimes a cheaper model (e.g., GPT-3.5 vs GPT-4) provides sufficient quality for specific use cases.
5. Enable Caching
For repeated similar queries, enable response caching to reduce redundant LLM calls.
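A minimal sketch of such a cache, keyed by a hash of the model and prompt, is shown below. The function names are hypothetical; a production cache would add TTLs, size limits, and possibly semantic similarity matching, while this only illustrates skipping a repeated identical call.

```python
import hashlib

# Minimal in-memory response cache keyed by a hash of (model, prompt).
# Hypothetical sketch; production caches would add TTLs and eviction.
_cache = {}

def cached_complete(model, prompt, call_llm):
    """Return a cached response, invoking call_llm only on a cache miss."""
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(model, prompt)
    return _cache[key]
```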
Next Steps
- Monitoring Overview - Full monitoring setup guide
- LLM Provider Configuration - Configure your LLM providers
- Gateway Mode - How token tracking works in Gateway Mode
Enterprise Edition
Enterprise deployments include a pre-configured Executive Summary Dashboard with:
- LLM Cost Tracking (7d)
- Total Tokens Used
- Cost per 1K Tokens
- Average Cost per Request
- Cost by LLM Provider breakdown
It also includes advanced features such as tenant-level cost attribution, budget alerts, and exports for finance teams. Contact sales for details.