Token Usage & Cost Tracking

AxonFlow automatically tracks token usage for all LLM requests processed through the platform. This guide covers how to access and analyze token metrics for cost management and optimization.

Overview

For every LLM request routed through AxonFlow, the platform captures:

Metric              Description
prompt_tokens       Number of tokens in the request (input)
completion_tokens   Number of tokens in the response (output)
total_tokens        Sum of prompt + completion tokens

This data is stored in the audit trail and exposed via Prometheus metrics for monitoring and alerting.


Prometheus Metrics

AxonFlow exposes token and cost metrics at the /metrics endpoint on the Orchestrator (port 8081).

Available Metrics

Metric                                             Type      Description
axonflow_orchestrator_provider_total_tokens        Counter   Total tokens used across all requests
axonflow_orchestrator_provider_total_cost          Counter   Estimated cost based on provider pricing
axonflow_orchestrator_provider_prompt_tokens       Counter   Total prompt (input) tokens
axonflow_orchestrator_provider_completion_tokens   Counter   Total completion (output) tokens

Labels

Metrics include labels for filtering:

  • provider - LLM provider (openai, anthropic, bedrock, ollama)
  • model - Specific model (gpt-4, claude-3-sonnet, etc.)
  • tenant_id - Tenant identifier (multi-tenant deployments)
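
For reference, the series exposed at /metrics look roughly like the following; the label values and sample numbers here are illustrative, not real output:

axonflow_orchestrator_provider_total_tokens{provider="openai",model="gpt-4",tenant_id="acme"} 184230
axonflow_orchestrator_provider_total_cost{provider="openai",model="gpt-4",tenant_id="acme"} 4.87
axonflow_orchestrator_provider_prompt_tokens{provider="anthropic",model="claude-3-sonnet",tenant_id="acme"} 96110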

Example Queries

Total tokens in the last 24 hours:

sum(increase(axonflow_orchestrator_provider_total_tokens[24h]))

Tokens by provider:

sum by (provider) (increase(axonflow_orchestrator_provider_total_tokens[24h]))

Cost by model (last 7 days):

sum by (model) (increase(axonflow_orchestrator_provider_total_cost[7d]))

Token rate (tokens per minute):

rate(axonflow_orchestrator_provider_total_tokens[5m]) * 60
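
Beyond Grafana, the same queries can be run against the Prometheus HTTP API. Below is a minimal Python sketch, assuming a Prometheus server reachable at prometheus:9090 (adjust the host and port for your setup):

import json
import urllib.parse
import urllib.request

# Assumed Prometheus base URL; point this at your own Prometheus server.
PROM_URL = "http://prometheus:9090"

# Same PromQL as above: total tokens over the last 24 hours.
query = "sum(increase(axonflow_orchestrator_provider_total_tokens[24h]))"
url = PROM_URL + "/api/v1/query?" + urllib.parse.urlencode({"query": query})

with urllib.request.urlopen(url) as resp:
    payload = json.load(resp)

# Instant queries return a vector of samples; each value is [timestamp, "value"].
for sample in payload["data"]["result"]:
    print(f"Tokens (24h): {float(sample['value'][1]):,.0f}")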

Database Storage

Token usage is persisted in the gateway_mode_audit table for long-term analysis and compliance.

Schema

CREATE TABLE gateway_mode_audit (
    id UUID PRIMARY KEY,
    tenant_id VARCHAR(255) NOT NULL,
    user_id VARCHAR(255),
    provider VARCHAR(100),
    model VARCHAR(100),
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    total_tokens INTEGER,
    estimated_cost DECIMAL(10, 6),
    created_at TIMESTAMP DEFAULT NOW()
);
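
The time- and tenant-scoped queries below benefit from an index on those columns. If your deployment does not already create one, a composite index along these lines is a reasonable addition (the index name is illustrative):

CREATE INDEX IF NOT EXISTS idx_gateway_mode_audit_tenant_created
    ON gateway_mode_audit (tenant_id, created_at);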

Example Queries

Daily token usage:

SELECT
    DATE(created_at) as date,
    SUM(total_tokens) as tokens,
    SUM(estimated_cost) as cost
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date;

Usage by tenant:

SELECT
    tenant_id,
    SUM(total_tokens) as tokens,
    SUM(estimated_cost) as cost
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY tenant_id
ORDER BY cost DESC;

Most expensive models:

SELECT
    provider,
    model,
    SUM(total_tokens) as tokens,
    SUM(estimated_cost) as cost,
    AVG(total_tokens) as avg_tokens_per_request
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY provider, model
ORDER BY cost DESC;
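
For ad-hoc reporting or finance exports, the same queries can be run from a short script. Here is a minimal sketch in Python using psycopg2; the driver choice, connection string, and output file are assumptions to adapt to your environment:

import csv
import psycopg2  # assumes psycopg2 is installed and the audit database is reachable

QUERY = """
    SELECT tenant_id, SUM(total_tokens) AS tokens, SUM(estimated_cost) AS cost
    FROM gateway_mode_audit
    WHERE created_at > NOW() - INTERVAL '30 days'
    GROUP BY tenant_id
    ORDER BY cost DESC;
"""

# Adjust the DSN for your environment.
with psycopg2.connect("dbname=axonflow user=axonflow host=localhost") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        rows = cur.fetchall()

# Write a simple CSV summary of the last 30 days, one row per tenant.
with open("token_usage_by_tenant.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["tenant_id", "tokens", "cost"])
    writer.writerows(rows)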

Grafana Dashboards

Community Edition

You can create your own Grafana dashboard using the Prometheus metrics above. Here are two sample panel configurations:

Token Usage Over Time:

{
  "title": "Token Usage (24h)",
  "type": "timeseries",
  "targets": [
    {
      "expr": "sum(rate(axonflow_orchestrator_provider_total_tokens[5m])) * 60",
      "legendFormat": "Tokens/min"
    }
  ]
}

Cost by Provider (Pie Chart):

{
  "title": "Cost by Provider",
  "type": "piechart",
  "targets": [
    {
      "expr": "sum by (provider) (increase(axonflow_orchestrator_provider_total_cost[7d]))",
      "legendFormat": "{{provider}}"
    }
  ]
}

Setting Up Prometheus Scraping

Add the Orchestrator to your Prometheus configuration:

scrape_configs:
  - job_name: 'axonflow-orchestrator'
    static_configs:
      - targets: ['YOUR_ORCHESTRATOR_HOST:8081']
    metrics_path: /metrics
    scrape_interval: 15s

Alerting

Cost Threshold Alert

Alert when daily spend exceeds a threshold:

groups:
  - name: axonflow-cost
    rules:
      - alert: HighLLMSpend
        expr: sum(increase(axonflow_orchestrator_provider_total_cost[24h])) > 100
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "LLM spend exceeds $100 in 24 hours"
          description: "Current 24h spend: ${{ $value | printf \"%.2f\" }}"

Token Spike Alert

Alert on unusual token usage:

      - alert: TokenUsageSpike
        expr: sum(rate(axonflow_orchestrator_provider_total_tokens[5m])) > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Token usage spike detected"
          description: "Token rate: {{ $value | printf \"%.0f\" }} tokens/second"

Best Practices

1. Set Up Cost Budgets

Establish monthly token/cost budgets and configure alerts at 50%, 75%, and 90% thresholds.
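
One possible shape for tiered budget alerts, reusing the cost counter from the rules above and assuming an illustrative $500 monthly budget (a 75% rule follows the same pattern); treat the 30d window as an approximation of a calendar month and make sure your Prometheus retention covers it:

  - name: axonflow-budget
    rules:
      - alert: LLMBudget50Percent
        expr: sum(increase(axonflow_orchestrator_provider_total_cost[30d])) > 250
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "50% of the $500 monthly LLM budget consumed"
      - alert: LLMBudget90Percent
        expr: sum(increase(axonflow_orchestrator_provider_total_cost[30d])) > 450
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "90% of the $500 monthly LLM budget consumed"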

2. Monitor by Use Case

Add custom labels to requests to track cost by feature or use case:

  • Customer support chatbot
  • Document analysis
  • Code generation
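
The built-in labels listed earlier do not include a per-feature dimension; assuming a hypothetical use_case label attached to requests, the breakdown would mirror the provider and model queries:

sum by (use_case) (increase(axonflow_orchestrator_provider_total_cost[7d]))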

3. Optimize Prompt Length

Monitor prompt_tokens separately - prompt engineering can significantly reduce costs without impacting quality.
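
Using the counters listed earlier, the share of tokens going to prompts over the last 24 hours can be tracked with:

sum(increase(axonflow_orchestrator_provider_prompt_tokens[24h]))
/
sum(increase(axonflow_orchestrator_provider_total_tokens[24h]))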

4. Review Model Selection

Regularly review cost-per-request by model. Sometimes a cheaper model (e.g., GPT-3.5 vs GPT-4) provides sufficient quality for specific use cases.
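
A simple way to compare, using the audit table and column names from the schema above:

SELECT
    provider,
    model,
    COUNT(*) as requests,
    SUM(estimated_cost) / COUNT(*) as cost_per_request
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY provider, model
ORDER BY cost_per_request DESC;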

5. Enable Caching

For repeated similar queries, enable response caching to reduce redundant LLM calls.


Next Steps


Enterprise Cost Analytics

Enterprise deployments include a pre-configured Executive Summary Dashboard with:

  • LLM Cost Tracking (7d)
  • Total Tokens Used
  • Cost per 1K Tokens
  • Average Cost per Request
  • Cost by LLM Provider breakdown

It also includes advanced features such as tenant-level cost attribution, budget alerts, and exports for finance teams. Contact sales for details.