Token Usage & Cost Tracking

AxonFlow automatically tracks token usage for all LLM requests processed through the platform. This guide covers how to access and analyze token metrics for cost management and optimization.

Overview

For every LLM request routed through AxonFlow, the platform captures:

Metric              Description
prompt_tokens       Number of tokens in the request (input)
completion_tokens   Number of tokens in the response (output)
total_tokens        Sum of prompt + completion tokens

This data is stored in the audit trail and exposed via Prometheus metrics for monitoring and alerting.


Prometheus Metrics

AxonFlow exposes token and cost metrics at the /metrics endpoint on the Orchestrator (port 8081).

Available Metrics

Metric                                             Type      Description
axonflow_orchestrator_provider_total_tokens        Counter   Total tokens used across all requests
axonflow_orchestrator_provider_total_cost          Counter   Estimated cost based on provider pricing
axonflow_orchestrator_provider_prompt_tokens       Counter   Total prompt (input) tokens
axonflow_orchestrator_provider_completion_tokens   Counter   Total completion (output) tokens

Labels

Metrics include labels for filtering:

  • provider - LLM provider (openai, anthropic, bedrock, ollama)
  • model - Specific model (gpt-4, claude-3-sonnet, etc.)
  • tenant_id - Tenant identifier (multi-tenant deployments)
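
For reference, the series exposed at /metrics look roughly like the following; the label values and sample numbers here are illustrative, not real output:

axonflow_orchestrator_provider_total_tokens{provider="openai",model="gpt-4",tenant_id="acme"} 184230
axonflow_orchestrator_provider_total_cost{provider="openai",model="gpt-4",tenant_id="acme"} 4.87
axonflow_orchestrator_provider_prompt_tokens{provider="anthropic",model="claude-3-sonnet",tenant_id="acme"} 96110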

Example Queries

Total tokens in the last 24 hours:

sum(increase(axonflow_orchestrator_provider_total_tokens[24h]))

Tokens by provider:

sum by (provider) (increase(axonflow_orchestrator_provider_total_tokens[24h]))

Cost by model (last 7 days):

sum by (model) (increase(axonflow_orchestrator_provider_total_cost[7d]))

Token rate (tokens per minute):

rate(axonflow_orchestrator_provider_total_tokens[5m]) * 60
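
Beyond Grafana, the same queries can be run against the Prometheus HTTP API. Below is a minimal Python sketch, assuming a Prometheus server reachable at prometheus:9090 (adjust the host and port for your setup):

import json
import urllib.parse
import urllib.request

# Assumed Prometheus base URL; point this at your own Prometheus server.
PROM_URL = "http://prometheus:9090"

# Same PromQL as above: total tokens over the last 24 hours.
query = "sum(increase(axonflow_orchestrator_provider_total_tokens[24h]))"
url = PROM_URL + "/api/v1/query?" + urllib.parse.urlencode({"query": query})

with urllib.request.urlopen(url) as resp:
    payload = json.load(resp)

# Instant queries return a vector of samples; each value is [timestamp, "value"].
for sample in payload["data"]["result"]:
    print(f"Tokens (24h): {float(sample['value'][1]):,.0f}")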

Database Storage

Token usage is persisted in the gateway_mode_audit table for long-term analysis and compliance.

Schema

CREATE TABLE gateway_mode_audit (
    id UUID PRIMARY KEY,
    tenant_id VARCHAR(255) NOT NULL,
    user_id VARCHAR(255),
    provider VARCHAR(100),
    model VARCHAR(100),
    prompt_tokens INTEGER,
    completion_tokens INTEGER,
    total_tokens INTEGER,
    estimated_cost DECIMAL(10, 6),
    created_at TIMESTAMP DEFAULT NOW()
);
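
The time- and tenant-scoped queries below benefit from an index on those columns. If your deployment does not already create one, a composite index along these lines is a reasonable addition (the index name is illustrative):

CREATE INDEX IF NOT EXISTS idx_gateway_mode_audit_tenant_created
    ON gateway_mode_audit (tenant_id, created_at);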

Example Queries

Daily token usage:

SELECT
    DATE(created_at) as date,
    SUM(total_tokens) as tokens,
    SUM(estimated_cost) as cost
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date;

Usage by tenant:

SELECT
    tenant_id,
    SUM(total_tokens) as tokens,
    SUM(estimated_cost) as cost
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY tenant_id
ORDER BY cost DESC;

Most expensive models:

SELECT
    provider,
    model,
    SUM(total_tokens) as tokens,
    SUM(estimated_cost) as cost,
    AVG(total_tokens) as avg_tokens_per_request
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY provider, model
ORDER BY cost DESC;
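
For ad-hoc reporting or finance exports, the same queries can be run from a short script. Here is a minimal sketch in Python using psycopg2; the driver choice, connection string, and output file are assumptions to adapt to your environment:

import csv
import psycopg2  # assumes psycopg2 is installed and the audit database is reachable

QUERY = """
    SELECT tenant_id, SUM(total_tokens) AS tokens, SUM(estimated_cost) AS cost
    FROM gateway_mode_audit
    WHERE created_at > NOW() - INTERVAL '30 days'
    GROUP BY tenant_id
    ORDER BY cost DESC;
"""

# Adjust the DSN for your environment.
with psycopg2.connect("dbname=axonflow user=axonflow host=localhost") as conn:
    with conn.cursor() as cur:
        cur.execute(QUERY)
        rows = cur.fetchall()

# Write a simple CSV summary of the last 30 days, one row per tenant.
with open("token_usage_by_tenant.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["tenant_id", "tokens", "cost"])
    writer.writerows(rows)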

Grafana Dashboards

Community Edition

You can create your own Grafana dashboard using the Prometheus metrics above. Here are two sample panel configurations:

Token Usage Over Time:

{
  "title": "Token Usage (24h)",
  "type": "timeseries",
  "targets": [
    {
      "expr": "sum(rate(axonflow_orchestrator_provider_total_tokens[5m])) * 60",
      "legendFormat": "Tokens/min"
    }
  ]
}

Cost by Provider (Pie Chart):

{
  "title": "Cost by Provider",
  "type": "piechart",
  "targets": [
    {
      "expr": "sum by (provider) (increase(axonflow_orchestrator_provider_total_cost[7d]))",
      "legendFormat": "{{provider}}"
    }
  ]
}

Setting Up Prometheus Scraping

Add the Orchestrator to your Prometheus configuration:

scrape_configs:
  - job_name: 'axonflow-orchestrator'
    static_configs:
      - targets: ['YOUR_ORCHESTRATOR_HOST:8081']
    metrics_path: /metrics
    scrape_interval: 15s

Alerting

Cost Threshold Alert

Alert when daily spend exceeds a threshold:

groups:
  - name: axonflow-cost
    rules:
      - alert: HighLLMSpend
        expr: sum(increase(axonflow_orchestrator_provider_total_cost[24h])) > 100
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "LLM spend exceeds $100 in 24 hours"
          description: "Current 24h spend: ${{ $value | printf \"%.2f\" }}"

Token Spike Alert

Alert on unusual token usage:

      - alert: TokenUsageSpike
        expr: sum(rate(axonflow_orchestrator_provider_total_tokens[5m])) > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Token usage spike detected"
          description: "Token rate: {{ $value | printf \"%.0f\" }} tokens/second"

Best Practices

1. Set Up Cost Budgets

Establish monthly token/cost budgets and configure alerts at 50%, 75%, and 90% thresholds.
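
One possible shape for tiered budget alerts, reusing the cost counter from the rules above and assuming an illustrative $500 monthly budget (a 75% rule follows the same pattern); treat the 30d window as an approximation of a calendar month and make sure your Prometheus retention covers it:

  - name: axonflow-budget
    rules:
      - alert: LLMBudget50Percent
        expr: sum(increase(axonflow_orchestrator_provider_total_cost[30d])) > 250
        for: 1h
        labels:
          severity: info
        annotations:
          summary: "50% of the $500 monthly LLM budget consumed"
      - alert: LLMBudget90Percent
        expr: sum(increase(axonflow_orchestrator_provider_total_cost[30d])) > 450
        for: 1h
        labels:
          severity: critical
        annotations:
          summary: "90% of the $500 monthly LLM budget consumed"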

2. Monitor by Use Case

Add custom labels to requests to track cost by feature or use case:

  • Customer support chatbot
  • Document analysis
  • Code generation
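
The built-in labels listed earlier do not include a per-feature dimension; assuming a hypothetical use_case label attached to requests, the breakdown would mirror the provider and model queries:

sum by (use_case) (increase(axonflow_orchestrator_provider_total_cost[7d]))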

3. Optimize Prompt Length

Monitor prompt_tokens separately - prompt engineering can significantly reduce costs without impacting quality.
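
Using the counters listed earlier, the share of tokens going to prompts over the last 24 hours can be tracked with:

sum(increase(axonflow_orchestrator_provider_prompt_tokens[24h]))
/
sum(increase(axonflow_orchestrator_provider_total_tokens[24h]))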

4. Review Model Selection

Regularly review cost-per-request by model. Sometimes a cheaper model (e.g., GPT-3.5 vs GPT-4) provides sufficient quality for specific use cases.
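
A simple way to compare, using the audit table and column names from the schema above:

SELECT
    provider,
    model,
    COUNT(*) as requests,
    SUM(estimated_cost) / COUNT(*) as cost_per_request
FROM gateway_mode_audit
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY provider, model
ORDER BY cost_per_request DESC;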

5. Enable Caching

For repeated similar queries, enable response caching to reduce redundant LLM calls.


Next Steps


Enterprise Cost Analytics

Enterprise deployments include a pre-configured Executive Summary Dashboard with:

  • LLM Cost Tracking (7d)
  • Total Tokens Used
  • Cost per 1K Tokens
  • Average Cost per Request
  • Cost by LLM Provider breakdown

It also includes advanced features such as tenant-level cost attribution, budget alerts, and exports for finance teams. Contact sales for details.