Cost Management

AxonFlow provides comprehensive cost management capabilities to help organizations control and optimize LLM spending. Set budgets at multiple levels, receive alerts before limits are reached, and enforce spending policies automatically.

Cost Controls Are Governance

In AxonFlow, budget limits are governance policies. Just like content policies block harmful requests, budget policies can block or downgrade requests that would exceed spending limits.

Why Cost Management Matters

AI agent deployments can quickly become expensive without proper controls:

Runaway Costs: A buggy agent loop can burn through API credits in minutes
Budget Allocation: Different teams/projects need separate spending limits
Visibility: Organizations need to know which agents/workflows cost the most
Accountability: Costs must be attributable to specific use cases
Forecasting: Monthly spend needs to be predictable

AxonFlow tracks every token, calculates costs in real time, and enforces budget policies automatically.

Budget Hierarchy

Budgets can be set at multiple scopes, creating a hierarchy:

Organization Budget ($10,000/month)
    │
    ├── Team: Platform ($5,000/month)
    │       ├── Agent: code-reviewer ($1,000/month)
    │       └── Agent: test-generator ($500/month)
    │
    ├── Team: Data Science ($3,000/month)
    │       └── Workflow: daily-analysis ($100/day)
    │
    └── Team: Customer Support ($2,000/month)
            └── Agent: ticket-responder ($50/day)
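
One way to read the hierarchy above is that a request charged to an agent also counts against its team and the organization. The sketch below models that reading in plain Python. The `Budget` class and `first_exhausted` helper are hypothetical and are not part of the AxonFlow SDK, and the assumption that spend rolls up to every ancestor is illustrative rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Illustrative budget node; not the AxonFlow SDK type."""
    name: str
    limit_usd: float
    used_usd: float = 0.0
    parent: Optional["Budget"] = None

    def chain(self):
        """Yield this budget, then each ancestor up to the organization."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def first_exhausted(budget: Budget, cost_usd: float) -> Optional[Budget]:
    """Return the nearest budget in the chain that cost_usd would push over its limit."""
    for node in budget.chain():
        if node.used_usd + cost_usd > node.limit_usd:
            return node
    return None


org = Budget("organization", 10_000.0, used_usd=9_999.0)
team = Budget("team:platform", 5_000.0, used_usd=1_200.0, parent=org)
agent = Budget("agent:code-reviewer", 1_000.0, used_usd=100.0, parent=team)

# The agent and team have room, but the organization budget does not.
blocker = first_exhausted(agent, cost_usd=2.50)
print(blocker.name if blocker else "allowed")
```

Under this model the narrowest scope is checked first, so the reported blocker is the most specific budget that has run out.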

Scope Types

Scope	Use Case
`organization`	Overall company spending limit
`team`	Department or project budget
`agent`	Individual AI agent budget
`workflow`	Specific workflow budget
`user`	Per-user spending limit

Creating Budgets

Using the SDK

from axonflow import AxonFlow, CreateBudgetRequest, BudgetScope, BudgetPeriod, BudgetOnExceed

async with AxonFlow(endpoint="http://localhost:8080") as client:
    # Organization-level monthly budget
    await client.create_budget(CreateBudgetRequest(
        id="org-monthly",
        name="Organization Monthly Budget",
        scope=BudgetScope.ORGANIZATION,
        limit_usd=10000.0,
        period=BudgetPeriod.MONTHLY,
        on_exceed=BudgetOnExceed.BLOCK,
        alert_thresholds=[50, 80, 90, 100]
    ))

    # Team budget (nested under org)
    await client.create_budget(CreateBudgetRequest(
        id="platform-team",
        name="Platform Team Budget",
        scope=BudgetScope.TEAM,
        scope_id="platform",
        limit_usd=5000.0,
        period=BudgetPeriod.MONTHLY,
        on_exceed=BudgetOnExceed.WARN,
        alert_thresholds=[50, 80, 100]
    ))

    # Agent budget (daily limit)
    await client.create_budget(CreateBudgetRequest(
        id="code-reviewer-daily",
        name="Code Reviewer Daily Limit",
        scope=BudgetScope.AGENT,
        scope_id="code-reviewer",
        limit_usd=50.0,
        period=BudgetPeriod.DAILY,
        on_exceed=BudgetOnExceed.BLOCK,
        alert_thresholds=[80, 100]
    ))

Budget Periods

Period	Reset Frequency
`daily`	Every day at midnight UTC
`weekly`	Every Monday at midnight UTC
`monthly`	First of each month at midnight UTC
`quarterly`	First of each quarter
`yearly`	January 1st
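
To make the reset schedule concrete, the following sketch computes the next reset instant for each period using only the Python standard library. The `next_reset` function is a hypothetical helper, not an SDK call. The table states midnight UTC only for daily, weekly, and monthly periods, so treating quarterly and yearly resets as midnight UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def next_reset(period: str, now: datetime) -> datetime:
    """Return the next reset instant (UTC) for a budget period."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight + timedelta(days=1)
    if period == "weekly":
        # weekday(): Monday == 0, so this is the number of days to the next Monday.
        return midnight + timedelta(days=7 - now.weekday())
    if period == "monthly":
        months_ahead = 1
    elif period == "quarterly":
        months_ahead = 3 - (now.month - 1) % 3
    elif period == "yearly":
        months_ahead = 13 - now.month
    else:
        raise ValueError(f"unknown period: {period}")
    # Advance to the first day of the target month, rolling the year over if needed.
    total = now.month - 1 + months_ahead
    return midnight.replace(year=now.year + total // 12, month=total % 12 + 1, day=1)


now = datetime(2025, 5, 14, 15, 30, tzinfo=timezone.utc)  # a Wednesday
for period in ("daily", "weekly", "monthly", "quarterly", "yearly"):
    print(period, next_reset(period, now).isoformat())
```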

Enforcement Actions

When a budget's limit is exceeded, AxonFlow acts according to that budget's on_exceed setting:

Action	Behavior
`warn`	Log warning, send alert, allow request
`block`	Reject request with budget exceeded error
`downgrade`	Switch to a cheaper model (Enterprise)
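
The table above can be read as a small decision function. The sketch below is a plain-Python model of that mapping, not AxonFlow's implementation. `Decision` and `enforce` are hypothetical names, and treating spend equal to the limit as "exceeded" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    allowed: bool
    note: str


def enforce(on_exceed: str, used_usd: float, limit_usd: float) -> Decision:
    """Map a budget's state and on_exceed action to a decision (illustrative only)."""
    if used_usd < limit_usd:
        return Decision(True, "within budget")
    if on_exceed == "warn":
        return Decision(True, "over budget: logged and alerted, request allowed")
    if on_exceed == "block":
        return Decision(False, "over budget: request rejected")
    if on_exceed == "downgrade":
        return Decision(True, "over budget: routed to a cheaper model")
    raise ValueError(f"unknown on_exceed action: {on_exceed}")


for action in ("warn", "block", "downgrade"):
    print(action, enforce(action, used_usd=5150.0, limit_usd=5000.0))
```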

Automatic Enforcement (Proxy Mode)

When using Proxy Mode (proxyLLMCall, or proxy_llm_call in the Python SDK), budget enforcement is automatic. AxonFlow checks budgets before processing each request and returns HTTP 402 Payment Required when a budget with on_exceed=block has been exceeded.

from axonflow import AxonFlow

async with AxonFlow(endpoint="http://localhost:8080") as client:
    response = await client.proxy_llm_call(
        user_token="[email protected]",
        query="Explain quantum computing",
        request_type="chat"
    )

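    # budget_info.exceeded is set when a budget limit has been passed;
    # budget_info.on_exceed shows which action (block, warn, downgrade) was applied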
    if response.budget_info and response.budget_info.exceeded:
        print(f"Budget exceeded: {response.budget_info.budget_id}")
        print(f"Used: ${response.budget_info.used:.2f} / ${response.budget_info.limit:.2f}")
        print(f"Action: {response.budget_info.on_exceed}")
    else:
        print(f"Response: {response.data}")

BudgetInfo Response

When a budget is exceeded (or approaching limits with on_exceed=warn), the response includes budget_info:

{
  "success": false,
  "blocked": true,
  "budget_info": {
    "budget_id": "team-platform-monthly",
    "exceeded": true,
    "used": 5150.00,
    "limit": 5000.00,
    "currency": "USD",
    "period": "monthly",
    "on_exceed": "block",
    "percentage": 103.0
  }
}

Field	Description
`budget_id`	The budget that was exceeded
`exceeded`	Whether the limit has been exceeded
`used`	Current spending in this period
`limit`	Budget limit for this period
`currency`	Currency (default: USD)
`period`	Budget period (daily, weekly, monthly, etc.)
`on_exceed`	Configured action (block, warn, downgrade)
`percentage`	Percentage of budget used
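
As a quick consistency check on these fields, the sketch below parses the sample payload shown above with the standard library and recomputes the percentage from used and limit. The `BudgetInfo` dataclass here is a local, hypothetical stand-in for whatever type the SDK exposes.

```python
import json
from dataclasses import dataclass


@dataclass
class BudgetInfo:
    budget_id: str
    exceeded: bool
    used: float
    limit: float
    currency: str
    period: str
    on_exceed: str
    percentage: float


payload = json.loads("""
{
  "success": false,
  "blocked": true,
  "budget_info": {
    "budget_id": "team-platform-monthly",
    "exceeded": true,
    "used": 5150.00,
    "limit": 5000.00,
    "currency": "USD",
    "period": "monthly",
    "on_exceed": "block",
    "percentage": 103.0
  }
}
""")

info = BudgetInfo(**payload["budget_info"])
overage = info.used - info.limit                       # amount spent beyond the limit
recomputed = round(info.used / info.limit * 100, 1)    # should match info.percentage
print(f"{info.budget_id}: {recomputed}% used, {overage:.2f} {info.currency} over")
```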

HTTP Status Codes

Code	Meaning
`200 OK`	Request processed successfully
`402 Payment Required`	Budget exceeded with `on_exceed=block`
`200 OK` + `X-Budget-Warning` header	Budget exceeded with `on_exceed=warn`
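
A client that talks to the proxy over raw HTTP could branch on these signals roughly as follows. `classify_response` is a hypothetical helper, and the header value used in the example is a placeholder because the header's contents are not specified here.

```python
def classify_response(status_code: int, headers: dict) -> str:
    """Classify a proxy response by the budget signals in the table above."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    names = {name.lower() for name in headers}
    if status_code == 402:
        return "blocked"
    if status_code == 200 and "x-budget-warning" in names:
        return "warned"
    if status_code == 200:
        return "ok"
    return "other"


print(classify_response(200, {"Content-Type": "application/json"}))
print(classify_response(200, {"X-Budget-Warning": "placeholder"}))
print(classify_response(402, {}))
```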

Manual Pre-Check (Gateway Mode)

For Gateway Mode, you can manually check budgets before making LLM calls:

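# Assumes BudgetCheckRequest is exported from the axonflow package like the other request types
from axonflow import BudgetCheckRequest

# `client` is an open AxonFlow client; `llm` and `prompt` come from your own application
# Check budget before making LLM request
decision = await client.check_budget(BudgetCheckRequest(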
    team_id="platform",
    agent_id="code-reviewer"
))

if decision.allowed:
    # Safe to make LLM request
    response = await llm.generate(prompt)
else:
    # Budget exceeded
    print(f"Blocked: {decision.message}")
    print(f"Used: ${decision.used_usd:.2f} / ${decision.limit_usd:.2f}")

Alert Thresholds

Configure percentage thresholds to receive alerts before budgets are exceeded:

await client.create_budget(CreateBudgetRequest(
    id="team-budget",
    name="Engineering Team",
    scope=BudgetScope.TEAM,
    scope_id="engineering",
    limit_usd=5000.0,
    period=BudgetPeriod.MONTHLY,
    on_exceed=BudgetOnExceed.BLOCK,
    alert_thresholds=[50, 80, 90, 100]  # Alert at 50%, 80%, 90%, 100%
))

Viewing Alerts

# Get alerts for a budget
alerts = await client.get_budget_alerts("team-budget")
for alert in alerts.alerts:
    print(f"[{alert.created_at}] {alert.message}")
    print(f"  Threshold: {alert.threshold}%")
    print(f"  Amount: ${alert.amount_usd:.2f}")

Monitoring Usage

Real-Time Budget Status

status = await client.get_budget_status("team-budget")
print(f"Budget: {status.budget.name}")
print(f"Used: ${status.used_usd:.2f} / ${status.budget.limit_usd:.2f}")
print(f"Remaining: ${status.remaining_usd:.2f}")
print(f"Percentage: {status.percentage:.1f}%")
print(f"Period: {status.period_start} to {status.period_end}")
if status.is_exceeded:
    print("WARNING: Budget exceeded!")

Usage Summary

usage = await client.get_usage_summary(period="monthly")
print(f"Total Cost: ${usage.total_cost_usd:.2f}")
print(f"Total Requests: {usage.total_requests:,}")
print(f"Tokens In: {usage.total_tokens_in:,}")
print(f"Tokens Out: {usage.total_tokens_out:,}")

Usage Breakdown

Analyze spending by different dimensions:

# By provider
by_provider = await client.get_usage_breakdown("provider", "monthly")
for item in by_provider.items:
    print(f"{item.name}: ${item.cost_usd:.2f} ({item.percentage:.1f}%)")

# By model
by_model = await client.get_usage_breakdown("model", "monthly")

# By team
by_team = await client.get_usage_breakdown("team", "monthly")

# By agent
by_agent = await client.get_usage_breakdown("agent", "monthly")
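
For intuition about what a breakdown contains, the sketch below groups raw usage records by one dimension and derives each group's cost and share of the total. The record layout and the `breakdown` function are assumptions made for illustration. They do not describe how AxonFlow stores or aggregates usage.

```python
from collections import defaultdict


def breakdown(records, dimension):
    """Group usage records by one dimension; return (name, cost, percentage) rows sorted by cost."""
    totals = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost_usd"]
    grand_total = sum(totals.values())
    rows = []
    for name, cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        share = cost / grand_total * 100 if grand_total else 0.0
        rows.append((name, round(cost, 2), round(share, 1)))
    return rows


# Hypothetical usage records for one period.
records = [
    {"provider": "openai", "team": "platform", "cost_usd": 6.0},
    {"provider": "anthropic", "team": "platform", "cost_usd": 3.0},
    {"provider": "openai", "team": "data-science", "cost_usd": 1.0},
]

for name, cost, share in breakdown(records, "provider"):
    print(f"{name}: ${cost:.2f} ({share:.1f}%)")
```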

Best Practices

1. Start with Organization Budget

Always set a top-level organization budget as a safety net:

await client.create_budget(CreateBudgetRequest(
    id="org-safety-net",
    name="Organization Safety Net",
    scope=BudgetScope.ORGANIZATION,
    limit_usd=50000.0,  # High limit as safety net
    period=BudgetPeriod.MONTHLY,
    on_exceed=BudgetOnExceed.BLOCK,
    alert_thresholds=[80, 90, 95, 100]
))

2. Use Daily Limits for Agents

Agents can have runaway loops. Daily limits catch issues quickly:

await client.create_budget(CreateBudgetRequest(
    id="agent-daily",
    name="My Agent Daily Limit",
    scope=BudgetScope.AGENT,
    scope_id="my-agent",
    limit_usd=100.0,  # $100/day max
    period=BudgetPeriod.DAILY,
    on_exceed=BudgetOnExceed.BLOCK
))

3. Pre-Check Before Expensive Operations

Always check budgets before calling expensive models:

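class BudgetExceededError(Exception):
    """Application-defined error raised when a budget pre-check fails."""


async def safe_llm_call(prompt: str, team_id: str):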
    # Check budget first
    decision = await client.check_budget(BudgetCheckRequest(team_id=team_id))

    if not decision.allowed:
        raise BudgetExceededError(decision.message)

    # Safe to proceed
    return await llm.generate(prompt)

4. Set Meaningful Alert Thresholds

Configure alerts that give you time to react:

50%: Early warning, halfway through budget
80%: Time to review spending patterns
90%: Consider reducing usage or increasing budget
100%: Budget exhausted

Budget Alerts

Budget alerts notify stakeholders when spending approaches configured thresholds. Alerts are triggered once per threshold per budget period.

# Configure alert channels (Enterprise)
await client.configure_budget_alerts(BudgetAlertConfig(
    budget_id="team-budget",
    channels=[
        AlertChannel(type="email", target="[email protected]"),
        AlertChannel(type="slack", target="#platform-cost-alerts"),
        AlertChannel(type="webhook", target="https://internal.example.com/budget-hook"),
    ]
))

In Community Edition, alerts are written to the application log. Enterprise Edition supports email, Slack, and webhook delivery channels.
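
The once-per-threshold, once-per-period rule can be modeled with a small tracker. This is an illustrative sketch: `ThresholdTracker` is a hypothetical class, and firing several thresholds together when spend jumps past more than one of them is an assumption.

```python
class ThresholdTracker:
    """Fires each percentage threshold at most once per budget period (illustrative)."""

    def __init__(self, limit_usd, thresholds):
        self.limit_usd = limit_usd
        self.thresholds = sorted(thresholds)
        self.fired = set()

    def record(self, used_usd):
        """Return the thresholds newly crossed at this spend level."""
        percentage = used_usd / self.limit_usd * 100
        crossed = [t for t in self.thresholds if percentage >= t and t not in self.fired]
        self.fired.update(crossed)
        return crossed

    def reset_period(self):
        """Clear fired thresholds when the budget period rolls over."""
        self.fired.clear()


tracker = ThresholdTracker(limit_usd=5000.0, thresholds=[50, 80, 90, 100])
print(tracker.record(2600.0))  # 52% of the limit
print(tracker.record(2900.0))  # 58%: no new threshold crossed
print(tracker.record(4600.0))  # 92%: crosses 80 and 90 in one step
```

After `reset_period()` the same thresholds can fire again, which matches the per-period wording above.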

Usage Forecasting

Enterprise Edition provides usage forecasting based on historical spending patterns. Forecasts project estimated end-of-period spend using a rolling average of the most recent 7 days:

forecast = await client.get_usage_forecast("team-budget")
print(f"Current spend: ${forecast.current_usd:.2f}")
print(f"Projected end-of-month: ${forecast.projected_usd:.2f}")
print(f"Budget limit: ${forecast.limit_usd:.2f}")
print(f"Risk level: {forecast.risk}")  # low, medium, high, critical

Forecasts help teams adjust usage before budgets are exceeded rather than reacting after the fact.
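
The projection described above reduces to simple arithmetic: current spend plus the 7-day average daily spend multiplied by the days left in the period. The sketch below shows that calculation. Both function names are hypothetical, and the cut-offs used to pick a risk label are assumptions, since the actual boundaries are not documented here.

```python
def project_end_of_period(current_usd, last_7_days_usd, days_remaining):
    """Project end-of-period spend from a 7-day rolling average of daily spend."""
    daily_average = sum(last_7_days_usd) / len(last_7_days_usd)
    return current_usd + daily_average * days_remaining


def risk_level(projected_usd, limit_usd):
    """Bucket the projected/limit ratio into a risk label (cut-offs are assumptions)."""
    ratio = projected_usd / limit_usd
    if ratio < 0.8:
        return "low"
    if ratio < 1.0:
        return "medium"
    if ratio < 1.2:
        return "high"
    return "critical"


projected = project_end_of_period(
    current_usd=2400.0,
    last_7_days_usd=[150.0, 160.0, 170.0, 180.0, 190.0, 200.0, 210.0],
    days_remaining=15,
)
# Average daily spend is 180, so the projection is 2400 + 180 * 15 = 5100.
print(f"Projected: ${projected:.2f}, risk: {risk_level(projected, 5000.0)}")
```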

Auto-Downgrade Behavior

When on_exceed is set to downgrade, AxonFlow automatically routes requests to a cheaper model once the budget limit is exceeded. This avoids hard blocking while still controlling costs:

Original Model	Downgrade Target	Approximate Savings
`gpt-4`	`gpt-4o-mini`	~95%
`claude-opus-4`	`claude-haiku-4.5`	~95%
`gpt-4o`	`gpt-4o-mini`	~90%

The downgrade is transparent to the calling application. The budget_info in the response indicates when a downgrade occurred:

{
  "budget_info": {
    "budget_id": "team-platform-monthly",
    "exceeded": true,
    "on_exceed": "downgrade",
    "downgraded_from": "gpt-4",
    "downgraded_to": "gpt-4o-mini"
  }
}

Auto-downgrade is an Enterprise feature. Community Edition supports warn and block actions only.
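
The routing rule can be pictured as a lookup keyed by the requested model. The sketch below uses the pairs from the table above. `select_model` is a hypothetical helper, and passing through models that have no listed target is an assumption.

```python
# Downgrade pairs taken from the table above.
DOWNGRADE_TARGETS = {
    "gpt-4": "gpt-4o-mini",
    "claude-opus-4": "claude-haiku-4.5",
    "gpt-4o": "gpt-4o-mini",
}


def select_model(requested, budget_exceeded):
    """Return (model_to_use, downgrade_info); downgrade_info is None when nothing changed."""
    target = DOWNGRADE_TARGETS.get(requested)
    if not budget_exceeded or target is None:
        return requested, None
    return target, {"downgraded_from": requested, "downgraded_to": target}


print(select_model("gpt-4", budget_exceeded=False))
print(select_model("gpt-4", budget_exceeded=True))
```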

Cost Estimation

Added in v4.3.0.

AxonFlow can estimate the cost of a workflow plan before execution, helping teams decide whether to proceed, adjust parameters, or choose cheaper models. Two endpoints support this:

Estimate Cost Before Execution

Submit a plan definition to get a cost estimate without creating or running anything.

POST /api/v1/plans/estimate

Request Body:

{
  "provider": "openai",
  "model": "gpt-4",
  "steps": [
    {
      "name": "analyze",
      "type": "llm_call",
      "estimated_tokens_in": 1000,
      "estimated_tokens_out": 500
    }
  ]
}

Response:

{
  "estimated_cost_usd": 0.045,
  "currency": "USD",
  "breakdown": [
    {
      "step": "analyze",
      "type": "llm_call",
      "tokens_in": 1000,
      "tokens_out": 500,
      "cost_usd": 0.045
    }
  ]
}
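
A per-step estimate of this shape can be derived from token counts and per-token prices. The sketch below shows the arithmetic only. The prices in `PRICES_PER_1K` are placeholders chosen so the result lines up with the sample figure above. They are not real provider pricing, and `estimate_plan` is a hypothetical function, not the server's implementation.

```python
# Placeholder USD prices per 1,000 tokens; NOT real provider pricing.
PRICES_PER_1K = {("openai", "gpt-4"): {"in": 0.03, "out": 0.03}}


def estimate_plan(plan):
    """Sum per-step token costs for a plan definition (illustrative only)."""
    rates = PRICES_PER_1K[(plan["provider"], plan["model"])]
    breakdown = []
    for step in plan["steps"]:
        cost = (step["estimated_tokens_in"] / 1000 * rates["in"]
                + step["estimated_tokens_out"] / 1000 * rates["out"])
        breakdown.append({
            "step": step["name"],
            "type": step["type"],
            "tokens_in": step["estimated_tokens_in"],
            "tokens_out": step["estimated_tokens_out"],
            "cost_usd": round(cost, 6),
        })
    total = round(sum(item["cost_usd"] for item in breakdown), 6)
    return {"estimated_cost_usd": total, "currency": "USD", "breakdown": breakdown}


plan = {
    "provider": "openai",
    "model": "gpt-4",
    "steps": [
        {"name": "analyze", "type": "llm_call",
         "estimated_tokens_in": 1000, "estimated_tokens_out": 500},
    ],
}
result = estimate_plan(plan)
print(result["estimated_cost_usd"])
```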

Get Cost Estimate for an Existing Plan

Retrieve a cost estimate for a plan that has already been created.

GET /api/v1/plans/{id}/cost

Response:

{
  "plan_id": "plan_abc123",
  "estimated_cost_usd": 0.045,
  "currency": "USD",
  "breakdown": [
    {
      "step": "analyze",
      "type": "llm_call",
      "tokens_in": 1000,
      "tokens_out": 500,
      "cost_usd": 0.045
    }
  ]
}

Tiered Access

Cost estimation availability varies by edition:

Feature	Community	Evaluation	Enterprise
Estimate response	Aggregate total only	Full per-step breakdown	Full per-step breakdown
Estimates per day	10	100	Unlimited

SDK Example (Python)

# plan_id is the ID of an existing plan, e.g. "plan_abc123"
estimate = await client.get_plan_cost(plan_id)
print(f"Estimated cost: ${estimate.estimated_cost_usd:.3f}")
for step in estimate.breakdown:
    print(f"  {step.step}: ${step.cost_usd:.3f}")

Community vs Enterprise

Feature	Community	Enterprise
Usage tracking	✅	✅
Budget limits	✅	✅
Budget hierarchy	✅	✅
Alert thresholds	✅	✅
Pre-flight budget check	✅	✅
Usage breakdown	✅	✅
Usage forecast	❌	✅
Usage export	❌	✅
Alert channels (Slack, email, webhook)	❌	✅
Auto-downgrade to cheaper models	❌	✅
Budget rollover	❌	✅
Cost dashboard	❌	✅

Next Steps

Cost Controls API Reference - Complete API documentation
Audit Logging - Track all AI interactions
Policy-as-Code - Define governance rules
