Cost Management

AxonFlow provides comprehensive cost management capabilities to help organizations control and optimize LLM spending. Set budgets at multiple levels, receive alerts before limits are reached, and enforce spending policies automatically.

Cost Controls Are Governance

In AxonFlow, budget limits are governance policies. Just like content policies block harmful requests, budget policies can block or downgrade requests that would exceed spending limits.


Why Cost Management Matters

AI agent deployments can quickly become expensive without proper controls:

  • Runaway Costs: A buggy agent loop can burn through API credits in minutes
  • Budget Allocation: Different teams/projects need separate spending limits
  • Visibility: Organizations need to know which agents/workflows cost the most
  • Accountability: Costs must be attributable to specific use cases
  • Forecasting: Monthly spend needs to be predictable

AxonFlow tracks every token, calculates costs in real-time, and enforces budget policies automatically.


Budget Hierarchy

Budgets can be set at multiple scopes, creating a hierarchy:

Organization Budget ($10,000/month)
├── Team: Platform ($5,000/month)
│   ├── Agent: code-reviewer ($1,000/month)
│   └── Agent: test-generator ($500/month)
├── Team: Data Science ($3,000/month)
│   └── Workflow: daily-analysis ($100/day)
└── Team: Customer Support ($2,000/month)
    └── Agent: ticket-responder ($50/day)

Scope Types

Scope         Use Case
organization  Overall company spending limit
team          Department or project budget
agent         Individual AI agent budget
workflow      Specific workflow budget
user          Per-user spending limit
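
One plausible reading of the hierarchy is that a request must fit inside every budget on its path, from the most specific scope up to the organization. The sketch below illustrates that idea with a hypothetical in-memory `Budget` class and a hypothetical `first_exhausted` helper; neither is part of the AxonFlow SDK, and the source does not spell out the actual evaluation order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Hypothetical in-memory budget node; not the AxonFlow data model."""
    name: str
    limit_usd: float
    used_usd: float = 0.0
    parent: Optional["Budget"] = None


def first_exhausted(budget: Budget, cost_usd: float) -> Optional[Budget]:
    """Walk from the most specific budget up to the root and return the
    first budget that the extra cost would push over its limit, if any."""
    node: Optional[Budget] = budget
    while node is not None:
        if node.used_usd + cost_usd > node.limit_usd:
            return node
        node = node.parent
    return None


# Illustrative numbers only: the organization budget is almost spent.
org = Budget("organization", 10_000.0, used_usd=9_999.0)
team = Budget("team:platform", 5_000.0, used_usd=1_200.0, parent=org)
agent = Budget("agent:code-reviewer", 1_000.0, used_usd=100.0, parent=team)

blocker = first_exhausted(agent, cost_usd=2.5)
print(blocker.name if blocker else "allowed")
```

In this example the agent and team budgets still have room, so the organization budget is the one that would stop the request.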

Creating Budgets

Using the SDK

import asyncio

from axonflow import AxonFlow, CreateBudgetRequest, BudgetScope, BudgetPeriod, BudgetOnExceed


async def main():
    async with AxonFlow(endpoint="http://localhost:8080") as client:
        # Organization-level monthly budget
        await client.create_budget(CreateBudgetRequest(
            id="org-monthly",
            name="Organization Monthly Budget",
            scope=BudgetScope.ORGANIZATION,
            limit_usd=10000.0,
            period=BudgetPeriod.MONTHLY,
            on_exceed=BudgetOnExceed.BLOCK,
            alert_thresholds=[50, 80, 90, 100]
        ))

        # Team budget (nested under org)
        await client.create_budget(CreateBudgetRequest(
            id="platform-team",
            name="Platform Team Budget",
            scope=BudgetScope.TEAM,
            scope_id="platform",
            limit_usd=5000.0,
            period=BudgetPeriod.MONTHLY,
            on_exceed=BudgetOnExceed.WARN,
            alert_thresholds=[50, 80, 100]
        ))

        # Agent budget (daily limit)
        await client.create_budget(CreateBudgetRequest(
            id="code-reviewer-daily",
            name="Code Reviewer Daily Limit",
            scope=BudgetScope.AGENT,
            scope_id="code-reviewer",
            limit_usd=50.0,
            period=BudgetPeriod.DAILY,
            on_exceed=BudgetOnExceed.BLOCK,
            alert_thresholds=[80, 100]
        ))


asyncio.run(main())

Budget Periods

Period     Reset Frequency
daily      Every day at midnight UTC
weekly     Every Monday at midnight UTC
monthly    First of each month at midnight UTC
quarterly  First of each quarter
yearly     January 1st
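
To make the reset rules concrete, the following sketch computes when the current period began for a given instant. The `period_start` function is a hypothetical helper, not an SDK call, and it assumes that quarterly and yearly budgets also reset at midnight UTC, which the table does not state explicitly.

```python
from datetime import datetime, timedelta, timezone


def period_start(period: str, now: datetime) -> datetime:
    """Return the UTC instant at which the current budget period began.

    `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight
    if period == "weekly":
        # datetime.weekday() returns 0 for Monday.
        return midnight - timedelta(days=midnight.weekday())
    if period == "monthly":
        return midnight.replace(day=1)
    if period == "quarterly":
        # Quarters start in January, April, July, and October.
        first_month = 3 * ((midnight.month - 1) // 3) + 1
        return midnight.replace(month=first_month, day=1)
    if period == "yearly":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unknown period: {period}")


now = datetime(2025, 8, 14, 15, 30, tzinfo=timezone.utc)
for name in ("daily", "weekly", "monthly", "quarterly", "yearly"):
    print(name, period_start(name, now).isoformat())
```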

Enforcement Actions

When spending exceeds a budget's limit, AxonFlow acts according to that budget's on_exceed setting:

Action     Behavior
warn       Log warning, send alert, allow request
block      Reject request with budget exceeded error
downgrade  Switch to a cheaper model (Enterprise)

Automatic Enforcement (Proxy Mode)

When using Proxy Mode (proxyLLMCall, or proxy_llm_call in the Python SDK), budget enforcement is automatic. AxonFlow checks budgets before processing each request and returns HTTP 402 Payment Required when a budget with on_exceed=block is exceeded.

import asyncio

from axonflow import AxonFlow


async def main():
    async with AxonFlow(endpoint="http://localhost:8080") as client:
        response = await client.proxy_llm_call(
            user_token="[email protected]",
            query="Explain quantum computing",
            request_type="chat"
        )

        # Check if request was blocked due to budget
        if response.budget_info and response.budget_info.exceeded:
            print(f"Budget exceeded: {response.budget_info.budget_id}")
            print(f"Used: ${response.budget_info.used:.2f} / ${response.budget_info.limit:.2f}")
            print(f"Action: {response.budget_info.on_exceed}")
        else:
            print(f"Response: {response.data}")


asyncio.run(main())

BudgetInfo Response

When a budget is exceeded (or approaching limits with on_exceed=warn), the response includes budget_info:

{
  "success": false,
  "blocked": true,
  "budget_info": {
    "budget_id": "team-platform-monthly",
    "exceeded": true,
    "used": 5150.00,
    "limit": 5000.00,
    "currency": "USD",
    "period": "monthly",
    "on_exceed": "block",
    "percentage": 103.0
  }
}

Field       Description
budget_id   The budget that was exceeded
exceeded    Whether the limit has been exceeded
used        Current spending in this period
limit       Budget limit for this period
currency    Currency (default: USD)
period      Budget period (daily, weekly, monthly, etc.)
on_exceed   Configured action (block, warn, downgrade)
percentage  Percentage of budget used
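
For callers that work with the raw JSON rather than the SDK's response object, the fields above could be loaded into a small typed structure. This is a minimal sketch using only the standard library; the `BudgetInfo` dataclass here is a local illustration that mirrors the documented field names, not the SDK's own class.

```python
import json
from dataclasses import dataclass


@dataclass
class BudgetInfo:
    """Local mirror of the documented budget_info fields (illustrative)."""
    budget_id: str
    exceeded: bool
    used: float
    limit: float
    currency: str
    period: str
    on_exceed: str
    percentage: float


raw = """
{
  "success": false,
  "blocked": true,
  "budget_info": {
    "budget_id": "team-platform-monthly",
    "exceeded": true,
    "used": 5150.00,
    "limit": 5000.00,
    "currency": "USD",
    "period": "monthly",
    "on_exceed": "block",
    "percentage": 103.0
  }
}
"""

payload = json.loads(raw)
info = BudgetInfo(**payload["budget_info"])

# percentage is consistent with used / limit expressed in percent.
computed_pct = round(info.used * 100 / info.limit, 1)
overage = info.used - info.limit
print(f"{info.budget_id}: {computed_pct}% used, {overage:.2f} {info.currency} over limit")
```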

HTTP Status Codes

Code                              Meaning
200 OK                            Request processed successfully
402 Payment Required              Budget exceeded with on_exceed=block
200 OK + X-Budget-Warning header  Budget exceeded with on_exceed=warn
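
A client that talks to the HTTP API directly might classify responses along the lines of this table. The sketch below is an assumption-laden illustration: `classify_response` is a hypothetical helper, and it only checks for the presence of the X-Budget-Warning header because the header's value format is not described here.

```python
from typing import Mapping


def classify_response(status: int, headers: Mapping[str, str]) -> str:
    """Map an HTTP status and headers to a coarse budget outcome."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    names = {name.lower() for name in headers}
    if status == 402:
        return "blocked"
    if status == 200 and "x-budget-warning" in names:
        return "warned"
    if status == 200:
        return "ok"
    return "other"


print(classify_response(200, {}))
print(classify_response(200, {"X-Budget-Warning": "placeholder"}))
print(classify_response(402, {}))
```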

Manual Pre-Check (Gateway Mode)

For Gateway Mode, you can manually check budgets before making LLM calls:

# Check budget before making LLM request
decision = await client.check_budget(BudgetCheckRequest(
    team_id="platform",
    agent_id="code-reviewer"
))

if decision.allowed:
    # Safe to make LLM request
    response = await llm.generate(prompt)
else:
    # Budget exceeded
    print(f"Blocked: {decision.message}")
    print(f"Used: ${decision.used_usd:.2f} / ${decision.limit_usd:.2f}")

Alert Thresholds

Configure percentage thresholds to receive alerts before budgets are exceeded:

await client.create_budget(CreateBudgetRequest(
    id="team-budget",
    name="Engineering Team",
    scope=BudgetScope.TEAM,
    scope_id="engineering",
    limit_usd=5000.0,
    period=BudgetPeriod.MONTHLY,
    on_exceed=BudgetOnExceed.BLOCK,
    alert_thresholds=[50, 80, 90, 100]  # Alert at 50%, 80%, 90%, 100%
))

Viewing Alerts

# Get alerts for a budget
alerts = await client.get_budget_alerts("team-budget")
for alert in alerts.alerts:
    print(f"[{alert.created_at}] {alert.message}")
    print(f"  Threshold: {alert.threshold}%")
    print(f"  Amount: ${alert.amount_usd:.2f}")

Monitoring Usage

Real-Time Budget Status

status = await client.get_budget_status("team-budget")
print(f"Budget: {status.budget.name}")
print(f"Used: ${status.used_usd:.2f} / ${status.budget.limit_usd:.2f}")
print(f"Remaining: ${status.remaining_usd:.2f}")
print(f"Percentage: {status.percentage:.1f}%")
print(f"Period: {status.period_start} to {status.period_end}")
if status.is_exceeded:
    print("WARNING: Budget exceeded!")

Usage Summary

usage = await client.get_usage_summary(period="monthly")
print(f"Total Cost: ${usage.total_cost_usd:.2f}")
print(f"Total Requests: {usage.total_requests:,}")
print(f"Tokens In: {usage.total_tokens_in:,}")
print(f"Tokens Out: {usage.total_tokens_out:,}")

Usage Breakdown

Analyze spending by different dimensions:

# By provider
by_provider = await client.get_usage_breakdown("provider", "monthly")
for item in by_provider.items:
    print(f"{item.name}: ${item.cost_usd:.2f} ({item.percentage:.1f}%)")

# By model
by_model = await client.get_usage_breakdown("model", "monthly")

# By team
by_team = await client.get_usage_breakdown("team", "monthly")

# By agent
by_agent = await client.get_usage_breakdown("agent", "monthly")
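
For intuition about what a breakdown contains, the sketch below groups a handful of made-up usage records by one dimension and computes each group's share of total cost. The record layout and the `breakdown` helper are hypothetical and only illustrate the aggregation; they say nothing about how AxonFlow stores usage.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up usage records for illustration only.
records: List[Dict[str, object]] = [
    {"provider": "openai", "team": "platform", "cost_usd": 12.0},
    {"provider": "anthropic", "team": "platform", "cost_usd": 6.0},
    {"provider": "openai", "team": "data-science", "cost_usd": 2.0},
]


def breakdown(rows: List[Dict[str, object]], dimension: str) -> List[Tuple[str, float, float]]:
    """Return (name, cost_usd, percentage) tuples, most expensive first."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[str(row[dimension])] += float(row["cost_usd"])
    grand_total = sum(totals.values())
    items = [
        (name, cost, 100 * cost / grand_total if grand_total else 0.0)
        for name, cost in totals.items()
    ]
    return sorted(items, key=lambda item: item[1], reverse=True)


for name, cost, pct in breakdown(records, "provider"):
    print(f"{name}: ${cost:.2f} ({pct:.1f}%)")
```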

Best Practices

1. Start with Organization Budget

Always set a top-level organization budget as a safety net:

await client.create_budget(CreateBudgetRequest(
    id="org-safety-net",
    name="Organization Safety Net",
    scope=BudgetScope.ORGANIZATION,
    limit_usd=50000.0,  # High limit as safety net
    period=BudgetPeriod.MONTHLY,
    on_exceed=BudgetOnExceed.BLOCK,
    alert_thresholds=[80, 90, 95, 100]
))

2. Use Daily Limits for Agents

Agents can have runaway loops. Daily limits catch issues quickly:

await client.create_budget(CreateBudgetRequest(
    id="agent-daily",
    scope=BudgetScope.AGENT,
    scope_id="my-agent",
    limit_usd=100.0,  # $100/day max
    period=BudgetPeriod.DAILY,
    on_exceed=BudgetOnExceed.BLOCK
))

3. Pre-Check Before Expensive Operations

Always check budgets before calling expensive models:

async def safe_llm_call(prompt: str, team_id: str):
    # Check budget first
    decision = await client.check_budget(BudgetCheckRequest(team_id=team_id))

    if not decision.allowed:
        raise BudgetExceededError(decision.message)

    # Safe to proceed
    return await llm.generate(prompt)

4. Set Meaningful Alert Thresholds

Configure alerts that give you time to react:

  • 50%: Early warning, halfway through budget
  • 80%: Time to review spending patterns
  • 90%: Consider reducing usage or increasing budget
  • 100%: Budget exhausted

Budget Alerts

Budget alerts notify stakeholders when spending approaches configured thresholds. Alerts are triggered once per threshold per budget period.

# Configure alert channels (Enterprise)
await client.configure_budget_alerts(BudgetAlertConfig(
    budget_id="team-budget",
    channels=[
        AlertChannel(type="email", target="[email protected]"),
        AlertChannel(type="slack", target="#platform-cost-alerts"),
        AlertChannel(type="webhook", target="https://internal.example.com/budget-hook"),
    ]
))

In Community Edition, alerts are written to the application log. Enterprise Edition supports email, Slack, and webhook delivery channels.
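
The "once per threshold per budget period" rule can be pictured as a small piece of state that remembers which thresholds have already fired. The sketch below is one way to model it; `ThresholdTracker` is a hypothetical class written for illustration and does not describe AxonFlow's internals.

```python
from typing import Iterable, List, Set


class ThresholdTracker:
    """Fire each percentage threshold at most once per budget period."""

    def __init__(self, limit_usd: float, thresholds: Iterable[int]) -> None:
        self.limit_usd = limit_usd
        self.thresholds = sorted(thresholds)
        self.fired: Set[int] = set()

    def record(self, used_usd: float) -> List[int]:
        """Return thresholds newly crossed at this spend level."""
        pct = 100 * used_usd / self.limit_usd
        newly_crossed = [
            t for t in self.thresholds if pct >= t and t not in self.fired
        ]
        self.fired.update(newly_crossed)
        return newly_crossed

    def reset(self) -> None:
        """Call when a new budget period starts."""
        self.fired.clear()


tracker = ThresholdTracker(limit_usd=5000.0, thresholds=[50, 80, 90, 100])
print(tracker.record(2600.0))  # 52% of the limit
print(tracker.record(2700.0))  # still between 50% and 80%
print(tracker.record(4600.0))  # 92% of the limit
```

A jump in spend can cross several thresholds at once, which is why `record` returns a list rather than a single value.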


Usage Forecasting

Enterprise Edition provides usage forecasting based on historical spending patterns. Forecasts project estimated end-of-period spend using a rolling average of the most recent 7 days:

forecast = await client.get_usage_forecast("team-budget")
print(f"Current spend: ${forecast.current_usd:.2f}")
print(f"Projected end-of-month: ${forecast.projected_usd:.2f}")
print(f"Budget limit: ${forecast.limit_usd:.2f}")
print(f"Risk level: {forecast.risk}") # low, medium, high, critical

Forecasts help teams adjust usage before budgets are exceeded rather than reacting after the fact.
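
As a rough illustration of a rolling-average projection, the sketch below adds the average daily spend over the last seven days, multiplied by the days remaining, to the current spend. The exact formula AxonFlow uses is not given here, and the risk cut-offs in `risk_level` are invented for the example; both functions are hypothetical.

```python
from typing import Sequence


def project_end_of_period(
    current_usd: float, last_7_days_usd: Sequence[float], days_remaining: int
) -> float:
    """Project end-of-period spend from a 7-day average daily spend."""
    if len(last_7_days_usd) != 7:
        raise ValueError("expected exactly 7 daily totals")
    daily_average = sum(last_7_days_usd) / 7
    return current_usd + daily_average * days_remaining


def risk_level(projected_usd: float, limit_usd: float) -> str:
    """Bucket the projection against the limit (cut-offs are assumptions)."""
    ratio = projected_usd / limit_usd
    if ratio < 0.8:
        return "low"
    if ratio < 1.0:
        return "medium"
    if ratio < 1.2:
        return "high"
    return "critical"


daily_spend = [90.0, 110.0, 100.0, 95.0, 105.0, 100.0, 100.0]
projected = project_end_of_period(3000.0, daily_spend, days_remaining=15)
print(f"Projected: ${projected:.2f}, risk: {risk_level(projected, 5000.0)}")
```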


Auto-Downgrade Behavior

When on_exceed is set to downgrade, AxonFlow automatically routes requests to a cheaper model once the budget threshold is reached. This avoids hard blocking while controlling costs:

Original Model  Downgrade Target  Approximate Savings
gpt-4           gpt-4o-mini       ~95%
claude-opus-4   claude-haiku-4.5  ~95%
gpt-4o          gpt-4o-mini       ~90%

The downgrade is transparent to the calling application. The budget_info in the response indicates when a downgrade occurred:

{
  "budget_info": {
    "budget_id": "team-platform-monthly",
    "exceeded": true,
    "on_exceed": "downgrade",
    "downgraded_from": "gpt-4",
    "downgraded_to": "gpt-4o-mini"
  }
}

Auto-downgrade is an Enterprise feature. Community Edition supports warn and block actions only.
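
Conceptually, the downgrade step is a lookup from the requested model to its cheaper target. The sketch below uses the pairs from the table above; `select_model` is a hypothetical helper, and keeping the original model when no mapping exists is an assumption, since that case is not covered here.

```python
from typing import Dict, Optional, Tuple

# Model pairs taken from the downgrade table.
DOWNGRADE_TARGETS: Dict[str, str] = {
    "gpt-4": "gpt-4o-mini",
    "claude-opus-4": "claude-haiku-4.5",
    "gpt-4o": "gpt-4o-mini",
}


def select_model(
    requested: str, budget_exceeded: bool
) -> Tuple[str, Optional[Dict[str, str]]]:
    """Return the model to call plus downgrade details, if any."""
    if not budget_exceeded:
        return requested, None
    target = DOWNGRADE_TARGETS.get(requested)
    if target is None:
        # Assumption: models without a mapping are left unchanged.
        return requested, None
    return target, {"downgraded_from": requested, "downgraded_to": target}


model, details = select_model("gpt-4", budget_exceeded=True)
print(model, details)
```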


Cost Estimation

Added in v4.3.0.

AxonFlow can estimate the cost of a workflow plan before execution, helping teams decide whether to proceed, adjust parameters, or choose cheaper models. Two endpoints support this:

Estimate Cost Before Execution

Submit a plan definition to get a cost estimate without creating or running anything.

POST /api/v1/plans/estimate

Request Body:

{
  "provider": "openai",
  "model": "gpt-4",
  "steps": [
    {
      "name": "analyze",
      "type": "llm_call",
      "estimated_tokens_in": 1000,
      "estimated_tokens_out": 500
    }
  ]
}

Response:

{
  "estimated_cost_usd": 0.045,
  "currency": "USD",
  "breakdown": [
    {
      "step": "analyze",
      "type": "llm_call",
      "tokens_in": 1000,
      "tokens_out": 500,
      "cost_usd": 0.045
    }
  ]
}
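
The shape of the estimate can be reproduced with simple per-token arithmetic: each step's cost is its input tokens times an input rate plus its output tokens times an output rate, and the total is the sum over steps. In the sketch below, the rates in `RATES_PER_1K_USD` are placeholders chosen only so the arithmetic lands on the sample figure above; they are not real provider prices and not AxonFlow's pricing table, and `estimate_plan` is a hypothetical local function rather than the endpoint's implementation.

```python
from typing import Any, Dict, Tuple

# Placeholder (input, output) rates in USD per 1,000 tokens. Assumed values.
RATES_PER_1K_USD: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("openai", "gpt-4"): (0.03, 0.03),
}


def estimate_plan(plan: Dict[str, Any]) -> Dict[str, Any]:
    """Compute a per-step and total cost estimate for a plan definition."""
    rate_in, rate_out = RATES_PER_1K_USD[(plan["provider"], plan["model"])]
    breakdown = []
    for step in plan["steps"]:
        cost = (
            step["estimated_tokens_in"] / 1000 * rate_in
            + step["estimated_tokens_out"] / 1000 * rate_out
        )
        breakdown.append({
            "step": step["name"],
            "type": step["type"],
            "tokens_in": step["estimated_tokens_in"],
            "tokens_out": step["estimated_tokens_out"],
            "cost_usd": round(cost, 6),
        })
    total = round(sum(item["cost_usd"] for item in breakdown), 6)
    return {"estimated_cost_usd": total, "currency": "USD", "breakdown": breakdown}


plan = {
    "provider": "openai",
    "model": "gpt-4",
    "steps": [
        {
            "name": "analyze",
            "type": "llm_call",
            "estimated_tokens_in": 1000,
            "estimated_tokens_out": 500,
        }
    ],
}
result = estimate_plan(plan)
print(f"Estimated cost: ${result['estimated_cost_usd']:.3f}")
```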

Get Cost Estimate for an Existing Plan

Retrieve a cost estimate for a plan that has already been created.

GET /api/v1/plans/{id}/cost

Response:

{
  "plan_id": "plan_abc123",
  "estimated_cost_usd": 0.045,
  "currency": "USD",
  "breakdown": [
    {
      "step": "analyze",
      "type": "llm_call",
      "tokens_in": 1000,
      "tokens_out": 500,
      "cost_usd": 0.045
    }
  ]
}

Tiered Access

Cost estimation availability varies by edition:

Feature            Community             Evaluation               Enterprise
Estimate response  Aggregate total only  Full per-step breakdown  Full per-step breakdown
Estimates per day  10                    100                      Unlimited

SDK Example (Python)

estimate = await client.get_plan_cost(plan_id)
print(f"Estimated cost: ${estimate.estimated_cost_usd:.3f}")
for step in estimate.breakdown:
    print(f"  {step.step}: ${step.cost_usd:.3f}")

Community vs Enterprise

Feature  Community  Enterprise
Usage tracking
Budget limits
Budget hierarchy
Alert thresholds
Pre-flight budget check
Usage breakdown
Usage forecast
Usage export
Alert channels (Slack, email, webhook)
Auto-downgrade to cheaper models
Budget rollover
Cost dashboard

Next Steps