Cost Management
AxonFlow provides comprehensive cost management capabilities to help organizations control and optimize LLM spending. Set budgets at multiple levels, receive alerts before limits are reached, and enforce spending policies automatically.
In AxonFlow, budget limits are governance policies. Just like content policies block harmful requests, budget policies can block or downgrade requests that would exceed spending limits.
Why Cost Management Matters
AI agent deployments can quickly become expensive without proper controls:
- Runaway Costs: A buggy agent loop can burn through API credits in minutes
- Budget Allocation: Different teams/projects need separate spending limits
- Visibility: Organizations need to know which agents/workflows cost the most
- Accountability: Costs must be attributable to specific use cases
- Forecasting: Monthly spend needs to be predictable
AxonFlow tracks every token, calculates costs in real-time, and enforces budget policies automatically.
Budget Hierarchy
Budgets can be set at multiple scopes, creating a hierarchy:
Organization Budget ($10,000/month)
│
├── Team: Platform ($5,000/month)
│   ├── Agent: code-reviewer ($1,000/month)
│   └── Agent: test-generator ($500/month)
│
├── Team: Data Science ($3,000/month)
│   └── Workflow: daily-analysis ($100/day)
│
└── Team: Customer Support ($2,000/month)
    └── Agent: ticket-responder ($50/day)
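This page does not spell out how nested budgets are evaluated. One plausible reading, sketched below, is that a request must fit within every budget on the path from its own scope up to the organization. The `Budget` class and `first_exceeded` function are hypothetical illustrations, not AxonFlow SDK types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Hypothetical in-memory model of one budget node (not an SDK type)."""
    name: str
    limit_usd: float
    used_usd: float = 0.0
    parent: Optional["Budget"] = None


def first_exceeded(budget: Budget, cost_usd: float) -> Optional[Budget]:
    """Walk from the given budget up to the root and return the first budget
    that the additional cost would push over its limit, or None if all fit."""
    node: Optional[Budget] = budget
    while node is not None:
        if node.used_usd + cost_usd > node.limit_usd:
            return node
        node = node.parent
    return None


# Assumed usage figures: the agent and team have room, the organization does not.
org = Budget("organization", limit_usd=10_000.0, used_usd=9_990.0)
team = Budget("team:platform", limit_usd=5_000.0, used_usd=1_200.0, parent=org)
agent = Budget("agent:code-reviewer", limit_usd=1_000.0, used_usd=100.0, parent=team)

blocker = first_exceeded(agent, cost_usd=25.0)
print(blocker.name if blocker else "allowed")
```

Under this assumption, a nearly exhausted organization budget stops a request even when the agent and team budgets still have headroom.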
Scope Types
| Scope | Use Case |
|---|---|
| organization | Overall company spending limit |
| team | Department or project budget |
| agent | Individual AI agent budget |
| workflow | Specific workflow budget |
| user | Per-user spending limit |
Creating Budgets
Using the SDK
from axonflow import AxonFlow, CreateBudgetRequest, BudgetScope, BudgetPeriod, BudgetOnExceed

async with AxonFlow(endpoint="http://localhost:8080") as client:
    # Organization-level monthly budget
    await client.create_budget(CreateBudgetRequest(
        id="org-monthly",
        name="Organization Monthly Budget",
        scope=BudgetScope.ORGANIZATION,
        limit_usd=10000.0,
        period=BudgetPeriod.MONTHLY,
        on_exceed=BudgetOnExceed.BLOCK,
        alert_thresholds=[50, 80, 90, 100]
    ))

    # Team budget (nested under org)
    await client.create_budget(CreateBudgetRequest(
        id="platform-team",
        name="Platform Team Budget",
        scope=BudgetScope.TEAM,
        scope_id="platform",
        limit_usd=5000.0,
        period=BudgetPeriod.MONTHLY,
        on_exceed=BudgetOnExceed.WARN,
        alert_thresholds=[50, 80, 100]
    ))

    # Agent budget (daily limit)
    await client.create_budget(CreateBudgetRequest(
        id="code-reviewer-daily",
        name="Code Reviewer Daily Limit",
        scope=BudgetScope.AGENT,
        scope_id="code-reviewer",
        limit_usd=50.0,
        period=BudgetPeriod.DAILY,
        on_exceed=BudgetOnExceed.BLOCK,
        alert_thresholds=[80, 100]
    ))
Budget Periods
| Period | Reset Frequency |
|---|---|
| daily | Every day at midnight UTC |
| weekly | Every Monday at midnight UTC |
| monthly | First of each month at midnight UTC |
| quarterly | First of each quarter |
| yearly | January 1st |
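To make the reset schedule concrete, the following sketch computes the next reset instant for each period using only the standard library. It assumes quarterly and yearly budgets also reset at midnight UTC, which the table does not state explicitly; `next_reset` is a hypothetical helper, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone


def next_reset(now: datetime, period: str) -> datetime:
    """Return the next reset instant (UTC) for a period name from the table."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight + timedelta(days=1)
    if period == "weekly":
        # weekday(): Monday == 0, so this always lands on the next Monday.
        return midnight + timedelta(days=7 - now.weekday())
    months_per_period = {"monthly": 1, "quarterly": 3, "yearly": 12}.get(period)
    if months_per_period is None:
        raise ValueError(f"unknown period: {period}")
    # Zero-based month in which the current period started.
    period_start = (now.month - 1) // months_per_period * months_per_period
    # Months since year 0 at which the next period starts.
    total = now.year * 12 + period_start + months_per_period
    return datetime(total // 12, total % 12 + 1, 1, tzinfo=timezone.utc)


now = datetime(2025, 5, 14, 15, 30, tzinfo=timezone.utc)  # a Wednesday
for name in ("daily", "weekly", "monthly", "quarterly", "yearly"):
    print(name, next_reset(now, name).isoformat())
```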
Enforcement Actions
When a budget's limit is exceeded, AxonFlow takes action based on the on_exceed setting:
| Action | Behavior |
|---|---|
| warn | Log warning, send alert, allow request |
| block | Reject request with budget exceeded error |
| downgrade | Switch to a cheaper model (Enterprise) |
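The table above can be read as a small decision function. The sketch below is an illustration only: the outcome labels are invented for this example, and it assumes spend exactly at the limit counts as exceeded, which this page does not specify.

```python
def enforce(used_usd: float, limit_usd: float, on_exceed: str) -> str:
    """Map a budget's state and its on_exceed setting to an outcome label.

    The labels are hypothetical; they are not values returned by AxonFlow.
    """
    outcomes = {
        "warn": "allow_with_warning",
        "block": "reject",
        "downgrade": "route_to_cheaper_model",
    }
    if on_exceed not in outcomes:
        raise ValueError(f"unknown on_exceed action: {on_exceed}")
    if used_usd < limit_usd:
        return "allow"
    return outcomes[on_exceed]


print(enforce(4200.0, 5000.0, "block"))  # allow
print(enforce(5150.0, 5000.0, "block"))  # reject
print(enforce(5150.0, 5000.0, "warn"))   # allow_with_warning
```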
Automatic Enforcement (Proxy Mode)
When using Proxy Mode (proxyLLMCall, exposed as proxy_llm_call in the Python SDK), budget enforcement is automatic. AxonFlow checks budgets before processing each request and returns HTTP 402 Payment Required when a budget with on_exceed=block is exceeded.
from axonflow import AxonFlow

async with AxonFlow(endpoint="http://localhost:8080") as client:
    response = await client.proxy_llm_call(
        user_token="[email protected]",
        query="Explain quantum computing",
        request_type="chat"
    )

    # Check if request was blocked due to budget
    if response.budget_info and response.budget_info.exceeded:
        print(f"Budget exceeded: {response.budget_info.budget_id}")
        print(f"Used: ${response.budget_info.used:.2f} / ${response.budget_info.limit:.2f}")
        print(f"Action: {response.budget_info.on_exceed}")
    else:
        print(f"Response: {response.data}")
BudgetInfo Response
When a budget is exceeded (or approaching limits with on_exceed=warn), the response includes budget_info:
{
  "success": false,
  "blocked": true,
  "budget_info": {
    "budget_id": "team-platform-monthly",
    "exceeded": true,
    "used": 5150.00,
    "limit": 5000.00,
    "currency": "USD",
    "period": "monthly",
    "on_exceed": "block",
    "percentage": 103.0
  }
}
| Field | Description |
|---|---|
| budget_id | The budget that was exceeded |
| exceeded | Whether the limit has been exceeded |
| used | Current spending in this period |
| limit | Budget limit for this period |
| currency | Currency (default: USD) |
| period | Budget period (daily, weekly, monthly, etc.) |
| on_exceed | Configured action (block, warn, downgrade) |
| percentage | Percentage of budget used |
HTTP Status Codes
| Code | Meaning |
|---|---|
| 200 OK | Request processed successfully |
| 402 Payment Required | Budget exceeded with on_exceed=block |
| 200 OK + X-Budget-Warning header | Budget exceeded with on_exceed=warn |
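For clients that call the HTTP API directly rather than through the SDK, the status codes above could be interpreted along these lines. This is a minimal sketch: `classify_budget_outcome` and its return labels are hypothetical, and it makes no assumption about the value carried in the X-Budget-Warning header.

```python
from typing import Mapping


def classify_budget_outcome(status: int, headers: Mapping[str, str]) -> str:
    """Classify a proxy response using only the status code and headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower() for name in headers}
    if status == 402:
        return "blocked"
    if status == 200 and "x-budget-warning" in lowered:
        return "warned"
    if status == 200:
        return "ok"
    return "other"


print(classify_budget_outcome(402, {}))
print(classify_budget_outcome(200, {"X-Budget-Warning": "example"}))
print(classify_budget_outcome(200, {"Content-Type": "application/json"}))
```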
Manual Pre-Check (Gateway Mode)
For Gateway Mode, you can manually check budgets before making LLM calls:
# Check budget before making LLM request
decision = await client.check_budget(BudgetCheckRequest(
    team_id="platform",
    agent_id="code-reviewer"
))

if decision.allowed:
    # Safe to make LLM request
    response = await llm.generate(prompt)
else:
    # Budget exceeded
    print(f"Blocked: {decision.message}")
    print(f"Used: ${decision.used_usd:.2f} / ${decision.limit_usd:.2f}")
Alert Thresholds
Configure percentage thresholds to receive alerts before budgets are exceeded:
await client.create_budget(CreateBudgetRequest(
    id="team-budget",
    name="Engineering Team",
    scope=BudgetScope.TEAM,
    scope_id="engineering",
    limit_usd=5000.0,
    period=BudgetPeriod.MONTHLY,
    on_exceed=BudgetOnExceed.BLOCK,
    alert_thresholds=[50, 80, 90, 100]  # Alert at 50%, 80%, 90%, 100%
))
Viewing Alerts
# Get alerts for a budget
alerts = await client.get_budget_alerts("team-budget")
for alert in alerts.alerts:
    print(f"[{alert.created_at}] {alert.message}")
    print(f"  Threshold: {alert.threshold}%")
    print(f"  Amount: ${alert.amount_usd:.2f}")
Monitoring Usage
Real-Time Budget Status
status = await client.get_budget_status("team-budget")
print(f"Budget: {status.budget.name}")
print(f"Used: ${status.used_usd:.2f} / ${status.budget.limit_usd:.2f}")
print(f"Remaining: ${status.remaining_usd:.2f}")
print(f"Percentage: {status.percentage:.1f}%")
print(f"Period: {status.period_start} to {status.period_end}")
if status.is_exceeded:
    print("WARNING: Budget exceeded!")
Usage Summary
usage = await client.get_usage_summary(period="monthly")
print(f"Total Cost: ${usage.total_cost_usd:.2f}")
print(f"Total Requests: {usage.total_requests:,}")
print(f"Tokens In: {usage.total_tokens_in:,}")
print(f"Tokens Out: {usage.total_tokens_out:,}")
Usage Breakdown
Analyze spending by different dimensions:
# By provider
by_provider = await client.get_usage_breakdown("provider", "monthly")
for item in by_provider.items:
    print(f"{item.name}: ${item.cost_usd:.2f} ({item.percentage:.1f}%)")
# By model
by_model = await client.get_usage_breakdown("model", "monthly")
# By team
by_team = await client.get_usage_breakdown("team", "monthly")
# By agent
by_agent = await client.get_usage_breakdown("agent", "monthly")
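For intuition about what a breakdown contains, the sketch below derives per-dimension totals and percentages from raw cost records. It is a local illustration with made-up figures, not a description of how AxonFlow aggregates usage internally; `breakdown` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def breakdown(records: Iterable[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Group (name, cost_usd) records and return (name, cost, percentage)
    rows sorted by cost, highest first."""
    totals: Dict[str, float] = defaultdict(float)
    for name, cost in records:
        totals[name] += cost
    grand_total = sum(totals.values())
    rows = [
        (name, cost, (cost / grand_total * 100.0) if grand_total else 0.0)
        for name, cost in totals.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample records keyed by provider.
rows = breakdown([("openai", 60.0), ("anthropic", 30.0), ("openai", 10.0)])
for name, cost, pct in rows:
    print(f"{name}: ${cost:.2f} ({pct:.1f}%)")
```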
Best Practices
1. Start with Organization Budget
Always set a top-level organization budget as a safety net:
await client.create_budget(CreateBudgetRequest(
    id="org-safety-net",
    name="Organization Safety Net",
    scope=BudgetScope.ORGANIZATION,
    limit_usd=50000.0,  # High limit as safety net
    period=BudgetPeriod.MONTHLY,
    on_exceed=BudgetOnExceed.BLOCK,
    alert_thresholds=[80, 90, 95, 100]
))
2. Use Daily Limits for Agents
Agents can have runaway loops. Daily limits catch issues quickly:
await client.create_budget(CreateBudgetRequest(
    id="agent-daily",
    scope=BudgetScope.AGENT,
    scope_id="my-agent",
    limit_usd=100.0,  # $100/day max
    period=BudgetPeriod.DAILY,
    on_exceed=BudgetOnExceed.BLOCK
))
3. Pre-Check Before Expensive Operations
Always check budgets before calling expensive models:
async def safe_llm_call(prompt: str, team_id: str):
    # Check budget first
    decision = await client.check_budget(BudgetCheckRequest(team_id=team_id))
    if not decision.allowed:
        raise BudgetExceededError(decision.message)

    # Safe to proceed
    return await llm.generate(prompt)
4. Set Meaningful Alert Thresholds
Configure alerts that give you time to react:
- 50%: Early warning, halfway through budget
- 80%: Time to review spending patterns
- 90%: Consider reducing usage or increasing budget
- 100%: Budget exhausted
Budget Alerts
Budget alerts notify stakeholders when spending approaches configured thresholds. Alerts are triggered once per threshold per budget period.
# Configure alert channels (Enterprise)
await client.configure_budget_alerts(BudgetAlertConfig(
    budget_id="team-budget",
    channels=[
        AlertChannel(type="email", target="[email protected]"),
        AlertChannel(type="slack", target="#platform-cost-alerts"),
        AlertChannel(type="webhook", target="https://internal.example.com/budget-hook"),
    ]
))
In Community Edition, alerts are written to the application log. Enterprise Edition supports email, Slack, and webhook delivery channels.
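The "once per threshold per budget period" rule can be pictured as tracking which thresholds have already fired. The following sketch shows one way to do that; `newly_crossed` and the `fired_this_period` set are hypothetical names, and the actual bookkeeping inside AxonFlow may differ.

```python
from typing import Iterable, List, Set


def newly_crossed(
    percentage_used: float,
    thresholds: Iterable[int],
    already_fired: Set[int],
) -> List[int]:
    """Return thresholds reached by the current usage that have not fired
    yet in this period, and record them so they fire only once."""
    fired = [
        t for t in sorted(thresholds)
        if percentage_used >= t and t not in already_fired
    ]
    already_fired.update(fired)
    return fired


fired_this_period: Set[int] = set()  # cleared when the budget period resets
print(newly_crossed(55.0, [50, 80, 90, 100], fired_this_period))  # [50]
print(newly_crossed(85.0, [50, 80, 90, 100], fired_this_period))  # [80]
print(newly_crossed(86.0, [50, 80, 90, 100], fired_this_period))  # []
```

A sudden jump in spend (for example from 40% to 95%) would return several thresholds at once under this sketch.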
Usage Forecasting
Enterprise Edition provides usage forecasting based on historical spending patterns. Forecasts project estimated end-of-period spend using a rolling average of the most recent 7 days:
forecast = await client.get_usage_forecast("team-budget")
print(f"Current spend: ${forecast.current_usd:.2f}")
print(f"Projected end-of-month: ${forecast.projected_usd:.2f}")
print(f"Budget limit: ${forecast.limit_usd:.2f}")
print(f"Risk level: {forecast.risk}") # low, medium, high, critical
Forecasts help teams adjust usage before budgets are exceeded rather than reacting after the fact.
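As a rough picture of a rolling-average forecast, the sketch below projects end-of-period spend as current spend plus the mean of the last 7 days multiplied by the days remaining. The exact formula AxonFlow uses is not given on this page, so treat `project_end_of_period` and its parameters as assumptions.

```python
from typing import Sequence


def project_end_of_period(
    current_usd: float,
    daily_spend_usd: Sequence[float],
    days_remaining: int,
    window: int = 7,
) -> float:
    """Project end-of-period spend as current spend plus the mean of the
    most recent `window` days multiplied by the days left in the period."""
    recent = list(daily_spend_usd[-window:])
    if not recent:
        return current_usd  # no history yet, so nothing to extrapolate
    daily_average = sum(recent) / len(recent)
    return current_usd + daily_average * days_remaining


# Made-up history: two expensive early days fall outside the 7-day window.
history = [500.0, 500.0] + [100.0] * 7
projected = project_end_of_period(3000.0, history, days_remaining=10)
print(f"Projected end-of-period spend: ${projected:.2f}")
```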
Auto-Downgrade Behavior
When on_exceed is set to downgrade, AxonFlow automatically routes requests to a cheaper model once the budget limit is exceeded. This controls costs without hard-blocking requests:
| Original Model | Downgrade Target | Approximate Savings |
|---|---|---|
| gpt-4 | gpt-4o-mini | ~95% |
| claude-opus-4 | claude-haiku-4.5 | ~95% |
| gpt-4o | gpt-4o-mini | ~90% |
The downgrade is transparent to the calling application. The budget_info in the response indicates when a downgrade occurred:
{
  "budget_info": {
    "budget_id": "team-platform-monthly",
    "exceeded": true,
    "on_exceed": "downgrade",
    "downgraded_from": "gpt-4",
    "downgraded_to": "gpt-4o-mini"
  }
}
Auto-downgrade is an Enterprise feature. Community Edition supports warn and block actions only.
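The downgrade table can be pictured as a simple lookup. The sketch below copies the model pairs from the table and mirrors the downgraded_from and downgraded_to fields of budget_info; the `select_model` function and its routing logic are illustrative assumptions, not AxonFlow's implementation.

```python
# Downgrade pairs copied from the table above.
DOWNGRADE_TARGETS = {
    "gpt-4": "gpt-4o-mini",
    "claude-opus-4": "claude-haiku-4.5",
    "gpt-4o": "gpt-4o-mini",
}


def select_model(requested: str, budget_exceeded: bool) -> dict:
    """Pick the model to call and report any downgrade using the same
    field names as the budget_info example."""
    target = DOWNGRADE_TARGETS.get(requested)
    if budget_exceeded and target is not None:
        return {
            "model": target,
            "downgraded_from": requested,
            "downgraded_to": target,
        }
    # Within budget, or no cheaper target is listed: keep the request as-is.
    return {"model": requested}


print(select_model("gpt-4", budget_exceeded=True))
print(select_model("gpt-4", budget_exceeded=False))
```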
Cost Estimation
Added in v4.3.0.
AxonFlow can estimate the cost of a workflow plan before execution, helping teams decide whether to proceed, adjust parameters, or choose cheaper models. Two endpoints support this:
Estimate Cost Before Execution
Submit a plan definition to get a cost estimate without creating or running anything.
POST /api/v1/plans/estimate
Request Body:
{
  "provider": "openai",
  "model": "gpt-4",
  "steps": [
    {
      "name": "analyze",
      "type": "llm_call",
      "estimated_tokens_in": 1000,
      "estimated_tokens_out": 500
    }
  ]
}
Response:
{
  "estimated_cost_usd": 0.045,
  "currency": "USD",
  "breakdown": [
    {
      "step": "analyze",
      "type": "llm_call",
      "tokens_in": 1000,
      "tokens_out": 500,
      "cost_usd": 0.045
    }
  ]
}
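The per-step figure is presumably token counts multiplied by per-token rates. The sketch below shows that arithmetic with hypothetical rates chosen only so the result matches the sample response; they are not AxonFlow's pricing table or any provider's published prices, and `estimate` is not an SDK function.

```python
from typing import Dict, List

# HYPOTHETICAL per-1K-token rates, picked to reproduce the sample response.
RATE_IN_PER_1K = 0.03
RATE_OUT_PER_1K = 0.03


def estimate(steps: List[Dict]) -> Dict:
    """Build a response-shaped estimate from a list of plan steps."""
    breakdown = []
    for step in steps:
        cost = (
            step["estimated_tokens_in"] / 1000 * RATE_IN_PER_1K
            + step["estimated_tokens_out"] / 1000 * RATE_OUT_PER_1K
        )
        breakdown.append({
            "step": step["name"],
            "type": step["type"],
            "tokens_in": step["estimated_tokens_in"],
            "tokens_out": step["estimated_tokens_out"],
            "cost_usd": round(cost, 6),
        })
    total = round(sum(item["cost_usd"] for item in breakdown), 6)
    return {"estimated_cost_usd": total, "currency": "USD", "breakdown": breakdown}


result = estimate([
    {"name": "analyze", "type": "llm_call",
     "estimated_tokens_in": 1000, "estimated_tokens_out": 500},
])
print(result)
```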
Get Cost Estimate for an Existing Plan
Retrieve a cost estimate for a plan that has already been created.
GET /api/v1/plans/{id}/cost
Response:
{
  "plan_id": "plan_abc123",
  "estimated_cost_usd": 0.045,
  "currency": "USD",
  "breakdown": [
    {
      "step": "analyze",
      "type": "llm_call",
      "tokens_in": 1000,
      "tokens_out": 500,
      "cost_usd": 0.045
    }
  ]
}
Tiered Access
Cost estimation availability varies by edition:
| Feature | Community | Evaluation | Enterprise |
|---|---|---|---|
| Estimate response | Aggregate total only | Full per-step breakdown | Full per-step breakdown |
| Estimates per day | 10 | 100 | Unlimited |
SDK Example (Python)
estimate = await client.get_plan_cost(plan_id)
print(f"Estimated cost: ${estimate.estimated_cost_usd:.3f}")
for step in estimate.breakdown:
    print(f"  {step.step}: ${step.cost_usd:.3f}")
Community vs Enterprise
| Feature | Community | Enterprise |
|---|---|---|
| Usage tracking | ✅ | ✅ |
| Budget limits | ✅ | ✅ |
| Budget hierarchy | ✅ | ✅ |
| Alert thresholds | ✅ | ✅ |
| Pre-flight budget check | ✅ | ✅ |
| Usage breakdown | ✅ | ✅ |
| Usage forecast | ❌ | ✅ |
| Usage export | ❌ | ✅ |
| Alert channels (Slack, email, webhook) | ❌ | ✅ |
| Auto-downgrade to cheaper models | ❌ | ✅ |
| Budget rollover | ❌ | ✅ |
| Cost dashboard | ❌ | ✅ |
Next Steps
- Cost Controls API Reference - Complete API documentation
- Audit Logging - Track all AI interactions
- Policy-as-Code - Define governance rules