Cost Management

AxonFlow provides comprehensive cost management capabilities to help organizations control and optimize LLM spending. Set budgets at multiple levels, receive alerts before limits are reached, and enforce spending policies automatically.

Cost Controls Are Governance

In AxonFlow, budget limits are governance policies. Just like content policies block harmful requests, budget policies can block or downgrade requests that would exceed spending limits.


Why Cost Management Matters

AI agent deployments can quickly become expensive without proper controls:

  • Runaway Costs: A buggy agent loop can burn through API credits in minutes
  • Budget Allocation: Different teams/projects need separate spending limits
  • Visibility: Organizations need to know which agents/workflows cost the most
  • Accountability: Costs must be attributable to specific use cases
  • Forecasting: Monthly spend needs to be predictable

AxonFlow tracks every token, calculates costs in real-time, and enforces budget policies automatically.
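As a rough illustration of what per-token cost tracking involves, the sketch below prices a single request from its input and output token counts. The model names and per-million-token prices are hypothetical placeholders, and `Usage` and `request_cost_usd` are illustrative names rather than part of the AxonFlow SDK.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per one million tokens; real provider pricing differs.
PRICES_PER_MILLION = {
    "example-large": {"input": 10.00, "output": 30.00},
    "example-small": {"input": 0.50, "output": 1.50},
}


@dataclass
class Usage:
    model: str
    tokens_in: int
    tokens_out: int


def request_cost_usd(usage: Usage) -> float:
    """Price one request: token counts times the per-token price, per direction."""
    price = PRICES_PER_MILLION[usage.model]
    total = usage.tokens_in * price["input"] + usage.tokens_out * price["output"]
    return total / 1_000_000


usage = Usage(model="example-large", tokens_in=2_000, tokens_out=500)
print(f"${request_cost_usd(usage):.4f}")
```

Summing these per-request costs by team, agent, or workflow gives the running totals that budgets are compared against.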


Budget Hierarchy

Budgets can be set at multiple scopes, creating a hierarchy:

Organization Budget ($10,000/month)
├── Team: Platform ($5,000/month)
│   ├── Agent: code-reviewer ($1,000/month)
│   └── Agent: test-generator ($500/month)
├── Team: Data Science ($3,000/month)
│   └── Workflow: daily-analysis ($100/day)
└── Team: Customer Support ($2,000/month)
    └── Agent: ticket-responder ($50/day)
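
One way to reason about a hierarchy like this is that a request must fit within its own budget and within every budget above it. The sketch below models that reading with plain dataclasses. It is an assumption about how nested budgets interact, not a description of AxonFlow's internal evaluation, and `Budget`, `chain`, and `first_exhausted` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Budget:
    name: str
    limit_usd: float
    used_usd: float = 0.0
    parent: Optional["Budget"] = None


def chain(budget: Budget) -> Iterator[Budget]:
    """Yield the budget itself, then each ancestor up to the organization."""
    node: Optional[Budget] = budget
    while node is not None:
        yield node
        node = node.parent


def first_exhausted(budget: Budget, cost_usd: float) -> Optional[Budget]:
    """Return the nearest budget in the chain that cannot absorb cost_usd, else None."""
    for node in chain(budget):
        if node.used_usd + cost_usd > node.limit_usd:
            return node
    return None


org = Budget("Organization", 10_000.0, used_usd=9_999.0)
platform = Budget("Team: Platform", 5_000.0, used_usd=1_200.0, parent=org)
reviewer = Budget("Agent: code-reviewer", 1_000.0, used_usd=300.0, parent=platform)

blocker = first_exhausted(reviewer, cost_usd=2.50)
print(blocker.name if blocker else "within all budgets")
```

Here the agent and team budgets both have room, but the organization budget does not, so the organization is reported as the limiting scope.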

Scope Types

| Scope | Use Case |
|---|---|
| organization | Overall company spending limit |
| team | Department or project budget |
| agent | Individual AI agent budget |
| workflow | Specific workflow budget |
| user | Per-user spending limit |

Creating Budgets

Using the SDK

import asyncio

from axonflow import AxonFlow, CreateBudgetRequest, BudgetScope, BudgetPeriod, BudgetOnExceed


async def main():
    async with AxonFlow(endpoint="http://localhost:8081") as client:
        # Organization-level monthly budget
        await client.create_budget(CreateBudgetRequest(
            id="org-monthly",
            name="Organization Monthly Budget",
            scope=BudgetScope.ORGANIZATION,
            limit_usd=10000.0,
            period=BudgetPeriod.MONTHLY,
            on_exceed=BudgetOnExceed.BLOCK,
            alert_thresholds=[50, 80, 90, 100]
        ))

        # Team budget (nested under org)
        await client.create_budget(CreateBudgetRequest(
            id="platform-team",
            name="Platform Team Budget",
            scope=BudgetScope.TEAM,
            scope_id="platform",
            limit_usd=5000.0,
            period=BudgetPeriod.MONTHLY,
            on_exceed=BudgetOnExceed.WARN,
            alert_thresholds=[50, 80, 100]
        ))

        # Agent budget (daily limit)
        await client.create_budget(CreateBudgetRequest(
            id="code-reviewer-daily",
            name="Code Reviewer Daily Limit",
            scope=BudgetScope.AGENT,
            scope_id="code-reviewer",
            limit_usd=50.0,
            period=BudgetPeriod.DAILY,
            on_exceed=BudgetOnExceed.BLOCK,
            alert_thresholds=[80, 100]
        ))


asyncio.run(main())

Budget Periods

| Period | Reset Frequency |
|---|---|
| daily | Every day at midnight UTC |
| weekly | Every Monday at midnight UTC |
| monthly | First of each month at midnight UTC |
| quarterly | First of each quarter |
| yearly | January 1st |
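
To make the reset schedule concrete, the sketch below computes the next reset instant for each period using only the standard library. The table gives a time of day only for the daily, weekly, and monthly periods. The sketch assumes quarterly and yearly budgets also reset at midnight UTC, and `next_reset` is a hypothetical helper, not an SDK function.

```python
from datetime import datetime, timedelta, timezone


def next_reset(now: datetime, period: str) -> datetime:
    """Return the next reset instant in UTC after `now` (pass a timezone-aware datetime)."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period == "daily":
        return midnight + timedelta(days=1)
    if period == "weekly":
        # weekday() is 0 on Monday, so this lands on the following Monday.
        return midnight + timedelta(days=7 - now.weekday())
    if period == "monthly":
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        return midnight.replace(year=year, month=month, day=1)
    if period == "quarterly":
        # Quarters start in months 1, 4, 7, and 10.
        month = ((now.month - 1) // 3 + 1) * 3 + 1
        if month > 12:
            return midnight.replace(year=now.year + 1, month=1, day=1)
        return midnight.replace(month=month, day=1)
    if period == "yearly":
        # Assumption: yearly budgets reset at midnight UTC on January 1st.
        return midnight.replace(year=now.year + 1, month=1, day=1)
    raise ValueError(f"unknown period: {period}")


now = datetime(2024, 5, 15, 13, 30, tzinfo=timezone.utc)  # a Wednesday
for period in ("daily", "weekly", "monthly", "quarterly", "yearly"):
    print(period, next_reset(now, period).isoformat())
```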

Enforcement Actions

When spending reaches a budget's limit, AxonFlow acts according to that budget's on_exceed setting:

| Action | Behavior |
|---|---|
| warn | Log warning, send alert, allow request |
| block | Reject request with budget exceeded error |
| downgrade | Switch to a cheaper model (Enterprise) |
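
The three actions can be read as a small decision function. The sketch below is a local model of the table above under that reading: `OnExceed`, `Outcome`, and `apply_on_exceed` are stand-in names for illustration (the SDK's own enum is `BudgetOnExceed`), and the choice of cheaper model is assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from enum import Enum


class OnExceed(Enum):
    WARN = "warn"
    BLOCK = "block"
    DOWNGRADE = "downgrade"


@dataclass
class Outcome:
    allowed: bool
    model: str
    note: str


def apply_on_exceed(action: OnExceed, used_usd: float, limit_usd: float,
                    model: str, cheaper_model: str) -> Outcome:
    """Decide what happens to a request given current spend and the on_exceed action."""
    if used_usd < limit_usd:
        return Outcome(True, model, "within budget")
    if action is OnExceed.WARN:
        return Outcome(True, model, "budget exceeded: warning logged, alert sent")
    if action is OnExceed.BLOCK:
        return Outcome(False, model, "budget exceeded: request rejected")
    # DOWNGRADE: the request proceeds, but on the cheaper model.
    return Outcome(True, cheaper_model, "budget exceeded: switched to cheaper model")


print(apply_on_exceed(OnExceed.DOWNGRADE, 5200.0, 5000.0, "large-model", "small-model"))
```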

Example: Blocking When Budget Exceeded

# Assumption: BudgetCheckRequest is importable from axonflow like the other request types
from axonflow import BudgetCheckRequest

# Check budget before making LLM request
decision = await client.check_budget(BudgetCheckRequest(
    team_id="platform",
    agent_id="code-reviewer"
))

if decision.allowed:
    # Safe to make LLM request (llm is your application's own LLM client)
    response = await llm.generate(prompt)
else:
    # Budget exceeded
    print(f"Blocked: {decision.message}")
    print(f"Used: ${decision.used_usd:.2f} / ${decision.limit_usd:.2f}")

Alert Thresholds

Configure percentage thresholds to receive alerts before budgets are exceeded:

await client.create_budget(CreateBudgetRequest(
    id="team-budget",
    name="Engineering Team",
    scope=BudgetScope.TEAM,
    scope_id="engineering",
    limit_usd=5000.0,
    period=BudgetPeriod.MONTHLY,
    on_exceed=BudgetOnExceed.BLOCK,
    alert_thresholds=[50, 80, 90, 100]  # Alert at 50%, 80%, 90%, 100%
))
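
To see how percentage thresholds map onto dollar amounts, the sketch below lists which thresholds are passed when usage rises from one value to another. It assumes each threshold fires once, at the moment spending first reaches it. `crossed_thresholds` is a hypothetical helper, not part of the SDK.

```python
from typing import List


def crossed_thresholds(previous_usd: float, current_usd: float,
                       limit_usd: float, thresholds: List[int]) -> List[int]:
    """Return the percentage thresholds reached as usage moves from previous to current."""
    crossed = []
    for percent in sorted(thresholds):
        threshold_usd = limit_usd * percent / 100  # e.g. 80% of $5,000 is $4,000
        if previous_usd < threshold_usd <= current_usd:
            crossed.append(percent)
    return crossed


# Usage on a $5,000 budget jumps from $2,400 to $4,100 between checks.
print(crossed_thresholds(2400.0, 4100.0, 5000.0, [50, 80, 90, 100]))
```

In this example the 50% ($2,500) and 80% ($4,000) thresholds are both passed in one step, so both alerts would be due.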

Viewing Alerts

# Get alerts for a budget
alerts = await client.get_budget_alerts("team-budget")
for alert in alerts.alerts:
    print(f"[{alert.created_at}] {alert.message}")
    print(f"  Threshold: {alert.threshold}%")
    print(f"  Amount: ${alert.amount_usd:.2f}")

Monitoring Usage

Real-Time Budget Status

status = await client.get_budget_status("team-budget")
print(f"Budget: {status.budget.name}")
print(f"Used: ${status.used_usd:.2f} / ${status.budget.limit_usd:.2f}")
print(f"Remaining: ${status.remaining_usd:.2f}")
print(f"Percentage: {status.percentage:.1f}%")
print(f"Period: {status.period_start} to {status.period_end}")
if status.is_exceeded:
    print("WARNING: Budget exceeded!")

Usage Summary

usage = await client.get_usage_summary(period="monthly")
print(f"Total Cost: ${usage.total_cost_usd:.2f}")
print(f"Total Requests: {usage.total_requests:,}")
print(f"Tokens In: {usage.total_tokens_in:,}")
print(f"Tokens Out: {usage.total_tokens_out:,}")

Usage Breakdown

Analyze spending by different dimensions:

# By provider
by_provider = await client.get_usage_breakdown("provider", "monthly")
for item in by_provider.items:
    print(f"{item.name}: ${item.cost_usd:.2f} ({item.percentage:.1f}%)")

# By model
by_model = await client.get_usage_breakdown("model", "monthly")

# By team
by_team = await client.get_usage_breakdown("team", "monthly")

# By agent
by_agent = await client.get_usage_breakdown("agent", "monthly")
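
For intuition about what a breakdown contains, the sketch below groups raw usage records by one dimension and computes each group's share of total cost. The record layout and the `breakdown` helper are hypothetical. The real SDK returns its own result objects with `name`, `cost_usd`, and `percentage` fields, as used above.

```python
from collections import defaultdict
from typing import Dict, List


def breakdown(records: List[Dict], dimension: str) -> List[Dict]:
    """Group records by `dimension` and report cost and share of total, largest first."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[dimension]] += record["cost_usd"]
    grand_total = sum(totals.values())
    rows = [
        {
            "name": name,
            "cost_usd": cost,
            "percentage": cost / grand_total * 100 if grand_total else 0.0,
        }
        for name, cost in totals.items()
    ]
    return sorted(rows, key=lambda row: row["cost_usd"], reverse=True)


# Hypothetical usage records; provider and model names are placeholders.
records = [
    {"provider": "provider-a", "model": "model-x", "cost_usd": 30.0},
    {"provider": "provider-a", "model": "model-y", "cost_usd": 10.0},
    {"provider": "provider-b", "model": "model-z", "cost_usd": 10.0},
]

for row in breakdown(records, "provider"):
    print(f"{row['name']}: ${row['cost_usd']:.2f} ({row['percentage']:.1f}%)")
```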

Best Practices

1. Start with Organization Budget

Always set a top-level organization budget as a safety net:

await client.create_budget(CreateBudgetRequest(
    id="org-safety-net",
    name="Organization Safety Net",
    scope=BudgetScope.ORGANIZATION,
    limit_usd=50000.0,  # High limit as safety net
    period=BudgetPeriod.MONTHLY,
    on_exceed=BudgetOnExceed.BLOCK,
    alert_thresholds=[80, 90, 95, 100]
))

2. Use Daily Limits for Agents

Agents can have runaway loops. Daily limits catch issues quickly:

await client.create_budget(CreateBudgetRequest(
    id="agent-daily",
    scope=BudgetScope.AGENT,
    scope_id="my-agent",
    limit_usd=100.0,  # $100/day max
    period=BudgetPeriod.DAILY,
    on_exceed=BudgetOnExceed.BLOCK
))

3. Pre-Check Before Expensive Operations

Always check budgets before calling expensive models:

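# Application-defined exception used in this example; substitute the SDK's own if it provides one
class BudgetExceededError(Exception):
    pass


async def safe_llm_call(prompt: str, team_id: str):
    # Check budget first
    decision = await client.check_budget(BudgetCheckRequest(team_id=team_id))

    if not decision.allowed:
        raise BudgetExceededError(decision.message)

    # Safe to proceed
    return await llm.generate(prompt)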

4. Set Meaningful Alert Thresholds

Configure alerts that give you time to react:

  • 50%: Early warning, halfway through budget
  • 80%: Time to review spending patterns
  • 90%: Consider reducing usage or increasing budget
  • 100%: Budget exhausted

Community vs Enterprise

| Feature | Community | Enterprise |
|---|---|---|
| Usage tracking | | |
| Budget limits | | |
| Budget hierarchy | | |
| Alert thresholds | | |
| Pre-flight budget check | | |
| Usage breakdown | | |
| Usage forecast | | |
| Usage export | | |
| Alert channels (Slack, email, webhook) | | |
| Auto-downgrade to cheaper models | | |
| Budget rollover | | |
| Cost dashboard | | |

Next Steps