
LLM Provider Routing

AxonFlow supports server-side routing across multiple configured LLM providers. Read this page when you want to run more than one provider in production and need to control fallback behavior, cost, latency, or hard provider pinning.

What Routing Actually Controls

Routing applies to AxonFlow-managed provider calls such as:

  • Proxy Mode
  • MAP and related multi-step routed workflows

In Gateway Mode, your application still chooses and calls the provider directly. AxonFlow governs the request before and after that call, but it is not selecting the provider for you.

Provider Availability by Tier

The following providers can participate in routing; availability differs between the Community and Enterprise tiers:

  • OpenAI
  • Anthropic
  • Azure OpenAI
  • Gemini
  • Ollama
  • AWS Bedrock
  • Custom Provider

Server Configuration

Routing is configured on the AxonFlow side:

| Variable | Values | Default | Description |
| --- | --- | --- | --- |
| LLM_ROUTING_STRATEGY | weighted, round_robin, failover | weighted | Routing strategy (Community and Enterprise) |
| PROVIDER_WEIGHTS | provider:weight,... | Equal weights | Used by weighted routing |
| DEFAULT_LLM_PROVIDER | Provider name | None | Primary provider for failover |
| LLM_PROVIDERS | provider1,provider2,... | All configured | Comma-separated list of enabled provider names. Only providers in this list are considered for routing; if unset, all configured providers are eligible. |
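The provider:weight,... value format used by PROVIDER_WEIGHTS (and the analogous provider:cost,... format) can be sketched as a small parser. This is an illustration of the documented format only; parseProviderWeights is our name, not an AxonFlow API:

```typescript
// Parse a "provider:weight,provider:weight" string, e.g.
// PROVIDER_WEIGHTS=openai:50,anthropic:30,gemini:20, into a map.
function parseProviderWeights(raw: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const entry of raw.split(',')) {
    const [provider, weight] = entry.trim().split(':');
    const parsed = Number(weight);
    if (!provider || Number.isNaN(parsed) || parsed < 0) {
      throw new Error(`Invalid PROVIDER_WEIGHTS entry: "${entry}"`);
    }
    weights.set(provider, parsed);
  }
  return weights;
}
```

The same shape applies to PROVIDER_COSTS, with a cost per provider instead of a weight.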

Enterprise builds add:

| Variable | Values | Default | Description |
| --- | --- | --- | --- |
| LLM_ROUTING_STRATEGY | cost_optimized | Off unless selected | Enterprise-only routing strategy |
| PROVIDER_COSTS | provider:cost,... | Runtime defaults | Used by cost-optimized routing |

Routing Strategies

Weighted

LLM_ROUTING_STRATEGY=weighted
PROVIDER_WEIGHTS=openai:50,anthropic:30,gemini:20

Use this for gradual migrations, split traffic validation, and steady multi-provider production routing.
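Conceptually, weighted routing is proportional random selection: a provider's share of traffic matches its share of the total weight. A minimal sketch (selectWeighted is an illustrative name, not AxonFlow's internal API):

```typescript
// Weighted random selection: each provider's chance of handling a
// request is proportional to its configured weight.
function selectWeighted(
  weights: Map<string, number>,
  rand: () => number = Math.random,
): string {
  const total = [...weights.values()].reduce((a, b) => a + b, 0);
  let ticket = rand() * total; // a point on [0, total)
  for (const [provider, weight] of weights) {
    ticket -= weight;
    if (ticket < 0) return provider;
  }
  // Floating-point edge case: fall back to the last provider.
  return [...weights.keys()].pop()!;
}
```

With openai:50,anthropic:30,gemini:20, roughly half of requests land on OpenAI, which is why this strategy suits gradual migrations.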

Round Robin

LLM_ROUTING_STRATEGY=round_robin

Use this when you want balanced distribution across healthy providers.
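Round-robin simply cycles through the eligible providers in order. A sketch of the idea (makeRoundRobin is our name for illustration, not AxonFlow code):

```typescript
// Round-robin: cycle through the providers in order so each one
// receives an equal share of requests.
function makeRoundRobin(providers: string[]): () => string {
  let next = 0;
  return () => {
    const provider = providers[next];
    next = (next + 1) % providers.length; // wrap back to the start
    return provider;
  };
}
```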

Failover

LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=anthropic

Use this when one provider is primary and the others are operational backup paths.
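The failover logic can be pictured as: use DEFAULT_LLM_PROVIDER while it is healthy, otherwise take the first healthy backup. A hedged sketch (selectFailover and the isHealthy callback are illustrative, not AxonFlow internals):

```typescript
// Failover: prefer the primary provider; fall through to the first
// healthy backup only when the primary is unavailable.
function selectFailover(
  primary: string,
  backups: string[],
  isHealthy: (provider: string) => boolean,
): string {
  if (isHealthy(primary)) return primary;
  for (const backup of backups) {
    if (isHealthy(backup)) return backup;
  }
  throw new Error('No healthy provider available');
}
```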

Cost Optimized

LLM_ROUTING_STRATEGY=cost_optimized
PROVIDER_COSTS=ollama:0,anthropic:0.025,openai:0.03

Use this when you want the runtime to choose the cheapest healthy provider automatically. This strategy is enterprise-only.
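In essence, cost-optimized routing picks the cheapest healthy provider from the PROVIDER_COSTS map. A simplified sketch under that assumption; the enterprise runtime may also factor in health history and latency, and selectCheapest is our illustrative name:

```typescript
// Cost-optimized: choose the healthy provider with the lowest
// configured per-request cost.
function selectCheapest(
  costs: Map<string, number>,
  isHealthy: (provider: string) => boolean,
): string {
  let best: string | null = null;
  let bestCost = Infinity;
  for (const [provider, cost] of costs) {
    if (isHealthy(provider) && cost < bestCost) {
      best = provider;
      bestCost = cost;
    }
  }
  if (best === null) throw new Error('No healthy provider available');
  return best;
}
```

With the example configuration above, a healthy local Ollama (cost 0) absorbs all traffic, and cloud providers only take over when it is down.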

Request-Level Hints

Clients can provide a provider and model preference in the request context:

const response = await client.proxyLLMCall({
  userToken: 'user-123',
  query: 'Draft a security review memo.',
  requestType: 'chat',
  context: {
    provider: 'anthropic',
    model: 'claude-sonnet-4-20250514',
  },
});

By default:

  • context.provider is treated as a preference, not a guarantee
  • AxonFlow may still route elsewhere if the selected provider is unhealthy or the server strategy says otherwise

To hard-pin a request:

const response = await client.proxyLLMCall({
  userToken: 'user-123',
  query: 'Draft a security review memo.',
  requestType: 'chat',
  context: {
    provider: 'anthropic',
    model: 'claude-sonnet-4-20250514',
    strict_provider: true,
  },
});
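The difference between a soft preference and a strict pin can be sketched as server-side resolution logic: a soft hint is honored only while the hinted provider is healthy, whereas a strict pin fails the request rather than routing elsewhere. This is a conceptual sketch of the documented behavior; resolveProviderHint and the isHealthy callback are illustrative names, not AxonFlow APIs:

```typescript
// Resolve a request-level provider hint against provider health.
function resolveProviderHint(
  hint: { provider?: string; strict_provider?: boolean },
  strategyChoice: string, // what the server strategy would pick anyway
  isHealthy: (provider: string) => boolean,
): string {
  if (hint.provider && isHealthy(hint.provider)) {
    return hint.provider; // honor the hint when possible
  }
  if (hint.provider && hint.strict_provider) {
    // Strict pin: never substitute another provider.
    throw new Error(`Pinned provider ${hint.provider} is unavailable`);
  }
  return strategyChoice; // soft hint: fall back to the server strategy
}
```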

When Teams Usually Need More Than One Provider

  • A cloud primary plus Ollama fallback for development or regulated environments
  • Anthropic for long-context reasoning, OpenAI for general-purpose chat
  • Azure OpenAI for production alignment with existing Azure controls, but a second provider for resilience
  • Bedrock added later as the company moves from pilot to governed enterprise rollout

Practical Guidance

  • Start with one provider and verify policy behavior first.
  • Add a second provider when you have a concrete reason: resilience, cost, region, or model fit.
  • Use soft preferences first and only enable strict pinning where the downstream workflow truly requires it.

Common Production Use Cases

  • OpenAI primary with Anthropic failover for customer-facing copilots
  • Ollama plus cloud fallback for private enterprise assistants
  • Azure OpenAI for Microsoft-centric deployment with a second provider for resilience
  • Bedrock for regulated AWS estates that need enterprise-only provider operations

Next Steps