LLM Provider Routing

Configure how AxonFlow routes requests across LLM providers using simple environment variables.

Overview

AxonFlow's provider routing allows operators to:

  • Optimize costs by preferring cheaper providers
  • Meet compliance requirements (e.g., HIPAA with Bedrock-only routing)
  • Improve performance by favoring faster providers
  • Configure failover for high availability

Quick Start

Set environment variables on the AxonFlow Orchestrator:

# Force all traffic through Bedrock (HIPAA compliance)
LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock

# Or distribute traffic by weight
LLM_ROUTING_STRATEGY=weighted
PROVIDER_WEIGHTS=bedrock:60,anthropic:30,openai:10

Configuration

Environment Variables

| Variable | Values | Default | Description |
| --- | --- | --- | --- |
| LLM_ROUTING_STRATEGY | weighted, round_robin, failover, cost_optimized* | weighted | Routing strategy |
| PROVIDER_WEIGHTS | provider:weight,... | Equal weights | Traffic distribution |
| DEFAULT_LLM_PROVIDER | bedrock, openai, etc. | None | Primary provider for failover |
| PROVIDER_COSTS* | provider:cost,... | See defaults | Cost per 1K tokens for cost_optimized |

* Enterprise only

Docker Compose

services:
  axonflow-orchestrator:
    environment:
      # Routing configuration
      LLM_ROUTING_STRATEGY: ${LLM_ROUTING_STRATEGY:-weighted}
      PROVIDER_WEIGHTS: ${PROVIDER_WEIGHTS:-}
      DEFAULT_LLM_PROVIDER: ${DEFAULT_LLM_PROVIDER:-}

      # Provider API keys
      OPENAI_API_KEY: ${OPENAI_API_KEY:-}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}

Routing Strategies

Weighted (Default)

Distributes requests based on configured weights. Weights are normalized to sum to 1.0.

PROVIDER_WEIGHTS=openai:50,anthropic:30,bedrock:20
# Results in: ~50% OpenAI, ~30% Anthropic, ~20% Bedrock

Use cases:

  • Gradual migration between providers
  • Cost optimization by favoring cheaper providers
  • A/B testing new providers
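
For intuition, weighted routing can be pictured as a proportional random draw over the healthy providers. The sketch below is illustrative only (plain Python, not AxonFlow's internal code); pick_weighted and the example weights are hypothetical.

import random

# Illustrative only: a proportional draw over healthy providers.
# Relative weights are normalized implicitly by the draw, matching the
# "weights are normalized to sum to 1.0" behavior described above.
def pick_weighted(weights: dict[str, float]) -> str:
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names], k=1)[0]

# Roughly 50% / 30% / 20% over many requests
print(pick_weighted({"openai": 50, "anthropic": 30, "bedrock": 20}))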

Round Robin

Cycles through healthy providers equally, ignoring weights.

LLM_ROUTING_STRATEGY=round_robin
# Cycles: openai -> anthropic -> bedrock -> openai -> ...

Use cases:

  • Even load distribution
  • Testing provider parity
  • Avoiding rate limits
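
Conceptually, round-robin is just cycling an index over the currently healthy providers. A minimal sketch (illustrative, not AxonFlow's internal code):

import itertools

# Illustrative round-robin: each healthy provider takes every Nth request
# in turn; configured weights play no role here.
healthy = ["openai", "anthropic", "bedrock"]
cycle = itertools.cycle(healthy)

for _ in range(4):
    print(next(cycle))  # openai, anthropic, bedrock, openai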

Failover

Uses the primary provider exclusively and falls back to others only on failure.

LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock
# Always uses Bedrock, falls back to others if Bedrock unhealthy

Use cases:

  • HIPAA/compliance (Bedrock-only)
  • Primary/backup configuration
  • Predictable routing
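
Failover selection reduces to: use the default provider while it is healthy, otherwise take the first healthy alternative. A minimal sketch under that assumption (not AxonFlow's actual code):

# Illustrative failover: prefer the configured default provider and only
# fall back to another healthy provider when the default is unhealthy.
def pick_failover(default: str, providers: list[str], healthy: set[str]) -> str:
    if default in healthy:
        return default
    for name in providers:
        if name in healthy:
            return name
    raise RuntimeError("no healthy providers available")

# Bedrock is unhealthy here, so the request falls back to anthropic.
print(pick_failover("bedrock", ["bedrock", "anthropic", "openai"], {"anthropic", "openai"}))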

Cost Optimized (Enterprise)

Automatically routes to the cheapest healthy provider based on configured cost per 1K tokens.

LLM_ROUTING_STRATEGY=cost_optimized
PROVIDER_COSTS=ollama:0,bedrock:0.02,anthropic:0.025,openai:0.03

Default costs (if PROVIDER_COSTS not set):

| Provider | Cost per 1K tokens |
| --- | --- |
| ollama | $0.00 (self-hosted) |
| bedrock | $0.02 |
| gemini | $0.025 |
| anthropic | $0.025 |
| openai | $0.03 |

Use cases:

  • Cost optimization without manual weight tuning
  • Automatic selection of cheapest healthy provider
  • Hybrid deployments with local (Ollama) + cloud fallback

How it works:

  1. On each request, evaluates all healthy providers
  2. Selects the provider with the lowest configured cost
  3. Providers without cost config default to $1.00 (deprioritized)
  4. Falls back to first healthy provider if all costs equal
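
The selection steps above condense into a few lines. This is an illustrative sketch in plain Python, not the Enterprise implementation:

# Illustrative cost_optimized selection; costs are USD per 1K tokens.
DEFAULT_COST = 1.00  # providers without a configured cost are deprioritized

def pick_cheapest(healthy: list[str], costs: dict[str, float]) -> str:
    # min() is stable, so when all costs are equal the first healthy
    # provider wins, matching the fallback behavior described above.
    return min(healthy, key=lambda name: costs.get(name, DEFAULT_COST))

costs = {"ollama": 0.0, "bedrock": 0.02, "anthropic": 0.025, "openai": 0.03}
print(pick_cheapest(["bedrock", "openai", "ollama"], costs))  # -> ollama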

Examples

HIPAA Compliance (Healthcare)

Force all traffic through AWS Bedrock:

LLM_ROUTING_STRATEGY=failover
DEFAULT_LLM_PROVIDER=bedrock
PROVIDER_WEIGHTS=bedrock:100

Cost Optimization (Community)

Prefer cheaper providers via manual weights:

LLM_ROUTING_STRATEGY=weighted
PROVIDER_WEIGHTS=bedrock:60,anthropic:30,openai:10

Cost Optimization (Enterprise)

Automatic selection of cheapest healthy provider:

LLM_ROUTING_STRATEGY=cost_optimized
PROVIDER_COSTS=ollama:0,bedrock:0.02,anthropic:0.025,openai:0.03
# Automatically uses cheapest healthy provider

High Availability

Round-robin with automatic failover on unhealthy providers:

LLM_ROUTING_STRATEGY=round_robin
# Unhealthy providers automatically skipped

SDK Usage

Clients can provide hints for provider selection, but the server makes final routing decisions.

Request-Level Pinning (Advanced)

Per-request provider semantics are:

  • context.provider → provider preference (fallback allowed).
  • context.strict_provider=true → hard pin for that request (no fallback).

You can also set a server default:

LLM_STRICT_PROVIDER_DEFAULT=false  # Recommended default

When set to true, strict pinning is enabled by default unless a request explicitly sets context.strict_provider=false.
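
In other words, the request's context.strict_provider, when present, overrides the server-wide default. A hypothetical sketch of that resolution (not AxonFlow's internal code):

import os

# Illustrative resolution of the effective strict-pinning flag.
def strict_pinning(context: dict) -> bool:
    server_default = os.getenv("LLM_STRICT_PROVIDER_DEFAULT", "false").lower() == "true"
    return bool(context.get("strict_provider", server_default))

print(strict_pinning({"provider": "bedrock"}))                            # uses server default
print(strict_pinning({"provider": "bedrock", "strict_provider": True}))   # hard pin, no fallback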

TypeScript

import { AxonFlow } from "@axonflow/sdk";

const client = new AxonFlow({
  endpoint: "http://localhost:8080",
  tenant: "demo",
});

// Default routing (server decides)
const response = await client.proxy({
  query: "What is 2 + 2?",
  requestType: "chat",
});

// Request specific provider (hint)
const bedrockResponse = await client.proxy({
  query: "What is the capital of France?",
  requestType: "chat",
  context: {
    provider: "bedrock", // Provider preference
    strict_provider: true, // Optional hard pin
    model: "anthropic.claude-haiku-4-5-20251001-v1:0",
  },
});

// Check provider health
const health = await client.health();
console.log("Providers:", health.providers);

Python

import asyncio
from axonflow import AxonFlow

async def main():
    client = AxonFlow(
        endpoint="http://localhost:8080",
        tenant="demo",
    )

    # Default routing (server decides)
    response = await client.proxy(
        query="What is 2 + 2?",
        request_type="chat",
    )

    # Request specific provider (hint)
    bedrock_response = await client.proxy(
        query="What is the capital of France?",
        request_type="chat",
        context={"provider": "bedrock"},
    )

    # Check provider health
    health = await client.health()
    print("Status:", health.status)

asyncio.run(main())

Go

package main

import (
    "context"
    "fmt"
    "log"

    "github.com/getaxonflow/axonflow-sdk-go/v3"
)

func main() {
    client, err := axonflow.NewClient(
        axonflow.WithEndpoint("http://localhost:8080"),
        axonflow.WithTenant("demo"),
    )
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()

    // Default routing (server decides)
    resp, err := client.Proxy(ctx, &axonflow.ProxyRequest{
        Query:       "What is 2 + 2?",
        RequestType: "chat",
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Response:", resp.Response)

    // Request specific provider (hint)
    bedrockResp, err := client.Proxy(ctx, &axonflow.ProxyRequest{
        Query:       "What is the capital of France?",
        RequestType: "chat",
        Context: map[string]interface{}{
            "provider": "bedrock",
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("Provider:", bedrockResp.Metadata["provider"])
}

Java

import com.getaxonflow.sdk.AxonFlow;
import com.getaxonflow.sdk.AxonFlowConfig;
import com.getaxonflow.sdk.ProxyRequest;
import com.getaxonflow.sdk.RequestType;
import java.util.Map;

public class ProviderRouting {
    public static void main(String[] args) {
        var client = new AxonFlow(
            AxonFlowConfig.builder()
                .endpoint("http://localhost:8080")
                .tenant("demo")
                .build()
        );

        // Default routing (server decides)
        var response = client.proxy(ProxyRequest.builder()
            .query("What is 2 + 2?")
            .requestType(RequestType.CHAT)
            .build());

        // Request specific provider (hint)
        var bedrockResponse = client.proxy(ProxyRequest.builder()
            .query("What is the capital of France?")
            .requestType(RequestType.CHAT)
            .context(Map.of("provider", "bedrock"))
            .build());
    }
}

Health Checking

Check provider health via the health endpoint:

curl http://localhost:8080/health

Response includes provider status:

{
  "status": "healthy",
  "providers": {
    "openai": {"healthy": true},
    "anthropic": {"healthy": true},
    "bedrock": {"healthy": true}
  }
}

Unhealthy providers are automatically excluded from routing.

Automatic Failover

When a provider fails:

  1. Request retried with next available provider
  2. After threshold failures, provider marked unhealthy
  3. Traffic routes to healthy providers only
  4. Health checks restore provider when recovered
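
One way to picture the health bookkeeping is a per-provider failure counter that trips after a threshold and resets once the provider recovers. The sketch below is illustrative; the ProviderHealth class and the threshold value of 3 are assumptions, not documented AxonFlow behavior.

# Illustrative health bookkeeping; FAILURE_THRESHOLD = 3 is an assumed value.
FAILURE_THRESHOLD = 3

class ProviderHealth:
    def __init__(self) -> None:
        self.failures: dict[str, int] = {}

    def record_failure(self, provider: str) -> None:
        self.failures[provider] = self.failures.get(provider, 0) + 1

    def record_success(self, provider: str) -> None:
        # A successful request or health check restores the provider.
        self.failures[provider] = 0

    def is_healthy(self, provider: str) -> bool:
        return self.failures.get(provider, 0) < FAILURE_THRESHOLD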

Community vs Enterprise

| Feature | Community | Enterprise |
| --- | --- | --- |
| Routing strategies | weighted, round_robin, failover | + cost_optimized |
| Configuration | Environment variables | + Customer Portal UI |
| Scope | Global (all tenants) | Per-tenant settings |
| Cost tracking | - | Per-provider cost metrics |
| Runtime configuration | Restart required | Hot reload via API |

For per-tenant configuration and advanced routing, see Enterprise Provider Features.

Complete Examples

Full working examples are available in the AxonFlow repository:

SDK Examples

HTTP/curl Example

For users who prefer direct HTTP calls without an SDK:

# Quick test with curl
curl -X POST http://localhost:8080/api/request \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is 2 + 2?",
    "user_token": "demo-user",
    "client_id": "curl-example",
    "request_type": "llm_chat",
    "context": {"provider": "openai"}
  }'

Each example demonstrates default routing, provider hints, model overrides, and health checking.