
LiteLLM + AxonFlow Integration

Prerequisites: Python 3.10+, a running AxonFlow instance (see Getting Started), and the axonflow-litellm package (pip install axonflow-litellm)


What This Integration Gives You

LiteLLM lets you call 100+ LLM providers through a unified API. The axonflow-litellm package adds AxonFlow governance to every call — policy enforcement, human-in-the-loop approval gates, and audit trail — via a drop-in wrapper around litellm.completion().

| Capability | How it surfaces |
|---|---|
| Pre-LLM policy enforcement | pre_check before every completion. Denials raise PolicyDeniedError. |
| HITL approval gate | When policy returns require_approval, the wrapper polls until approved/rejected/timed out. |
| Audit trail | Every LLM call is recorded to AxonFlow with model, provider, token usage, and latency. |
| Fail-open resilience | AxonFlow outage does not break LLM calls (default). Explicit fail_open=False for high-stakes workloads. |
| Circuit breaker | Half-open breaker prevents thundering herd on AxonFlow recovery. |

The package reuses the published axonflow Python SDK (v8.2+) and inherits its auth, retry, and observability. There is no second HTTP client.


Quick Start

```bash
pip install axonflow-litellm
```

```python
from axonflow_litellm import AxonFlowLogger, AxonFlowLoggerConfig, PolicyDeniedError

logger = AxonFlowLogger(AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",
    client_id="my-app",
    client_secret="...",
))

try:
    response = logger.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize quarterly earnings"}],
    )
    print(response.choices[0].message.content)
except PolicyDeniedError as e:
    print(f"Blocked: {e.reason}")
```

All litellm.completion() kwargs are forwarded — model, messages, temperature, tools, etc.


Two Integration Modes

Wrapper Mode

Use logger.completion() or logger.acompletion() as drop-in replacements for litellm.completion() and litellm.acompletion():

```python
# Sync
response = logger.completion(model="gpt-4o", messages=[...])

# Async (recommended for production)
response = await logger.acompletion(
    model="claude-sonnet-4-6",
    messages=[...],
    user_token="jwt-from-your-auth",
)
```

Flow: pre-check → HITL (if required) → LLM call → audit.
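The flow above can be modeled as a plain function with each step injected, which makes the ordering easy to see and test. This is a minimal sketch under stated assumptions, not the package's implementation: governed_completion, DeniedStandIn, and the "allow" / "deny" / "require_approval" decision strings are hypothetical, and the sketch ignores fail-open handling and any auditing of denied requests.

```python
from typing import Any, Callable


class DeniedStandIn(Exception):
    """Hypothetical stand-in for axonflow_litellm.PolicyDeniedError."""


def governed_completion(
    pre_check: Callable[[dict], str],           # hypothetical: "allow", "deny" or "require_approval"
    wait_for_approval: Callable[[dict], None],  # hypothetical: returns on approval, raises otherwise
    call_llm: Callable[..., Any],               # stands in for litellm.completion
    record_audit: Callable[[dict, Any], None],  # hypothetical audit hook
    **kwargs: Any,
) -> Any:
    """Run one completion through pre-check, optional HITL, the LLM call, then audit."""
    decision = pre_check(kwargs)
    if decision == "deny":
        raise DeniedStandIn("policy denied the request")  # the LLM is never called
    if decision == "require_approval":
        wait_for_approval(kwargs)  # blocks until a reviewer decides
    response = call_llm(**kwargs)  # every completion kwarg is forwarded unchanged
    record_audit(kwargs, response)  # audit runs after the LLM responds
    return response


# Usage: trace the order of the steps with stub callables.
steps: list[str] = []
result = governed_completion(
    pre_check=lambda req: steps.append("pre-check") or "require_approval",
    wait_for_approval=lambda req: steps.append("hitl"),
    call_llm=lambda **kw: steps.append("llm") or f"reply from {kw['model']}",
    record_audit=lambda req, resp: steps.append("audit"),
    model="gpt-4o",
    messages=[{"role": "user", "content": "hi"}],
)
print(steps)  # ['pre-check', 'hitl', 'llm', 'audit']
```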

Audit-Only Mode

Register as a LiteLLM callback for observability without blocking:

```python
import litellm

litellm.callbacks = [logger]
response = await litellm.acompletion(model="gpt-4o", messages=[...])
```

Every call is recorded to AxonFlow. Policy denials are logged as warnings but cannot block the request (a LiteLLM SDK limitation — callback exceptions are silently swallowed).


HITL Approval Flow

When a policy returns require_approval:

  1. The wrapper creates a HITL queue entry via create_hitl_request
  2. Polls get_hitl_request every 2 seconds (configurable)
  3. On approved → proceeds to LLM call
  4. On rejected or expired → raises ApprovalRejected
  5. On timeout (client-side deadline) → raises ApprovalTimeout

```python
from axonflow_litellm import ApprovalRejected, ApprovalTimeout, PolicyDeniedError

try:
    response = logger.completion(model="gpt-4o", messages=[...])
except ApprovalTimeout:
    print("No reviewer responded in time")
except ApprovalRejected:
    print("Reviewer denied the request")
except PolicyDeniedError as e:
    print(f"Policy denied: {e.reason}")
```

Set enable_hitl_polling=False to deny immediately on require_approval without creating a queue entry.
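Steps 2 to 5 amount to a polling loop with two exits for a reviewer decision and one for the client-side deadline. The sketch below models that loop under assumptions: wait_for_approval, the status strings ("pending", "approved", "rejected", "expired"), and the stand-in exception classes are hypothetical, whereas the real wrapper calls get_hitl_request and raises the package's ApprovalRejected and ApprovalTimeout. The clock and sleep function are injected so the loop can run without real waiting.

```python
import time
from typing import Callable


class ApprovalRejectedStandIn(Exception):
    """Hypothetical stand-in for axonflow_litellm.ApprovalRejected."""


class ApprovalTimeoutStandIn(Exception):
    """Hypothetical stand-in for axonflow_litellm.ApprovalTimeout."""


def wait_for_approval(
    get_status: Callable[[], str],              # hypothetical: current status of the HITL entry
    poll_interval_seconds: float = 2.0,         # mirrors approval_poll_interval_seconds
    max_wait_seconds: float = 300.0,            # mirrors approval_max_wait_seconds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Return on approval; raise on rejection, server-side expiry, or client-side deadline."""
    deadline = clock() + max_wait_seconds
    while True:
        status = get_status()
        if status == "approved":
            return
        if status in ("rejected", "expired"):
            raise ApprovalRejectedStandIn(status)
        remaining = deadline - clock()
        if remaining <= 0:
            raise ApprovalTimeoutStandIn(f"no decision within {max_wait_seconds}s")
        sleep(min(poll_interval_seconds, remaining))  # never sleep past the deadline


class FakeClock:
    """Deterministic clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
statuses = iter(["pending", "pending", "approved"])
wait_for_approval(lambda: next(statuses), clock=clock, sleep=clock.sleep)
print(clock.now)  # 4.0
```

Capping each sleep at the remaining time keeps the total wait close to approval_max_wait_seconds instead of overshooting it by up to one poll interval.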


Configuration

```python
from axonflow_litellm import AxonFlowLoggerConfig

config = AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",    # AxonFlow agent URL
    client_id="my-app",                  # AxonFlow client identifier
    client_secret="...",                 # AxonFlow client secret

    # Governance behavior
    fail_open=True,                      # Allow LLM calls when AxonFlow is unreachable
    call_timeout_seconds=5.0,            # Timeout for each AxonFlow hook call
    breaker_failure_threshold=5,         # Consecutive failures before the circuit opens
    breaker_recovery_seconds=30.0,       # Wait before the open circuit sends a probe

    # HITL
    enable_hitl_polling=True,            # Enable the HITL approval flow
    approval_poll_interval_seconds=2.0,  # Polling interval
    approval_max_wait_seconds=300.0,     # Client-side deadline

    # Identity
    default_user_token="anonymous",      # Default token for policy evaluation
    tenant_id=None,                      # AxonFlow tenant identifier
    extra_context={},                    # Additional context sent with every pre-check
)
```

Fail-Open vs. Fail-Closed

Default (fail_open=True): AxonFlow timeout or outage → LLM call proceeds normally. Best for most workloads.

High-stakes (fail_open=False): AxonFlow timeout or outage → raises PolicyDeniedError. No LLM call proceeds without explicit policy approval.
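The configuration knobs above interact roughly as in the following sketch: each AxonFlow call gets a timeout, consecutive failures trip a breaker, a single probe is allowed after the recovery window (the half-open state), and fail_open decides what happens whenever AxonFlow cannot answer. GovernanceGate, its methods, and PolicyDeniedStandIn are hypothetical names; only the parameter meanings come from AxonFlowLoggerConfig, and the package's own breaker may count failures or schedule probes differently.

```python
import asyncio
import time
from typing import Awaitable, Callable


class PolicyDeniedStandIn(Exception):
    """Hypothetical stand-in for axonflow_litellm.PolicyDeniedError."""


class GovernanceGate:
    """Sketch: per-call timeout, half-open circuit breaker, and fail-open/fail-closed."""

    def __init__(self, fail_open: bool = True, call_timeout_seconds: float = 5.0,
                 breaker_failure_threshold: int = 5, breaker_recovery_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fail_open = fail_open
        self.call_timeout_seconds = call_timeout_seconds
        self.threshold = breaker_failure_threshold
        self.recovery = breaker_recovery_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: float | None = None  # None means the breaker is closed
        self.probing = False                 # True while the single half-open probe runs

    def _may_call(self) -> bool:
        if self.opened_at is None:
            return True                      # closed: call AxonFlow
        if not self.probing and self.clock() - self.opened_at >= self.recovery:
            self.probing = True              # half-open: let exactly one probe through
            return True
        return False                         # open: skip AxonFlow entirely

    def _record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None  # success closes the breaker
        else:
            self.failures += 1
            if self.probing or self.failures >= self.threshold:
                self.opened_at = self.clock()        # (re)open, restart the recovery window
        self.probing = False

    def _unavailable(self) -> bool:
        if self.fail_open:
            return True                      # fail-open: the LLM call proceeds
        raise PolicyDeniedStandIn("AxonFlow unavailable and fail_open=False")

    async def check(self, pre_check: Callable[[], Awaitable[bool]]) -> bool:
        """Return True if the LLM call may proceed; raise if it must not."""
        if not self._may_call():
            return self._unavailable()
        try:
            allowed = await asyncio.wait_for(pre_check(), self.call_timeout_seconds)
        except Exception:                    # timeout or connection error
            self._record(ok=False)
            return self._unavailable()
        self._record(ok=True)
        if not allowed:
            raise PolicyDeniedStandIn("policy denied the request")  # denial blocks in both modes
        return True


class FakeClock:
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
gate = GovernanceGate(breaker_failure_threshold=2, clock=clock)
attempts: list[float] = []


async def axonflow_down() -> bool:
    attempts.append(clock.now)
    raise ConnectionError("AxonFlow unreachable")


async def axonflow_allows() -> bool:
    attempts.append(clock.now)
    return True


async def demo() -> list[bool]:
    results = [await gate.check(axonflow_down) for _ in range(3)]  # breaker opens after 2 failures
    clock.now = 30.0                                             # recovery window has elapsed
    results.append(await gate.check(axonflow_allows))           # half-open probe succeeds
    return results


print(asyncio.run(demo()))  # [True, True, True, True]
print(attempts)             # [0.0, 0.0, 30.0]
```

In this sketch fail_open only covers AxonFlow being unavailable; an explicit policy denial raises in both modes, which matches the distinction drawn above.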


Exceptions

| Exception | When raised |
|---|---|
| PolicyDeniedError | Policy denied the request |
| ApprovalRejected | HITL reviewer rejected, or HITL row expired server-side |
| ApprovalTimeout | Client-side deadline exceeded while polling |

All carry .reason (string) and .policies (list of policy IDs).
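Because all three exceptions expose .reason and .policies, one handler can log them the same way. The helper below is a hypothetical sketch that relies only on those two attributes; FakeDenial is a stand-in so the snippet runs without AxonFlow.

```python
def describe_governance_error(err: Exception) -> str:
    """Format any exception exposing .reason and .policies as a single log line."""
    reason = getattr(err, "reason", str(err))
    policies = list(getattr(err, "policies", None) or [])
    label = type(err).__name__
    if policies:
        return f"{label}: {reason} [policies: {', '.join(str(p) for p in policies)}]"
    return f"{label}: {reason}"


class FakeDenial(Exception):
    """Hypothetical stand-in with the same two attributes as the package exceptions."""

    def __init__(self, reason: str, policies: list[str]) -> None:
        super().__init__(reason)
        self.reason = reason
        self.policies = policies


print(describe_governance_error(FakeDenial("PII detected", ["pii-block"])))
# FakeDenial: PII detected [policies: pii-block]
```

In an application, the helper would be called from a single except (PolicyDeniedError, ApprovalRejected, ApprovalTimeout) as e: clause.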


Using with an Existing AxonFlow Client

If your application already manages an AxonFlow client instance:

```python
from axonflow import AxonFlow
from axonflow_litellm import AxonFlowLogger, AxonFlowLoggerConfig

client = AxonFlow(endpoint="http://localhost:8080", client_id="my-app")
logger = AxonFlowLogger.from_client(client, AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",
    client_id="my-app",
))
```

The caller owns the client lifecycle — logger.aclose() is a no-op.


MCP Governance

LiteLLM is LLM-completion-focused. For MCP tool governance, see MCP Connectors Overview.


Resources