LiteLLM + AxonFlow Integration

Prerequisites: Python 3.10+, a running AxonFlow instance (see Getting Started), and the axonflow-litellm package (pip install axonflow-litellm).

Governance coverage

Interception point: the axonflow-litellm integration (AxonFlowLogger), in one of two modes: a blocking drop-in completion() / acompletion() wrapper, or an audit-only litellm.callbacks CustomLogger.
Governs (wrapper mode, can block): every completion request goes through pre_check before the LLM call, and a denial raises PolicyDeniedError, so policy is enforced on prompt content. HITL approval is optional. Both modes write the audit record.
Does not block (callback mode): the litellm.callbacks path is audit-only — LiteLLM swallows callback exceptions, so a policy deny is logged but cannot stop the call. Tool calls are not separately gated.

See Governance Architecture & Coverage for how this fits the five runtime modes.

What This Integration Gives You

LiteLLM lets you call 100+ LLM providers through a unified API. The axonflow-litellm package adds AxonFlow governance to every call — policy enforcement, human-in-the-loop approval gates, and audit trail — via a drop-in wrapper around litellm.completion().

Capability	How it surfaces
Pre-LLM policy enforcement	`pre_check` before every completion. Denials raise `PolicyDeniedError`.
HITL approval gate	When policy returns `require_approval`, the wrapper polls until approved/rejected/timed out.
Audit trail	Every LLM call is recorded to AxonFlow with model, provider, token usage, and latency.
Fail-open resilience	AxonFlow outage does not break LLM calls (default). Explicit `fail_open=False` for high-stakes workloads.
Circuit breaker	Half-open breaker prevents thundering herd on AxonFlow recovery.

The package reuses the published axonflow Python SDK (v8.2+) and inherits its auth, retry, and observability. There is no second HTTP client.

Quick Start

pip install axonflow-litellm

from axonflow_litellm import AxonFlowLogger, AxonFlowLoggerConfig, PolicyDeniedError

logger = AxonFlowLogger(AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",
    client_id="my-app",
    client_secret="...",
))

try:
    response = logger.completion(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize quarterly earnings"}],
    )
    print(response.choices[0].message.content)
except PolicyDeniedError as e:
    print(f"Blocked: {e.reason}")

All litellm.completion() kwargs are forwarded — model, messages, temperature, tools, etc.

Two Integration Modes

Governance Mode (recommended)

Use logger.completion() or logger.acompletion() as drop-in replacements:

# Sync
response = logger.completion(model="gpt-4o", messages=[...])

# Async (recommended for production)
response = await logger.acompletion(
    model="claude-sonnet-4-6",
    messages=[...],
    user_token="jwt-from-your-auth",
)

Flow: pre-check → HITL (if required) → LLM call → audit.
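
The flow above can be pictured with a small stand-alone sketch. It is not the package's implementation: the `GovernedCall` class, its four hooks, and the `Denied` exception are hypothetical names used only for illustration, and the sketch does not model how denied requests are audited.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class Denied(Exception):
    """Stand-in for a denial (the real package raises PolicyDeniedError and friends)."""


@dataclass
class GovernedCall:
    # All hooks are hypothetical callables supplied by the caller.
    pre_check: Callable[[list], str]        # returns "allow", "deny", or "require_approval"
    await_approval: Callable[[list], bool]  # True if a reviewer approved
    call_llm: Callable[[list], str]         # the actual completion call
    audit_log: List[dict] = field(default_factory=list)

    def run(self, messages: list) -> str:
        # 1. Pre-check: a deny stops here, before any provider is contacted.
        decision = self.pre_check(messages)
        if decision == "deny":
            raise Denied("policy denied the request")
        # 2. HITL: only consulted when the policy asks for approval.
        if decision == "require_approval" and not self.await_approval(messages):
            raise Denied("reviewer rejected the request")
        # 3. LLM call.
        response = self.call_llm(messages)
        # 4. Audit: in this sketch the record is appended after the call returns.
        self.audit_log.append({"decision": decision, "response": response})
        return response
```

The point of the ordering is that a denial is raised before `call_llm` runs, which is what makes this mode blocking.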

Audit-Only Mode

import litellm

litellm.callbacks = [logger]
response = await litellm.acompletion(model="gpt-4o", messages=[...])

Every call is recorded to AxonFlow. Policy denials are logged as warnings but cannot block the request (a LiteLLM SDK limitation — callback exceptions are silently swallowed).
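
To see why a callback cannot block, consider a simplified model of a dispatcher that logs callback errors instead of re-raising them. This is an illustration only, not LiteLLM's actual dispatch code; `call_with_callbacks` and `denying_callback` are hypothetical names, and the ordering of callback and call is simplified.

```python
import logging
from typing import Callable, List

log = logging.getLogger("callback-demo")


def call_with_callbacks(call_llm: Callable[[], str],
                        callbacks: List[Callable[[], None]]) -> str:
    """Run each callback, swallowing any exception, then make the LLM call."""
    for cb in callbacks:
        try:
            cb()
        except Exception as exc:
            # The error is logged and dropped, so the caller never sees it.
            log.warning("callback failed: %s", exc)
    return call_llm()


def denying_callback() -> None:
    # Stand-in for a policy deny raised from inside a callback.
    raise RuntimeError("policy denied")


result = call_with_callbacks(lambda: "llm response", [denying_callback])
```

Because the exception never propagates to the caller, the deny shows up in the logs while the request still completes.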

HITL Approval Flow

When a policy returns require_approval:

The wrapper creates a HITL queue entry via create_hitl_request
Polls get_hitl_request every 2 seconds (configurable)
On approved → proceeds to LLM call
On rejected or expired → raises ApprovalRejected
On timeout (client-side deadline) → raises ApprovalTimeout

from axonflow_litellm import ApprovalRejected, ApprovalTimeout

try:
    response = logger.completion(model="gpt-4o", messages=[...])
except ApprovalTimeout:
    print("No reviewer responded in time")
except ApprovalRejected:
    print("Reviewer denied the request")
except PolicyDeniedError as e:
    print(f"Policy denied: {e.reason}")

Set enable_hitl_polling=False to deny immediately on require_approval without creating a queue entry.
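
The polling behavior described in this section can be sketched as a loop with a client-side deadline. This is a minimal sketch under assumptions: `wait_for_approval`, the demo exception classes, and the `"pending"` status string are hypothetical, and `get_status` stands in for whatever call fetches the HITL request state. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


class ApprovalRejectedDemo(Exception):
    """Stand-in for ApprovalRejected."""


class ApprovalTimeoutDemo(Exception):
    """Stand-in for ApprovalTimeout."""


def wait_for_approval(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    max_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Return once approved; raise on rejection, expiry, or client-side timeout."""
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status == "approved":
            return
        if status in ("rejected", "expired"):
            # Rejection and server-side expiry surface as the same exception.
            raise ApprovalRejectedDemo(status)
        if clock() >= deadline:
            raise ApprovalTimeoutDemo(f"no decision after {max_wait} seconds")
        sleep(poll_interval)
```

The defaults mirror the documented 2-second interval and 300-second deadline; the real package may structure its loop differently.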

Configuration

AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",    # AxonFlow agent URL
    client_id="my-app",                 # AxonFlow client identifier
    client_secret="...",                # AxonFlow client secret

    # Governance behavior
    fail_open=True,                     # Allow LLM calls when AxonFlow is unreachable
    call_timeout_seconds=5.0,           # Per-hook timeout
    breaker_failure_threshold=5,        # Consecutive failures before circuit opens
    breaker_recovery_seconds=30.0,      # Recovery window before probe

    # HITL
    enable_hitl_polling=True,           # Enable HITL approval flow
    approval_poll_interval_seconds=2.0, # Polling interval
    approval_max_wait_seconds=300.0,    # Client-side deadline

    # Identity
    default_user_token="anonymous",     # Default token for policy evaluation
    tenant_id=None,                     # AxonFlow tenant identifier
    extra_context={},                   # Additional context sent with every pre-check
)
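
As a rough illustration of what breaker_failure_threshold and breaker_recovery_seconds control, here is a minimal half-open circuit breaker. It is a generic sketch of the pattern, not the package's code: the `CircuitBreaker` class and its method names are hypothetical, and it is not thread-safe.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        if self._opened_at is None:
            return True   # closed: traffic flows normally
        if self._clock() - self._opened_at < self.recovery_seconds:
            return False  # open: skip the governance call entirely
        # Half-open: let exactly one probe through until it reports back.
        if self._probe_in_flight:
            return False
        self._probe_in_flight = True
        return True

    def record_success(self) -> None:
        # A successful call (or probe) closes the breaker.
        self._failures = 0
        self._opened_at = None
        self._probe_in_flight = False

    def record_failure(self) -> None:
        self._probe_in_flight = False
        self._failures += 1
        if self._failures >= self.failure_threshold:
            # Open (or re-open after a failed probe) and restart the recovery window.
            self._opened_at = self._clock()
```

Letting only a single probe through after the recovery window is what keeps a recovering service from being hit by every waiting caller at once.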

Fail-Open vs. Fail-Closed

Default (fail_open=True): AxonFlow timeout or outage → LLM call proceeds normally. Suits workloads where availability matters more than guaranteed policy enforcement.

High-stakes (fail_open=False): AxonFlow timeout or outage → raises PolicyDeniedError. No LLM call proceeds without explicit policy approval.
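
The difference between the two settings comes down to what happens when the policy check itself fails. A minimal sketch, assuming a hypothetical `check_policy` callable and stand-in exception classes (none of these names come from the package):

```python
from typing import Callable


class GovernanceUnavailable(Exception):
    """Hypothetical: the policy service could not be reached."""


class DeniedDemo(Exception):
    """Stand-in for PolicyDeniedError."""


def governed_call(check_policy: Callable[[], bool],
                  call_llm: Callable[[], str],
                  fail_open: bool = True) -> str:
    try:
        allowed = check_policy()
    except (GovernanceUnavailable, TimeoutError) as exc:
        if not fail_open:
            # Fail-closed: an outage is treated the same as a denial.
            raise DeniedDemo("policy service unavailable (fail-closed)") from exc
        # Fail-open: proceed as if the policy had allowed the request.
        allowed = True
    if not allowed:
        raise DeniedDemo("policy denied the request")
    return call_llm()
```

Note that an explicit deny blocks the call in both settings; fail_open only changes how an unreachable or slow policy service is handled.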

Exceptions

Exception	When raised
`PolicyDeniedError`	Policy denied the request
`ApprovalRejected`	HITL reviewer rejected, or HITL row expired server-side
`ApprovalTimeout`	Client-side deadline exceeded while polling

All carry .reason (string) and .policies (list of policy IDs).
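
Since all three exceptions expose the same two attributes, one handler can turn any of them into a structured error. The sketch below uses local stand-in classes so it runs on its own; the shared base class is a convenience of the demo, and whether the real exceptions share a base is not stated here. `to_error_payload` and `run_governed` are hypothetical helpers.

```python
from typing import Callable, Iterable, Optional


class GovernanceErrorStandIn(Exception):
    """Demo-only base carrying the documented .reason and .policies attributes."""

    def __init__(self, reason: str, policies: Optional[Iterable[str]] = None) -> None:
        super().__init__(reason)
        self.reason = reason
        self.policies = list(policies or [])


class PolicyDeniedStandIn(GovernanceErrorStandIn):
    pass


class ApprovalRejectedStandIn(GovernanceErrorStandIn):
    pass


class ApprovalTimeoutStandIn(GovernanceErrorStandIn):
    pass


def to_error_payload(exc: Exception) -> dict:
    # Relies only on the two attributes the table above documents.
    return {
        "error": type(exc).__name__,
        "reason": exc.reason,
        "policies": list(exc.policies),
    }


def run_governed(call: Callable[[], str]) -> dict:
    try:
        return {"ok": True, "result": call()}
    except (PolicyDeniedStandIn, ApprovalRejectedStandIn, ApprovalTimeoutStandIn) as exc:
        return {"ok": False, **to_error_payload(exc)}
```

With the real package, the `except` tuple would list PolicyDeniedError, ApprovalRejected, and ApprovalTimeout instead of the stand-ins.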

Using with an Existing AxonFlow Client

If your application already manages an AxonFlow client instance:

from axonflow import AxonFlow
from axonflow_litellm import AxonFlowLogger, AxonFlowLoggerConfig

client = AxonFlow(endpoint="http://localhost:8080", client_id="my-app")
logger = AxonFlowLogger.from_client(client, AxonFlowLoggerConfig(
    endpoint="http://localhost:8080",
    client_id="my-app",
))

The caller owns the client lifecycle — logger.aclose() is a no-op.

MCP Governance

LiteLLM is LLM-completion-focused. For MCP tool governance, see MCP Connectors Overview.

Resources

Implementation Checklist

Before you put this integration in front of real users, connect it back to the core runtime docs:

choose the request path in Choosing a Mode and Runtime Request Paths
map the policies that should apply in Policy Hierarchy
decide whether tool access belongs behind MCP Policy Enforcement
use Community vs Evaluation vs Enterprise when approval queues, evidence export, SSO, SCIM, or longer retention become rollout requirements
