
LangChain + AxonFlow Integration

Prerequisites: Python 3.9+, AxonFlow running (Getting Started), pip install langchain langchain-openai axonflow


What LangChain Does Well

LangChain is the most popular framework for building LLM applications—with 847 million downloads and growing. Its strengths are real and substantial:

Unified Model Interface: Switch between OpenAI, Anthropic, Google, Bedrock, and Ollama without rewriting your application. This abstraction saves weeks of integration work.
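
As a quick illustration (a sketch assuming the langchain-openai and langchain-anthropic packages and the corresponding API keys), swapping providers is a one-line change because every chat model shares the same invoke() interface:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

def answer(llm, question: str) -> str:
    # Any chat model exposes the same .invoke() interface,
    # so the application code never changes when the provider does.
    return llm.invoke(question).content

print(answer(ChatOpenAI(model="gpt-4"), "What is AxonFlow?"))
print(answer(ChatAnthropic(model="claude-3-5-sonnet-20241022"), "What is AxonFlow?"))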

Rapid Prototyping: Build a working agent in under 10 lines of code. LCEL (LangChain Expression Language) makes composing chains intuitive.
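
For example, a minimal LCEL chain (sketch, assuming langchain-openai and an OpenAI API key) pipes a prompt into a model and an output parser:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# LCEL composes components with the | operator
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-4") | StrOutputParser()

print(chain.invoke({"text": "LangChain orchestrates LLM calls; AxonFlow governs them."}))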

Rich Ecosystem: 1,400+ integrations—vector stores, document loaders, retrievers, tools. If you need a connector, it probably exists.

RAG Made Simple: Retrieval-Augmented Generation with sensible defaults. Load documents, create embeddings, query with context—all handled.
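
A minimal sketch of that flow (illustrative documents, assuming langchain-openai and langchain-community with Chroma installed):

from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Index a few documents, then answer questions with retrieved context
vectorstore = Chroma.from_texts(
    ["AxonFlow adds a governance layer under LLM apps.",
     "LangChain orchestrates chains, agents, and RAG."],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | ChatOpenAI(model="gpt-4")
    | StrOutputParser()
)
print(rag_chain.invoke("What does AxonFlow do?"))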

LangGraph for State: When you outgrow simple chains, LangGraph provides durable execution, human-in-the-loop patterns, and persistence.
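
The core idea is a typed state passed between graph nodes. A minimal sketch (assuming the langgraph package; no LLM involved):

from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    draft: str
    approved: bool

def review(state: State) -> dict:
    # In a real workflow this is where a human-in-the-loop check would pause execution
    return {"approved": len(state["draft"]) > 0}

graph = StateGraph(State)
graph.add_node("review", review)
graph.add_edge(START, "review")
graph.add_edge("review", END)
app = graph.compile()

print(app.invoke({"draft": "Q3 summary", "approved": False}))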

Active Community: Extensive documentation, tutorials, and community support. Issues get addressed quickly.


What LangChain Doesn't Try to Solve

LangChain focuses on making LLM applications easy to build. These concerns are explicitly out of scope:

| Production Requirement | LangChain's Position |
| --- | --- |
| Policy enforcement at inference time | Not provided—no built-in way to block requests based on content, user, or context |
| PII detection and blocking | Requires external integration (Presidio)—not built in |
| SQL injection prevention | Not addressed—must implement separately |
| Per-user cost allocation | Not provided—no way to attribute costs to users or departments |
| Audit trails for compliance | Requires LangSmith (paid SaaS) or third-party tools |
| Role-based access control | Not addressed—no permission model |
| Request/response logging | Off by default—must configure manually |

This isn't a criticism—it's a design choice. LangChain handles orchestration. Governance is a separate concern.


Where Teams Hit Production Friction

Based on real enterprise deployments, here are the blockers that appear after the prototype works:

1. The Compliance Audit

"Show me every prompt that contained customer identifiers in the last 90 days."

LangChain doesn't log prompts by default. LangSmith does, but if it wasn't enabled from day one, you're reconstructing from application logs. For HIPAA, GDPR, or SOX audits, this is painful.
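
Retrofitting this yourself is possible but ad hoc. Here is a do-it-yourself sketch using LangChain's callback system (a local JSONL file, not a compliance-grade audit trail):

import datetime
import json
from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import ChatOpenAI

class PromptLogger(BaseCallbackHandler):
    """Append every outgoing chat prompt to a local JSONL file."""

    def on_chat_model_start(self, serialized, messages, **kwargs):
        with open("prompt_audit.jsonl", "a") as f:
            for batch in messages:
                f.write(json.dumps({
                    "timestamp": datetime.datetime.utcnow().isoformat(),
                    "messages": [m.content for m in batch],
                }) + "\n")

llm = ChatOpenAI(model="gpt-4", callbacks=[PromptLogger()])
llm.invoke("Summarize the account history for customer 4921.")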

2. The $30K Invoice

A recursive loop triggers 47,000 API calls over a weekend. The bill arrives. Debugging takes three days because:

  • No per-request logging was enabled
  • No cost attribution per user or session
  • No circuit breaker on spend

LangChain processed every request as intended. Nothing was watching what it processed.
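
None of these controls is hard to build in isolation; the problem is that nobody owns them by default. As a purely illustrative sketch (hypothetical numbers, not an AxonFlow feature), even a naive per-process spend cap would have stopped the weekend loop:

class SpendGuard:
    """Naive circuit breaker: refuse LLM calls once an estimated budget is exhausted.
    Per-process only; real cost control needs shared, per-user accounting."""

    def __init__(self, budget_usd: float, est_cost_per_call_usd: float = 0.03):
        self.budget_usd = budget_usd
        self.est_cost_per_call_usd = est_cost_per_call_usd
        self.spent_usd = 0.0

    def charge(self) -> None:
        self.spent_usd += self.est_cost_per_call_usd
        if self.spent_usd > self.budget_usd:
            raise RuntimeError(
                f"Spend cap exceeded: ${self.spent_usd:.2f} > ${self.budget_usd:.2f}"
            )

guard = SpendGuard(budget_usd=50.0)

def guarded_invoke(llm, prompt: str) -> str:
    guard.charge()  # raises once the estimated spend crosses the cap
    return llm.invoke(prompt).content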

3. The "Why Did It Say That?" Question

A hallucinated figure appears in a financial report. Compliance asks:

  • What was the original prompt?
  • What context was retrieved?
  • What model and temperature were used?
  • Who was the requesting user?

This information wasn't captured. LangChain returned a response; what happened between prompt and response was invisible.

4. The Security Review Block

Security review: BLOCKED
- No audit trail for LLM decisions
- PII exposure risk in prompts
- SQL injection not prevented
- Policy enforcement unclear
- Cost controls missing

The prototype worked perfectly. It can't ship.

5. The PII Leak That Wasn't (But Could Have Been)

A support agent was trained on customer service logs. During a security scan, it could be prompted to retrieve customer phone numbers—information that was in the training data. LangChain has no built-in PII detection.


How AxonFlow Plugs In

AxonFlow doesn't replace LangChain. It sits underneath it—providing the governance layer that LangChain intentionally doesn't include:

┌─────────────────┐
│     Your App    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    LangChain    │   <-- Chains, Agents, RAG
└────────┬────────┘
         │
         ▼
┌─────────────────────────────────┐
│            AxonFlow             │
│  ┌───────────┐  ┌────────────┐  │
│  │  Policy   │  │   Audit    │  │
│  │  Enforce  │  │   Trail    │  │
│  └───────────┘  └────────────┘  │
│  ┌───────────┐  ┌────────────┐  │
│  │    PII    │  │    Cost    │  │
│  │ Detection │  │  Control   │  │
│  └───────────┘  └────────────┘  │
└────────────────┬────────────────┘
                 │
                 ▼
┌─────────────────┐
│  LLM Provider   │
└─────────────────┘

What this gives you:

  • Every request logged with user identity and context
  • PII detected and blocked before hitting the LLM (SSN, credit cards, etc.)
  • SQL injection attempts blocked
  • Cost tracked per user, per department, per agent
  • Compliance auditors can query 90 days of decisions in seconds

What stays the same:

  • Your LangChain code doesn't change
  • No new abstractions to learn
  • No framework lock-in

Integration Patterns

Pattern 1: Pre-Check + Audit
Recommended default for most teams. This is the standard pattern—check every request before the LLM, audit after:

from langchain_openai import ChatOpenAI
from axonflow import AxonFlow, TokenUsage
import time

def governed_langchain_call(
    user_token: str,
    query: str,
    context: dict = None
) -> str:
    """LangChain call with AxonFlow governance."""
    start_time = time.time()

    with AxonFlow.sync(
        endpoint="http://localhost:8080",
        client_id="langchain-app",
        client_secret="your-secret"
    ) as axonflow:
        # 1. Pre-check: Is this request allowed?
        ctx = axonflow.get_policy_approved_context(
            user_token=user_token,
            query=query,
            context={**(context or {}), "framework": "langchain"}
        )

        if not ctx.approved:
            # Logged: who, what, when, why blocked
            return f"Blocked: {ctx.block_reason}"

        # 2. Your existing LangChain code
        llm = ChatOpenAI(model="gpt-4")
        response = llm.invoke(query)
        latency_ms = int((time.time() - start_time) * 1000)

        # 3. Audit: Record what happened
        axonflow.audit_llm_call(
            context_id=ctx.context_id,
            response_summary=response.content[:200],
            provider="openai",
            model="gpt-4",
            token_usage=TokenUsage(
                prompt_tokens=response.usage_metadata.get("input_tokens", 0),
                completion_tokens=response.usage_metadata.get("output_tokens", 0),
                total_tokens=response.usage_metadata.get("total_tokens", 0)
            ),
            latency_ms=latency_ms
        )

        return response.content

Pattern 2: Governed RAG Pipeline
Advanced: For RAG systems

Add governance to retrieval-augmented generation:

from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from axonflow import AxonFlow, TokenUsage
import time

class GovernedRAG:
    def __init__(self, vectorstore_path: str):
        self.embeddings = OpenAIEmbeddings()
        self.vectorstore = Chroma(
            persist_directory=vectorstore_path,
            embedding_function=self.embeddings
        )
        self.llm = ChatOpenAI(model="gpt-4")
        self.chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            retriever=self.vectorstore.as_retriever(),
            return_source_documents=True
        )

    def query(self, user_token: str, question: str, context: dict = None) -> dict:
        start_time = time.time()

        with AxonFlow.sync(
            endpoint="http://localhost:8080",
            client_id="langchain-rag",
            client_secret="your-secret"
        ) as axonflow:
            # Pre-check with RAG context
            ctx = axonflow.get_policy_approved_context(
                user_token=user_token,
                query=question,
                context={
                    **(context or {}),
                    "pipeline": "rag",
                    "data_tier": (context or {}).get("data_tier", "standard")
                }
            )

            if not ctx.approved:
                return {"answer": f"Blocked: {ctx.block_reason}", "blocked": True}

            # Execute RAG
            result = self.chain.invoke({"query": question})
            sources = [doc.metadata for doc in result.get("source_documents", [])]
            latency_ms = int((time.time() - start_time) * 1000)

            # Audit with source attribution
            axonflow.audit_llm_call(
                context_id=ctx.context_id,
                response_summary=result["result"][:200],
                provider="openai",
                model="gpt-4",
                # Placeholder token counts; substitute real usage if your chain exposes it
                token_usage=TokenUsage(prompt_tokens=100, completion_tokens=50, total_tokens=150),
                latency_ms=latency_ms,
                metadata={"source_count": len(sources), "sources": sources[:5]}
            )

            return {"answer": result["result"], "sources": sources, "context_id": ctx.context_id}

Pattern 3: Proxy Mode
Alternative: For simpler deployments

Route all LLM calls through AxonFlow for automatic governance:

from axonflow import AxonFlow

# All governance happens automatically
with AxonFlow.sync(
    endpoint="http://localhost:8080",
    client_id="langchain-app"
) as axonflow:
    result = axonflow.execute_query(
        user_token="user-123",
        query="What are the key AI governance regulations?",
        request_type="chat",
        context={"framework": "langchain", "department": "research"}
    )

    if result.blocked:
        print(f"Blocked: {result.block_reason}")
    else:
        print(result.data)  # LLM response

Other Languages: Go, TypeScript, Java
For polyglot or service-oriented architectures

Go SDK Integration

For Go services orchestrating LangChain or hybrid architectures:

package main

import (
	"context"
	"fmt"

	"github.com/getaxonflow/axonflow-sdk-go"
	"github.com/sashabaranov/go-openai"
)

func GovernedLangChainCall(
	ctx context.Context,
	client *axonflow.AxonFlowClient,
	llmClient *openai.Client, // named to avoid shadowing the openai package
	userToken, query string,
	callContext map[string]interface{},
) (string, error) {
	// Pre-check
	callContext["framework"] = "langchain"
	result, err := client.ExecuteQuery(userToken, query, "chat", callContext)
	if err != nil {
		return "", err
	}

	if result.Blocked {
		return "", fmt.Errorf("blocked: %s", result.BlockReason)
	}

	// Make LLM call
	resp, err := llmClient.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model: openai.GPT4,
		Messages: []openai.ChatCompletionMessage{
			{Role: "user", Content: query},
		},
	})
	if err != nil {
		return "", err
	}

	return resp.Choices[0].Message.Content, nil
}

TypeScript SDK Integration

import { AxonFlow } from "@axonflow/sdk";
import { ChatOpenAI } from "@langchain/openai";

async function governedLangChainCall(
  userToken: string,
  query: string,
  context?: Record<string, unknown>
): Promise<string> {
  const axonflow = new AxonFlow({
    endpoint: "http://localhost:8080",
    tenant: "langchain-app",
  });

  const startTime = Date.now();

  // Pre-check
  const approval = await axonflow.getPolicyApprovedContext({
    userToken,
    query,
    context: { ...context, framework: "langchain" },
  });

  if (!approval.approved) {
    throw new Error(`Blocked: ${approval.blockReason}`);
  }

  // LangChain call
  const llm = new ChatOpenAI({ model: "gpt-4" });
  const response = await llm.invoke(query);
  const latencyMs = Date.now() - startTime;

  // Message content can be a string or structured parts; normalize to a string
  const text =
    typeof response.content === "string"
      ? response.content
      : JSON.stringify(response.content);

  // Audit
  await axonflow.auditLLMCall({
    contextId: approval.contextId,
    responseSummary: text.slice(0, 200),
    provider: "openai",
    model: "gpt-4",
    // Placeholder token counts; substitute real usage if available
    tokenUsage: { promptTokens: 100, completionTokens: 50, totalTokens: 150 },
    latencyMs,
  });

  return text;
}

Java SDK Integration

import com.getaxonflow.sdk.AxonFlow;
import com.getaxonflow.sdk.AxonFlowConfig;
import com.getaxonflow.sdk.types.*;

import java.util.Map;

public class GovernedLangChain {
    private final AxonFlow axonflow;

    public GovernedLangChain() {
        this.axonflow = AxonFlow.create(AxonFlowConfig.builder()
                .endpoint("http://localhost:8080")
                .clientId("langchain-java")
                .clientSecret(System.getenv("AXONFLOW_CLIENT_SECRET"))
                .build());
    }

    public String governedCall(String userToken, String query, Map<String, Object> context)
            throws Exception {

        context.put("framework", "langchain");
        long startTime = System.currentTimeMillis();

        // Pre-check
        PolicyApprovalResult approval = axonflow.getPolicyApprovedContext(
                PolicyApprovalRequest.builder()
                        .userToken(userToken)
                        .query(query)
                        .context(context)
                        .build()
        );

        if (!approval.isApproved()) {
            throw new PolicyViolationException(approval.getBlockReason());
        }

        // Your LLM call here (callLLM is a placeholder for your own LLM client call)
        String response = callLLM(query);
        long latencyMs = System.currentTimeMillis() - startTime;

        // Audit (placeholder token counts)
        axonflow.auditLLMCall(AuditOptions.builder()
                .contextId(approval.getContextId())
                .responseSummary(response.substring(0, Math.min(200, response.length())))
                .provider("openai")
                .model("gpt-4")
                .tokenUsage(TokenUsage.of(100, 50))
                .latencyMs(latencyMs)
                .build());

        return response;
    }
}

Example Implementations

| Language | SDK | Example |
| --- | --- | --- |
| Python | axonflow | hello-world/python |
| Go | axonflow-sdk-go | hello-world/go |
| TypeScript | @axonflow/sdk | hello-world/typescript |
| Java | axonflow-sdk | hello-world/java |
| HTTP/curl | Raw HTTP | hello-world/http (for PHP, Ruby, Rust, etc.) |

Best Practices

Always use context IDs: The context_id from get_policy_approved_context() must be passed to audit_llm_call() for proper correlation.

Handle blocked requests gracefully: Check ctx.approved before making LLM calls. Return user-friendly messages when blocked.

Always audit, even on errors: Wrap LLM calls in try/except and call audit_llm_call() in both success and error paths.
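
A sketch of that shape, reusing the Pattern 1 client and fields (the zeroed TokenUsage on the error path is a placeholder assumption):

import time
from langchain_openai import ChatOpenAI
from axonflow import TokenUsage

def call_and_audit(axonflow, ctx, query: str) -> str:
    start = time.time()
    summary = ""
    try:
        response = ChatOpenAI(model="gpt-4").invoke(query)
        summary = response.content[:200]
        return response.content
    except Exception as exc:
        summary = f"error: {exc}"
        raise
    finally:
        # Runs on both the success and the error path
        axonflow.audit_llm_call(
            context_id=ctx.context_id,
            response_summary=summary,
            provider="openai",
            model="gpt-4",
            token_usage=TokenUsage(prompt_tokens=0, completion_tokens=0, total_tokens=0),
            latency_ms=int((time.time() - start) * 1000),
        )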

Use meaningful context: Pass relevant metadata (department, use_case, data_tier) to enable fine-grained policies.
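
For example, a pre-check enriched with the metadata a policy might key on (the specific field values here are illustrative):

ctx = axonflow.get_policy_approved_context(
    user_token=user_token,
    query=query,
    context={
        "framework": "langchain",
        "department": "finance",          # who is asking
        "use_case": "quarterly-report",   # why they are asking
        "data_tier": "restricted",        # sensitivity of the data involved
    },
)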

Troubleshooting

| Issue | Solution |
| --- | --- |
| Pre-check returns 401 | Verify client_secret is correct |
| Audit calls failing | Check that context_id comes from a valid, unexpired pre-check |
| High latency | AxonFlow adds under 10 ms; if higher, check the network path to the AxonFlow endpoint |
| Policies not applying | Verify context fields match policy conditions; check AxonFlow logs |