LangChain + AxonFlow Integration
Prerequisites: Python 3.9+, a running AxonFlow instance (see Getting Started), and pip install langchain langchain-openai axonflow
What LangChain Does Well
LangChain is the most popular framework for building LLM applications—with 847 million downloads and growing. Its strengths are real and substantial:
Unified Model Interface: Switch between OpenAI, Anthropic, Google, Bedrock, and Ollama without rewriting your application. This abstraction saves weeks of integration work.
Rapid Prototyping: Build a working agent in under 10 lines of code. LCEL (LangChain Expression Language) makes composing chains intuitive; see the short sketch after this list.
Rich Ecosystem: 1,400+ integrations—vector stores, document loaders, retrievers, tools. If you need a connector, it probably exists.
RAG Made Simple: Retrieval-Augmented Generation with sensible defaults. Load documents, create embeddings, query with context—all handled.
LangGraph for State: When you outgrow simple chains, LangGraph provides durable execution, human-in-the-loop patterns, and persistence.
Active Community: Extensive documentation, tutorials, and community support. Issues get addressed quickly.
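As a quick illustration of the rapid-prototyping point above, here is a minimal LCEL chain; the prompt and model choice are only examples:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# LCEL composes a prompt, a model, and an output parser with the pipe operator
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-4") | StrOutputParser()

print(chain.invoke({"text": "LangChain composes LLM calls into pipelines."}))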
What LangChain Doesn't Try to Solve
LangChain focuses on making LLM applications easy to build. These concerns are explicitly out of scope:
| Production Requirement | LangChain's Position |
|---|---|
| Policy enforcement at inference time | Not provided—no built-in way to block requests based on content, user, or context |
| PII detection and blocking | Requires external integration (Presidio)—not built in |
| SQL injection prevention | Not addressed—must implement separately |
| Per-user cost allocation | Not provided—no way to attribute costs to users or departments |
| Audit trails for compliance | Requires LangSmith (paid SaaS) or third-party tools |
| Role-based access control | Not addressed—no permission model |
| Request/response logging | Off by default—must configure manually |
This isn't a criticism—it's a design choice. LangChain handles orchestration. Governance is a separate concern.
Where Teams Hit Production Friction
Based on real enterprise deployments, here are the blockers that appear after the prototype works:
1. The Compliance Audit
"Show me every prompt that contained customer identifiers in the last 90 days."
LangChain doesn't log prompts by default. LangSmith does, but if it wasn't enabled from day one, you're reconstructing from application logs. For HIPAA, GDPR, or SOX audits, this is painful.
2. The $30K Invoice
A recursive loop triggers 47,000 API calls over a weekend. The bill arrives. Debugging takes three days because:
- No per-request logging was enabled
- No cost attribution per user or session
- No circuit breaker on spend
LangChain processed every request as intended. Nothing was watching what it processed.
3. The "Why Did It Say That?" Question
A hallucinated figure appears in a financial report. Compliance asks:
- What was the original prompt?
- What context was retrieved?
- What model and temperature were used?
- Who was the requesting user?
This information wasn't captured. LangChain returned a response; what happened between prompt and response was invisible.
4. The Security Review Block
Security review: BLOCKED
- No audit trail for LLM decisions
- PII exposure risk in prompts
- SQL injection not prevented
- Policy enforcement unclear
- Cost controls missing
The prototype worked perfectly. It can't ship.
5. The PII Leak That Wasn't (But Could Have Been)
A support agent was trained on customer service logs. During a security scan, it could be prompted to retrieve customer phone numbers—information that was in the training data. LangChain has no built-in PII detection.
How AxonFlow Plugs In
AxonFlow doesn't replace LangChain. It sits underneath it—providing the governance layer that LangChain intentionally doesn't include:
        ┌─────────────────┐
        │    Your App     │
        └────────┬────────┘
                 │
                 v
        ┌─────────────────┐
        │    LangChain    │   <-- Chains, Agents, RAG
        └────────┬────────┘
                 │
                 v
┌─────────────────────────────────┐
│            AxonFlow             │
│  ┌───────────┐  ┌────────────┐  │
│  │  Policy   │  │   Audit    │  │
│  │  Enforce  │  │   Trail    │  │
│  └───────────┘  └────────────┘  │
│  ┌───────────┐  ┌────────────┐  │
│  │   PII     │  │    Cost    │  │
│  │ Detection │  │  Control   │  │
│  └───────────┘  └────────────┘  │
└────────────────┬────────────────┘
                 │
                 v
        ┌─────────────────┐
        │  LLM Provider   │
        └─────────────────┘
What this gives you:
- Every request logged with user identity and context
- PII detected and blocked before hitting the LLM (SSN, credit cards, etc.)
- SQL injection attempts blocked
- Cost tracked per user, per department, per agent
- Compliance auditors can query 90 days of decisions in seconds
What stays the same:
- Your LangChain code doesn't change
- No new abstractions to learn
- No framework lock-in
Integration Patterns
Pattern 1: Pre-Check + Audit (Gateway Mode) — Recommended
The recommended default for most teams: check every request with AxonFlow before it reaches the LLM, then record an audit entry after the response returns:
from langchain_openai import ChatOpenAI
from axonflow import AxonFlow, TokenUsage
import time
def governed_langchain_call(
    user_token: str,
    query: str,
    context: dict = None
) -> str:
    """LangChain call with AxonFlow governance."""
    start_time = time.time()

    with AxonFlow.sync(
        endpoint="http://localhost:8080",
        client_id="langchain-app",
        client_secret="your-secret"
    ) as axonflow:
        # 1. Pre-check: Is this request allowed?
        ctx = axonflow.get_policy_approved_context(
            user_token=user_token,
            query=query,
            context={**(context or {}), "framework": "langchain"}
        )
        if not ctx.approved:
            # Logged: who, what, when, why blocked
            return f"Blocked: {ctx.block_reason}"

        # 2. Your existing LangChain code
        llm = ChatOpenAI(model="gpt-4")
        response = llm.invoke(query)
        latency_ms = int((time.time() - start_time) * 1000)

        # 3. Audit: Record what happened
        usage = response.usage_metadata or {}
        axonflow.audit_llm_call(
            context_id=ctx.context_id,
            response_summary=response.content[:200],
            provider="openai",
            model="gpt-4",
            token_usage=TokenUsage(
                prompt_tokens=usage.get("input_tokens", 0),
                completion_tokens=usage.get("output_tokens", 0),
                total_tokens=usage.get("total_tokens", 0)
            ),
            latency_ms=latency_ms
        )
        return response.content
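For example, a call site in your application might look like this; the token and context values are purely illustrative:

answer = governed_langchain_call(
    user_token="user-123",
    query="Summarize the key points of our refund policy",
    context={"department": "support", "use_case": "faq"}
)
print(answer)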
Pattern 2: Governed RAG Pipeline — Advanced
Add governance to retrieval-augmented generation:
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from axonflow import AxonFlow, TokenUsage
import time
class GovernedRAG:
    def __init__(self, vectorstore_path: str):
        self.embeddings = OpenAIEmbeddings()
        self.vectorstore = Chroma(
            persist_directory=vectorstore_path,
            embedding_function=self.embeddings
        )
        self.llm = ChatOpenAI(model="gpt-4")
        self.chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            retriever=self.vectorstore.as_retriever(),
            return_source_documents=True
        )

    def query(self, user_token: str, question: str, context: dict = None) -> dict:
        start_time = time.time()
        context = context or {}

        with AxonFlow.sync(
            endpoint="http://localhost:8080",
            client_id="langchain-rag",
            client_secret="your-secret"
        ) as axonflow:
            # Pre-check with RAG context
            ctx = axonflow.get_policy_approved_context(
                user_token=user_token,
                query=question,
                context={
                    **context,
                    "pipeline": "rag",
                    "data_tier": context.get("data_tier", "standard")
                }
            )
            if not ctx.approved:
                return {"answer": f"Blocked: {ctx.block_reason}", "blocked": True}

            # Execute RAG
            result = self.chain.invoke({"query": question})
            sources = [doc.metadata for doc in result.get("source_documents", [])]
            latency_ms = int((time.time() - start_time) * 1000)

            # Audit with source attribution
            axonflow.audit_llm_call(
                context_id=ctx.context_id,
                response_summary=result["result"][:200],
                provider="openai",
                model="gpt-4",
                # Placeholder token counts; substitute real usage from your LLM callbacks if available
                token_usage=TokenUsage(prompt_tokens=100, completion_tokens=50, total_tokens=150),
                latency_ms=latency_ms,
                metadata={"source_count": len(sources), "sources": sources[:5]}
            )
            return {"answer": result["result"], "sources": sources, "context_id": ctx.context_id}
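Usage is then a thin wrapper around the class above; the vector store path and context values are illustrative:

rag = GovernedRAG(vectorstore_path="./chroma_db")
result = rag.query(
    user_token="user-123",
    question="What does the refund policy say about digital goods?",
    context={"data_tier": "internal", "department": "support"}
)
if not result.get("blocked"):
    print(result["answer"])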
Pattern 3: Proxy Mode — Alternative: For simpler deployments
Route all LLM calls through AxonFlow for automatic governance:
from axonflow import AxonFlow
# All governance happens automatically
with AxonFlow.sync(
    endpoint="http://localhost:8080",
    client_id="langchain-app"
) as axonflow:
    result = axonflow.execute_query(
        user_token="user-123",
        query="What are the key AI governance regulations?",
        request_type="chat",
        context={"framework": "langchain", "department": "research"}
    )

    if result.blocked:
        print(f"Blocked: {result.block_reason}")
    else:
        print(result.data)  # LLM response
Other Languages: Go, TypeScript, Java — For polyglot or service-oriented architectures
Go SDK Integration
For Go services that orchestrate LangChain-based components or run in hybrid architectures:
package main

import (
    "context"
    "fmt"

    "github.com/getaxonflow/axonflow-sdk-go"
    "github.com/sashabaranov/go-openai"
)

func GovernedLangChainCall(
    ctx context.Context,
    client *axonflow.AxonFlowClient,
    llm *openai.Client,
    userToken, query string,
    callContext map[string]interface{},
) (string, error) {
    // Pre-check
    callContext["framework"] = "langchain"
    result, err := client.ExecuteQuery(userToken, query, "chat", callContext)
    if err != nil {
        return "", err
    }
    if result.Blocked {
        return "", fmt.Errorf("blocked: %s", result.BlockReason)
    }

    // Make LLM call
    resp, err := llm.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
        Model: openai.GPT4,
        Messages: []openai.ChatCompletionMessage{
            {Role: openai.ChatMessageRoleUser, Content: query},
        },
    })
    if err != nil {
        return "", err
    }

    return resp.Choices[0].Message.Content, nil
}
TypeScript SDK Integration
import { AxonFlow } from "@axonflow/sdk";
import { ChatOpenAI } from "@langchain/openai";
async function governedLangChainCall(
  userToken: string,
  query: string,
  context?: Record<string, unknown>
): Promise<string> {
  const axonflow = new AxonFlow({
    endpoint: "http://localhost:8080",
    tenant: "langchain-app",
  });
  const startTime = Date.now();

  // Pre-check
  const approval = await axonflow.getPolicyApprovedContext({
    userToken,
    query,
    context: { ...context, framework: "langchain" },
  });
  if (!approval.approved) {
    throw new Error(`Blocked: ${approval.blockReason}`);
  }

  // LangChain call
  const llm = new ChatOpenAI({ model: "gpt-4" });
  const response = await llm.invoke(query);
  const latencyMs = Date.now() - startTime;
  const text =
    typeof response.content === "string"
      ? response.content
      : JSON.stringify(response.content);

  // Audit
  await axonflow.auditLLMCall({
    contextId: approval.contextId,
    responseSummary: text.slice(0, 200),
    provider: "openai",
    model: "gpt-4",
    // Placeholder token counts; use real usage metadata from the response if available
    tokenUsage: { promptTokens: 100, completionTokens: 50, totalTokens: 150 },
    latencyMs,
  });

  return text;
}
Java SDK Integration
import com.getaxonflow.sdk.AxonFlow;
import com.getaxonflow.sdk.AxonFlowConfig;
import com.getaxonflow.sdk.types.*;

import java.util.Map;

public class GovernedLangChain {
    private final AxonFlow axonflow;

    public GovernedLangChain() {
        this.axonflow = AxonFlow.create(AxonFlowConfig.builder()
            .endpoint("http://localhost:8080")
            .clientId("langchain-java")
            .clientSecret(System.getenv("AXONFLOW_CLIENT_SECRET"))
            .build());
    }

    public String governedCall(String userToken, String query, Map<String, Object> context)
            throws Exception {
        context.put("framework", "langchain");
        long startTime = System.currentTimeMillis();

        // Pre-check
        PolicyApprovalResult approval = axonflow.getPolicyApprovedContext(
            PolicyApprovalRequest.builder()
                .userToken(userToken)
                .query(query)
                .context(context)
                .build()
        );
        if (!approval.isApproved()) {
            throw new PolicyViolationException(approval.getBlockReason());
        }

        // Your LLM call here (callLLM is a placeholder for your own client code)
        String response = callLLM(query);
        long latencyMs = System.currentTimeMillis() - startTime;

        // Audit
        axonflow.auditLLMCall(AuditOptions.builder()
            .contextId(approval.getContextId())
            .responseSummary(response.substring(0, Math.min(200, response.length())))
            .provider("openai")
            .model("gpt-4")
            // Placeholder token counts; use real usage from your LLM client if available
            .tokenUsage(TokenUsage.of(100, 50))
            .latencyMs(latencyMs)
            .build());

        return response;
    }
}
Example Implementations
| Language | SDK | Example |
|---|---|---|
| Python | axonflow | hello-world/python |
| Go | axonflow-sdk-go | hello-world/go |
| TypeScript | @axonflow/sdk | hello-world/typescript |
| Java | axonflow-sdk | hello-world/java |
| HTTP/curl | Raw HTTP | hello-world/http (for PHP, Ruby, Rust, etc.) |
Best Practices
Always use context IDs: The context_id from get_policy_approved_context() must be passed to audit_llm_call() for proper correlation.
Handle blocked requests gracefully: Check ctx.approved before making LLM calls. Return user-friendly messages when blocked.
Always audit, even on errors: Wrap LLM calls in try/except and call audit_llm_call() in both success and error paths.
Use meaningful context: Pass relevant metadata (department, use_case, data_tier) to enable fine-grained policies.
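A minimal sketch of the error-path auditing practice, reusing the axonflow client and pre-check ctx from Pattern 1; the zero token counts and the "status" metadata field are placeholders rather than a documented contract:

from langchain_openai import ChatOpenAI
from axonflow import TokenUsage
import time

def call_with_error_audit(axonflow, ctx, query: str) -> str:
    """Audit the LLM call on both the success and the failure path."""
    start_time = time.time()
    llm = ChatOpenAI(model="gpt-4")
    try:
        response = llm.invoke(query)
        summary, status = response.content[:200], "success"
        return response.content
    except Exception as exc:
        summary, status = f"LLM call failed: {exc}"[:200], "error"
        raise
    finally:
        # finally runs on both paths, so failed attempts show up in the audit trail too
        axonflow.audit_llm_call(
            context_id=ctx.context_id,
            response_summary=summary,
            provider="openai",
            model="gpt-4",
            token_usage=TokenUsage(prompt_tokens=0, completion_tokens=0, total_tokens=0),
            latency_ms=int((time.time() - start_time) * 1000),
            metadata={"status": status}
        )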
Troubleshooting
| Issue | Solution |
|---|---|
| Pre-check returns 401 | Verify client_secret is correct |
| Audit calls failing | Check context_id is from valid pre-check (not expired) |
| High latency | AxonFlow adds under 10ms; if higher, check network to AxonFlow endpoint |
| Policies not applying | Verify context fields match policy conditions; check AxonFlow logs |