AutoGen + AxonFlow Integration
What AutoGen Does Well
AutoGen is Microsoft's premier framework for building multi-agent AI systems, with strong community adoption and continuous development. Its strengths are real:
Multi-Agent Orchestration: Build teams of autonomous agents that converse, collaborate, and delegate tasks. GroupChat, UserProxyAgent, and AssistantAgent patterns handle complex coordination.
Flexible Agent Design: Define agents with custom behaviors, tool use, and memory. Agents can execute code, call APIs, and interact with humans in the loop.
Code Execution: Built-in sandboxed code execution with Docker support. Agents can write, run, and iterate on code safely.
Human-in-the-Loop: Native patterns for human approval, feedback, and intervention. human_input_mode controls when humans are consulted.
Active Development: Regular releases, strong Microsoft backing, and active community. Issues get addressed, patterns evolve.
What AutoGen Doesn't Try to Solve
AutoGen focuses on agent orchestration and collaboration. These concerns are explicitly out of scope:
| Production Requirement | AutoGen's Position |
|---|---|
| Policy enforcement before agent actions | Not provided—no built-in way to block requests based on content or context |
| PII detection in agent communications | Not addressed—agents can share sensitive data freely between themselves |
| SQL injection prevention | Not provided—code execution is sandboxed but input validation is external |
| Per-agent or per-user cost attribution | Not tracked—no way to attribute API costs to specific agents or users |
| Audit trails for compliance | Requires external logging—agent conversations aren't logged by default |
| Cross-agent access control | Not addressed—any agent can message any other agent in a group |
| Token budget enforcement | Not provided—no built-in cap on how many tokens agents can consume |
This isn't a criticism—it's a design choice. AutoGen handles orchestration. Governance is a separate concern.
Where Teams Hit Production Friction
Based on real enterprise deployments, here are the blockers that appear after the prototype works:
1. The Recursive Agent Loop
A researcher agent asks an analyst agent for data. The analyst requests clarification. The researcher rephrases. This continues. Over the weekend, 23,000 API calls are made before the conversation timeout.
AutoGen processed every message as intended. Nothing was watching the cost of what it processed.
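A hard budget at the orchestration layer stops this class of failure even before a full governance stack is in place. Below is a minimal sketch of the idea, with a stub standing in for a real AutoGen agent; `CallBudget` and `with_budget` are illustrative names, not an AutoGen or AxonFlow API:

```python
class BudgetExceeded(RuntimeError):
    """Raised when a conversation exceeds its API-call budget."""

class CallBudget:
    """Per-conversation call counter (illustrative, not an AxonFlow API)."""
    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def charge(self):
        self.calls += 1
        if self.calls > self.max_calls:
            raise BudgetExceeded(f"call budget of {self.max_calls} exhausted")

def with_budget(generate_reply, budget: CallBudget):
    """Wrap an agent's generate_reply so every turn spends budget."""
    def guarded(messages=None, sender=None, **kwargs):
        budget.charge()  # raises before the LLM is ever called
        return generate_reply(messages, sender, **kwargs)
    return guarded

# Demo with a stub in place of a real AutoGen agent
def fake_generate(messages=None, sender=None, **kwargs):
    return "ok"

budget = CallBudget(max_calls=3)
guarded = with_budget(fake_generate, budget)
results = []
try:
    for _ in range(10):  # a runaway back-and-forth loop
        results.append(guarded())
except BudgetExceeded:
    pass
print(len(results))  # → 3
```

The same wrapping trick applied to `agent.generate_reply` would have capped the weekend incident at the budget, not at 23,000 calls.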
2. The "What Did The Agents Discuss?" Question
Compliance asks for an audit trail of a multi-agent decision. The agents reached a conclusion, but:
- What prompts were sent between agents?
- What data was shared in the conversation?
- Which agent made which decision?
- What was the chain of reasoning?
This information wasn't captured. AutoGen facilitated the conversation; it didn't log it for compliance.
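Even a simple in-process transcript recorder answers most of these questions until a real governance layer is wired in. A minimal sketch (illustrative names, not an AxonFlow API; a real deployment needs durable, queryable storage):

```python
import json
import time

class ConversationLog:
    """Append-only transcript of inter-agent messages (illustrative only)."""
    def __init__(self):
        self.entries = []

    def record(self, sender: str, recipient: str, content: str):
        """Capture who said what to whom, with a timestamp."""
        self.entries.append({
            "ts": time.time(),
            "sender": sender,
            "recipient": recipient,
            "content": content,
        })

    def to_jsonl(self) -> str:
        """Serialize for hand-off to an external audit store."""
        return "\n".join(json.dumps(e) for e in self.entries)

log = ConversationLog()
log.record("researcher", "analyst", "Need Q3 revenue by region.")
log.record("analyst", "researcher", "Q3 revenue: $4.2M, EMEA-led.")
print(len(log.entries))  # → 2
```

Recording at the point where messages cross agent boundaries is what makes "which agent made which decision" answerable after the fact.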
3. The PII Exposure in Agent Memory
An HR agent has access to employee data. It shares context with a reporting agent. The reporting agent, designed to produce summaries for managers, now has employee SSNs in its conversation history.
AutoGen has no built-in mechanism to filter PII between agents.
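A crude but useful stopgap is to redact obvious PII before any message is handed to another agent. A minimal sketch, assuming regex-detectable patterns; real PII detection needs much broader coverage (names, addresses, account numbers) and context-aware matching:

```python
import re

# Illustrative patterns only; not exhaustive
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with typed placeholders before hand-off."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text

# What the HR agent would pass to the reporting agent, after filtering
msg = "Jane Doe, SSN 123-45-6789, contact jane@corp.com"
clean = redact(msg)
print(clean)  # → Jane Doe, SSN [REDACTED-SSN], contact [REDACTED-EMAIL]
```

Run at the boundary between agents, this keeps SSNs out of the reporting agent's conversation history entirely.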
4. The Security Review Block
Security review: BLOCKED
- No audit trail for inter-agent communications
- PII can flow between agents without filtering
- Code execution governed only by Docker isolation
- No role-based access control for agents
- Cost controls missing
The multi-agent system worked perfectly in demo. It can't ship.
5. The Runaway Code Execution
A coding agent generates and executes code in a loop. Each iteration calls an API. The code works, but the API has per-call costs. 50,000 iterations later, the bill arrives.
AutoGen's sandbox prevented security issues. It didn't prevent financial issues.
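The fix is a spend ceiling checked before each call, not a bill read after. A minimal sketch using integer micro-dollar accounting to avoid float drift; the `CostMeter` name and the $0.002 per-call price are assumptions, not an AxonFlow API:

```python
class CostMeter:
    """Accumulates estimated spend in micro-dollars (illustrative only)."""
    def __init__(self, ceiling_microusd: int):
        self.ceiling = ceiling_microusd
        self.spent = 0

    def charge(self, cost_microusd: int) -> bool:
        """Record one call's cost; return False once the ceiling is hit."""
        if self.spent + cost_microusd > self.ceiling:
            return False
        self.spent += cost_microusd
        return True

# $1.00 ceiling, $0.002 per call (assumed price)
meter = CostMeter(ceiling_microusd=1_000_000)
completed = 0
for _ in range(50_000):          # the runaway loop from the anecdote
    if not meter.charge(2_000):  # refuse the call once over budget
        break
    completed += 1
print(completed)  # → 500
```

With the meter consulted inside the code-execution loop, the incident ends at the ceiling instead of at 50,000 iterations.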
How AxonFlow Plugs In
AxonFlow doesn't replace AutoGen. It sits underneath, providing the governance layer that AutoGen intentionally doesn't include:
┌─────────────────┐
│    Your App     │
└────────┬────────┘
         │
         v
┌─────────────────┐
│     AutoGen     │  <-- Agents, GroupChat, Code Execution
└────────┬────────┘
         │
         v
┌─────────────────────────────────┐
│            AxonFlow             │
│  ┌───────────┐  ┌────────────┐  │
│  │  Policy   │  │   Audit    │  │
│  │  Enforce  │  │   Trail    │  │
│  └───────────┘  └────────────┘  │
│  ┌───────────┐  ┌────────────┐  │
│  │    PII    │  │    Cost    │  │
│  │ Detection │  │  Control   │  │
│  └───────────┘  └────────────┘  │
└────────────────┬────────────────┘
                 │
                 v
┌─────────────────┐
│  LLM Provider   │
└─────────────────┘
What this gives you:
- Every agent action logged with agent identity and context
- PII detected and blocked before flowing between agents
- SQL injection attempts blocked even in code generation
- Cost tracked per agent, per user, per conversation
- Compliance auditors can query the full decision chain
What stays the same:
- Your AutoGen code doesn't change
- Agent orchestration patterns work as before
- No new abstractions to learn
Integration Patterns
Pattern 1: Governed Agent Wrapper (Gateway Mode)
Wrap AutoGen agents with AxonFlow governance:
import os
import time
from typing import Dict, List, Optional, Union

from autogen import AssistantAgent, ConversableAgent
from axonflow import AxonFlow


class GovernedAutoGenAgent:
    """Wrapper that adds AxonFlow governance to AutoGen agents."""

    def __init__(
        self,
        axonflow_client: AxonFlow,
        agent: ConversableAgent,
        user_token: str,
        agent_role: str = "assistant"
    ):
        self.axonflow = axonflow_client
        self.agent = agent
        self.user_token = user_token
        self.agent_role = agent_role
        # Wrap the agent's LLM call
        self._original_generate = agent.generate_reply
        agent.generate_reply = self._governed_generate

    def _governed_generate(
        self,
        messages: Optional[List[Dict]] = None,
        sender: Optional[ConversableAgent] = None,
        **kwargs
    ) -> Union[str, Dict, None]:
        """Generate reply with AxonFlow governance."""
        # Extract the latest message content for policy evaluation
        query = ""
        if messages:
            last_message = messages[-1]
            if isinstance(last_message, dict):
                query = last_message.get("content", "")
            else:
                query = str(last_message)

        # Ask for policy approval before the LLM is ever called
        ctx = self.axonflow.get_policy_approved_context(
            user_token=self.user_token,
            query=query,
            context={
                "agent_role": self.agent_role,
                "agent_name": self.agent.name,
                "framework": "autogen"
            }
        )
        if not ctx.approved:
            return f"[BLOCKED by policy: {ctx.block_reason}]"

        # Time the underlying LLM call for the audit record
        llm_start = time.time()
        response = self._original_generate(messages, sender, **kwargs)
        llm_end = time.time()

        response_text = response if isinstance(response, str) else str(response)
        self.axonflow.audit_llm_call(
            context_id=ctx.context_id,
            response_summary=response_text[:200],
            provider="openai",
            model=self.agent.llm_config.get("model", "gpt-4"),
            latency_ms=int((llm_end - llm_start) * 1000)
        )
        return response


# Usage
with AxonFlow.sync(
    agent_url=os.getenv("AXONFLOW_AGENT_URL", "http://localhost:8080"),
    client_id=os.getenv("AXONFLOW_CLIENT_ID"),
    client_secret=os.getenv("AXONFLOW_CLIENT_SECRET")
) as axonflow:
    assistant = AssistantAgent(
        name="research_assistant",
        llm_config={"model": "gpt-4", "api_key": os.getenv("OPENAI_API_KEY")}
    )
    governed = GovernedAutoGenAgent(
        axonflow_client=axonflow,
        agent=assistant,
        user_token="user-123",
        agent_role="research_assistant"
    )
    # Agent is now governed; use `assistant` as usual
Pattern 2: GroupChat with Per-Agent Policies
Apply different policies to different agent roles:
class GovernedGroupChat:
    """GroupChat with AxonFlow governance for each agent."""

    def __init__(self, axonflow: AxonFlow, user_token: str):
        self.axonflow = axonflow
        self.user_token = user_token
        self.agent_policies = {}

    def add_agent(self, agent: AssistantAgent, policy_context: dict):
        """Register an agent with its policy context."""
        self.agent_policies[agent.name] = policy_context
        original_generate = agent.generate_reply

        def governed_generate(messages=None, sender=None, **kwargs):
            last = messages[-1] if messages else {}
            query = last.get("content", "") if isinstance(last, dict) else str(last)

            # Copy so the registered policy context isn't mutated per call
            ctx_data = dict(self.agent_policies.get(agent.name, {}))
            ctx_data["agent_name"] = agent.name
            ctx_data["framework"] = "autogen"

            ctx = self.axonflow.get_policy_approved_context(
                user_token=self.user_token,
                query=query,
                context=ctx_data
            )
            if not ctx.approved:
                return f"[BLOCKED: {ctx.block_reason}]"

            llm_start = time.time()
            response = original_generate(messages, sender, **kwargs)
            self.axonflow.audit_llm_call(
                context_id=ctx.context_id,
                response_summary=str(response)[:200],
                provider="openai",
                model=agent.llm_config.get("model", "gpt-4"),
                latency_ms=int((time.time() - llm_start) * 1000)
            )
            return response

        agent.generate_reply = governed_generate
        return agent


# Different policies per agent role
governed = GovernedGroupChat(axonflow, user_token="analyst-team")

researcher = governed.add_agent(
    AssistantAgent(name="researcher", llm_config={"model": "gpt-4"}),
    policy_context={"role": "researcher", "data_access": ["external", "public"]}
)
analyst = governed.add_agent(
    AssistantAgent(name="analyst", llm_config={"model": "gpt-4"}),
    policy_context={"role": "analyst", "data_access": ["internal", "financial"]}
)
Pattern 3: Java Service Orchestrating AutoGen
For Java services coordinating with AutoGen via REST:
package com.example.autogen;

import com.getaxonflow.sdk.AxonFlow;
import com.getaxonflow.sdk.AxonFlowConfig;
import com.getaxonflow.sdk.PolicyApprovalResult;
import com.getaxonflow.sdk.TokenUsage;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

public class GovernedAutoGenService {

    private final AxonFlow axonflow;
    private final HttpClient httpClient;
    private final String autogenServiceUrl;

    public GovernedAutoGenService(String autogenServiceUrl) {
        this.autogenServiceUrl = autogenServiceUrl;
        this.httpClient = HttpClient.newHttpClient();
        AxonFlowConfig config = AxonFlowConfig.builder()
            .agentUrl(System.getenv("AXONFLOW_AGENT_URL"))
            .clientId(System.getenv("AXONFLOW_CLIENT_ID"))
            .clientSecret(System.getenv("AXONFLOW_CLIENT_SECRET"))
            .build();
        this.axonflow = AxonFlow.create(config);
    }

    public String executeGovernedConversation(
            String userToken,
            String message,
            String agentTeam,
            Map<String, Object> context
    ) throws Exception {
        context.put("framework", "autogen");
        context.put("agent_team", agentTeam);

        // Policy check before the AutoGen service is invoked
        PolicyApprovalResult approval = axonflow.getPolicyApprovedContext(
            userToken, message, context
        );
        if (!approval.isApproved()) {
            // PolicyViolationException is an application-defined exception
            throw new PolicyViolationException(approval.getBlockReason());
        }

        long startTime = System.currentTimeMillis();
        // NOTE: use a JSON library (e.g. Jackson) in production; naive
        // concatenation breaks on quotes or newlines in `message`
        HttpResponse<String> response = httpClient.send(
            HttpRequest.newBuilder()
                .uri(URI.create(autogenServiceUrl + "/chat"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                    "{\"message\": \"" + message + "\", \"agent_team\": \"" + agentTeam + "\"}"
                ))
                .build(),
            HttpResponse.BodyHandlers.ofString()
        );

        // Token counts aren't available from the REST response here
        axonflow.auditLLMCall(
            approval.getContextId(),
            response.body().substring(0, Math.min(200, response.body().length())),
            "openai", "gpt-4",
            TokenUsage.of(0, 0, 0),
            System.currentTimeMillis() - startTime
        );
        return response.body();
    }
}
Example Implementations
| Language | SDK | Example |
|---|---|---|
| Python | axonflow | autogen/python |
| Java | axonflow-sdk | autogen/java |