Orchestrator API Endpoints
Complete API reference for the AxonFlow Orchestrator's multi-agent parallel (MAP) execution endpoints.
Overview
The Orchestrator coordinates multi-agent parallel (MAP) execution, allowing multiple agents to process queries simultaneously for significant speedup (up to Nx where N = parallel agents).
Base URL: https://YOUR_ORCHESTRATOR_ENDPOINT
Port: 8081 (default, configurable via PORT env var)
Protocol: HTTP/2 + TLS 1.3
Authentication: OAuth2-style Basic auth (Authorization: Basic base64(clientId:clientSecret))
Authentication
All requests require OAuth2-style Basic authentication:
POST /api/v1/process
Host: YOUR_ORCHESTRATOR_ENDPOINT
Content-Type: application/json
Authorization: Basic base64(clientId:clientSecret)
Headers:
- Authorization (required): Basic auth with clientId:clientSecret
- X-Organization-ID (optional): Your organization identifier (can also use clientId)
- Content-Type (required): application/json
- X-Request-ID (optional): Client-provided request ID for tracing
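As a sketch, the Basic credential can be built like this in TypeScript (the clientId/clientSecret values below are placeholders):

```typescript
// Build the Authorization header the Orchestrator expects:
// "Basic " + base64(clientId + ":" + clientSecret)
function basicAuthHeader(clientId: string, clientSecret: string): string {
  const token = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  return `Basic ${token}`;
}

// Example request headers (placeholder credentials):
const headers = {
  Authorization: basicAuthHeader("client_456", "my-secret"),
  "Content-Type": "application/json",
  "X-Request-ID": "req_123", // optional, for tracing
};
```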
Core Processing Endpoints
POST /api/v1/process
Main request processing endpoint. Evaluates dynamic policies, routes to LLM providers, and returns responses.
Request:
curl -X POST http://localhost:8081/api/v1/process \
-u "$AXONFLOW_CLIENT_ID:$AXONFLOW_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{
"request_id": "req_123",
"query": "What is the weather in Paris?",
"request_type": "chat",
"user": {
"id": 123,
"email": "[email protected]",
"role": "analyst",
"tenant_id": "my-tenant"
},
"client": {
"id": "client_456",
"name": "my-app",
"org_id": "org_789",
"tenant_id": "my-tenant"
},
"context": {
"department": "support"
}
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| request_id | string | No | Client-provided request ID (auto-generated if empty) |
| query | string | Yes | User query to process |
| request_type | string | No | Request type for routing |
| user | object | Yes | User context for policy evaluation |
| client | object | Yes | Client/application context |
| context | object | No | Additional context data |
context supports request-level routing fields:
- provider (string): preferred provider (fallback allowed)
- strict_provider (boolean, optional): hard pin the provider for this request (disables fallback)
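For example, a request body that pins a single call to one provider might look like this (a sketch using the fields above; the provider name is illustrative):

```typescript
// Pin this request to one provider and disable fallback.
const body = {
  query: "Summarize the incident report",
  user: { id: 123, email: "[email protected]", role: "analyst" },
  client: { id: "client_456", name: "my-app" },
  context: {
    provider: "anthropic",   // preferred provider
    strict_provider: true,   // hard pin: no fallback for this request
  },
};
```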
Response (200 OK):
{
"request_id": "req_123",
"success": true,
"data": "The weather in Paris is currently 15°C with partly cloudy skies.",
"redacted": false,
"redacted_fields": [],
"policy_info": {
"allowed": true,
"applied_policies": ["governance.rate_limit", "compliance.pii_check"],
"risk_score": 0.05,
"required_actions": [],
"processing_time_ms": 2
},
"provider_info": {
"provider": "openai",
"model": "gpt-4o",
"response_time_ms": 856,
"tokens_used": 43,
"cost": 0.002
},
"processing_time": "860ms"
}
POST /api/v1/plan
Trigger multi-agent planning (MAP) to decompose complex tasks into executable steps.
Request:
curl -X POST http://localhost:8081/api/v1/plan \
-u "$AXONFLOW_CLIENT_ID:$AXONFLOW_CLIENT_SECRET" \
-H "Content-Type: application/json" \
-d '{
"query": "Book a flight from SFO to Paris and find a hotel near the Eiffel Tower",
"domain": "travel",
"execution_mode": "auto",
"user": {
"id": 123,
"email": "[email protected]",
"role": "user",
"tenant_id": "my-tenant"
},
"context": {
"budget": 5000,
"currency": "USD"
}
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Complex task to decompose |
| domain | string | No | Domain hint: travel, healthcare, finance, generic |
| execution_mode | string | No | auto, parallel, sequential (default: auto) |
| user | object | Yes | User context (must include valid ID) |
| context | object | No | Additional context for planning |
Response (200 OK):
{
"success": true,
"plan_id": "plan_abc123",
"steps": [
{
"id": "step_1",
"name": "search_flights",
"type": "mcp_query",
"description": "Search for flights from SFO to CDG",
"agent": "travel-agent",
"parameters": {
"connector": "amadeus-travel",
"origin": "SFO",
"destination": "CDG"
}
},
{
"id": "step_2",
"name": "search_hotels",
"type": "mcp_query",
"description": "Search for hotels near Eiffel Tower",
"agent": "travel-agent",
"depends_on": [],
"parameters": {
"connector": "amadeus-travel",
"location": "Paris, Eiffel Tower"
}
},
{
"id": "step_3",
"name": "summarize_options",
"type": "llm",
"description": "Summarize flight and hotel options",
"depends_on": ["step_1", "step_2"]
}
],
"workflow_execution_id": "wf_xyz789",
"execution_mode": "parallel",
"estimated_time_ms": 5000
}
POST /api/v1/plan/{id}/cancel
Cancel a pending or executing plan.
Request:
curl -X POST http://localhost:8081/api/v1/plan/plan_abc123/cancel \
-H "Content-Type: application/json" \
-d '{"reason": "User requested cancellation"}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| reason | string | No | Optional cancellation reason |
Response (200 OK):
{
"success": true,
"plan_id": "plan_abc123",
"status": "cancelled"
}
Error Responses:
| Status | Description |
|---|---|
| 404 | Plan not found |
| 409 | Plan already completed or cancelled |
GET /api/v1/plan/{id}/versions
Retrieve the version history for a plan. Each entry records who changed it, when, and why.
Request:
curl http://localhost:8081/api/v1/plan/plan_abc123/versions
Response (200 OK):
{
"plan_id": "plan_abc123",
"versions": [
{
"version": 1,
"changed_at": "2026-01-15T10:00:00Z",
"change_type": "created",
"change_summary": "Plan created"
},
{
"version": 2,
"changed_at": "2026-01-15T10:05:00Z",
"changed_by": "user-123",
"change_type": "updated",
"change_summary": "Changed execution_mode to parallel"
}
]
}
Limits:
- Community: max 10 versions per plan, 25 plans
- Evaluation: max 25 versions per plan, 100 plans
- Enterprise: unlimited
POST /api/v1/plan/{id}/rollback/{version} Enterprise
Rollback a plan to a previously saved version. Creates a new version entry that restores the target version state. Returns 409 on concurrent modification.
Request:
curl -X POST http://localhost:8081/api/v1/plan/plan_abc123/rollback/2
Response (200 OK):
{
"plan_id": "plan_abc123",
"version": 4,
"previous_version": 2,
"status": "pending"
}
Error Responses:
| Status | Description |
|---|---|
| 403 | Enterprise license required |
| 404 | Plan or version not found |
| 409 | Concurrent modification (optimistic locking) |
POST /api/v1/plan/{id}/resume Enterprise
Resume a plan paused at an approval gate. Used with confirm and step execution modes.
- confirm mode: Every step requires explicit approval before execution
- step mode: First step auto-executes, subsequent steps require approval
Request:
curl -X POST http://localhost:8081/api/v1/plan/plan_abc123/resume \
-H "Content-Type: application/json" \
-d '{"approved": true}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| approved | boolean | No | Whether to approve the pending step (default: true) |
Response (200 OK) — awaiting next step:
{
"plan_id": "plan_abc123",
"status": "awaiting_approval",
"result": null
}
Response (200 OK) — all steps complete:
{
"plan_id": "plan_abc123",
"status": "completed",
"result": {
"summary": "Trip booked successfully"
}
}
Error Responses:
| Status | Description |
|---|---|
| 403 | Enterprise license required |
| 404 | Plan not found or not paused |
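In confirm mode the client typically loops, approving each pending step until the plan completes. A minimal sketch (resumePlan is a hypothetical helper that POSTs to /api/v1/plan/{id}/resume and returns the parsed response):

```typescript
type ResumeResponse = { plan_id: string; status: string; result?: unknown };

// Approve every pending step until the plan leaves "awaiting_approval".
async function approveAll(
  planId: string,
  resumePlan: (id: string, approved: boolean) => Promise<ResumeResponse>
): Promise<string> {
  let status = "awaiting_approval";
  while (status === "awaiting_approval") {
    ({ status } = await resumePlan(planId, true));
  }
  return status; // e.g. "completed"
}
```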
Webhook Endpoints
POST /api/v1/webhooks
Create a webhook subscription for real-time event notifications. Events are delivered as HTTP POST requests with HMAC-SHA256 signatures.
Request:
curl -X POST http://localhost:8081/api/v1/webhooks \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/webhooks/axonflow",
"events": ["policy.violation", "workflow.completed", "budget.exceeded"],
"secret": "whsec_abc123secretkey",
"active": true
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Webhook delivery URL |
| events | string[] | Yes | Event types to subscribe to |
| secret | string | No | Secret for HMAC-SHA256 signature verification |
| active | boolean | No | Whether the webhook is active (default: true) |
Response (201 Created):
{
"id": "wh_abc123",
"url": "https://example.com/webhooks/axonflow",
"events": ["policy.violation", "workflow.completed", "budget.exceeded"],
"active": true,
"created_at": "2026-02-07T10:00:00Z"
}
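Deliveries can be verified by recomputing the HMAC-SHA256 of the raw request body with the subscription secret. In this sketch, the hex encoding of the signature is an assumption; confirm the actual delivery format (header name and encoding) for your deployment:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Recompute the body signature and compare in constant time.
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The length check before timingSafeEqual matters: timingSafeEqual throws if the buffers differ in length.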
GET /api/v1/webhooks
List all webhook subscriptions.
Response (200 OK):
{
"webhooks": [
{
"id": "wh_abc123",
"url": "https://example.com/webhooks/axonflow",
"events": ["policy.violation", "workflow.completed"],
"active": true,
"created_at": "2026-02-07T10:00:00Z"
}
],
"total": 1
}
GET /api/v1/webhooks/{id}
Get a specific webhook subscription by ID.
PUT /api/v1/webhooks/{id}
Update a webhook subscription (URL, events, active status).
DELETE /api/v1/webhooks/{id}
Delete a webhook subscription.
Provider Endpoints
GET /api/v1/providers/status
Get health and routing status for all configured LLM providers.
Request:
curl http://localhost:8081/api/v1/providers/status
Response (200 OK):
{
"providers": {
"openai": {
"status": "healthy",
"message": "",
"last_checked": "2025-01-02T10:00:00Z",
"latency_ms": 234
},
"anthropic": {
"status": "healthy",
"message": "",
"last_checked": "2025-01-02T10:00:00Z",
"latency_ms": 312
}
}
}
PUT /api/v1/providers/weights
Update routing weights for LLM providers dynamically.
Request:
curl -X PUT http://localhost:8081/api/v1/providers/weights \
-H "Content-Type: application/json" \
-d '{
"weights": {
"openai": 70,
"anthropic": 30
}
}'
Request Body:
| Field | Type | Required | Description |
|---|---|---|---|
| weights | object | Yes | Map of provider names to weights (0-100) |
Response (200 OK):
{
"weights": {
"openai": 70,
"anthropic": 30
}
}
Response (400 Bad Request):
{
"error": {
"code": "VALIDATION_ERROR",
"message": "provider not found: invalid-provider"
}
}
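A client-side sanity check before calling this endpoint might look like the following sketch; the only rule assumed is the 0-100 range from the table above:

```typescript
// Validate provider weights before PUT /api/v1/providers/weights.
// Returns a list of human-readable problems; empty means OK.
function validateWeights(weights: Record<string, number>): string[] {
  const errors: string[] = [];
  for (const [provider, weight] of Object.entries(weights)) {
    if (!Number.isFinite(weight) || weight < 0 || weight > 100) {
      errors.push(`weight for ${provider} must be in [0, 100], got ${weight}`);
    }
  }
  return errors;
}
```

Note this only catches range errors locally; unknown provider names are still rejected server-side with VALIDATION_ERROR, as shown above.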
Health & Metrics Endpoints
GET /health
Check orchestrator service health.
Request:
curl http://localhost:8081/health
Response (200 OK):
{
"status": "healthy",
"service": "axonflow-orchestrator",
"version": "1.0.0",
"timestamp": "2025-01-02T10:00:00Z",
"components": {
"policy_engine": true,
"llm_router": true,
"response_processor": true,
"audit_logger": true,
"workflow_engine": true,
"planning_engine": true,
"result_aggregator": true
}
}
GET /api/v1/metrics
Get detailed orchestrator performance metrics in JSON format.
Request:
curl http://localhost:8081/api/v1/metrics
Response (200 OK):
{
"orchestrator_metrics": {
"uptime_seconds": 3600.0,
"total_requests": 1000,
"success_requests": 950,
"failed_requests": 10,
"blocked_requests": 40,
"success_rate": 95.0,
"rps": 0.278,
"error_rate_per_sec": 0.005,
"dynamic_policy_eval_p50_ms": 30.0,
"dynamic_policy_eval_p95_ms": 100.0,
"dynamic_policy_eval_p99_ms": 150.0,
"dynamic_policy_eval_avg_ms": 50.0,
"llm_routing_p50_ms": 200.0,
"llm_routing_p95_ms": 400.0,
"llm_routing_p99_ms": 500.0,
"llm_routing_avg_ms": 250.0
},
"health": {
"up": 1,
"consecutive_errors": 0
},
"request_types": {},
"providers": {},
"timestamp": "2025-01-02T10:00:00Z"
}
GET /prometheus
Get metrics in Prometheus exposition format.
Request:
curl http://localhost:8081/prometheus
Returns metrics in the standard Prometheus text format for scraping by Prometheus or compatible monitoring systems.
Request/Response Schemas
OrchestratorRequest Schema
interface OrchestratorRequest {
request_id?: string; // Client-provided request ID (auto-generated if empty)
query: string; // Natural language query
request_type?: string; // Request type for routing
skip_llm?: boolean; // Skip LLM calls (for testing)
user: UserContext; // User context for policy evaluation
client: ClientContext; // Client/application context
context?: Record<string, any>; // Additional context data
}
interface UserContext {
id: number;
email: string;
role: string;
tenant_id?: string;
}
interface ClientContext {
id: string;
name: string;
org_id?: string;
tenant_id?: string;
}
OrchestratorResponse Schema
interface OrchestratorResponse {
request_id: string;
success: boolean;
data?: any; // LLM response data
error?: string; // Error message (omitted on success)
redacted: boolean; // Whether response was redacted
redacted_fields?: string[]; // Fields that were redacted
policy_info: PolicyEvaluationResult;
provider_info: ProviderInfo;
processing_time: string; // Duration string (e.g., "860ms")
}
interface PolicyEvaluationResult {
allowed: boolean;
applied_policies: string[];
risk_score: number;
required_actions: string[];
processing_time_ms: number;
database_accessed?: boolean;
preferred_provider?: string; // Provider routing from policy
allowed_providers?: string[]; // Strict provider list from policy
routing_reason?: string; // Why routing was changed
}
interface ProviderInfo {
provider: string;
model: string;
response_time_ms: number;
tokens_used?: number;
cost?: number;
}
Error Codes
Orchestrator-Specific Errors
| Code | HTTP Status | Description | Retry? |
|---|---|---|---|
| ORCHESTRATOR_UNAVAILABLE | 503 | All agents unhealthy | Yes |
| INSUFFICIENT_AGENTS | 503 | Not enough agents for parallel execution | Yes |
| PARALLEL_EXECUTION_FAILED | 500 | Parallel execution error | Yes |
| QUERY_TIMEOUT | 504 | Query exceeded timeout | Maybe |
| AGENT_COMMUNICATION_ERROR | 502 | Cannot reach agent | Yes |
Example Error Response
{
"error": {
"code": "INSUFFICIENT_AGENTS",
"message": "Not enough healthy agents for parallel execution",
"details": {
"required_agents": 5,
"available_agents": 2,
"healthy_agents": ["agent-1", "agent-3"],
"unhealthy_agents": ["agent-2", "agent-4", "agent-5"]
},
"request_id": "orch_err456",
"timestamp": "2025-11-11T12:00:00Z"
}
}
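The Retry? column above can be encoded directly in client retry logic. A sketch (QUERY_TIMEOUT is marked "Maybe", so here it is retried only when the caller says the query is idempotent):

```typescript
// Error codes the table above marks as always retryable.
const ALWAYS_RETRYABLE = new Set([
  "ORCHESTRATOR_UNAVAILABLE",
  "INSUFFICIENT_AGENTS",
  "PARALLEL_EXECUTION_FAILED",
  "AGENT_COMMUNICATION_ERROR",
]);

// Decide whether to retry a failed request based on its error code.
function shouldRetry(code: string, idempotent: boolean): boolean {
  if (ALWAYS_RETRYABLE.has(code)) return true;
  if (code === "QUERY_TIMEOUT") return idempotent;
  return false;
}
```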
Rate Limiting
The Orchestrator applies the same rate limits as the Agent endpoints:
Limits:
- 100 requests/minute per organization
- 10,000 requests/hour per organization
- Parallel queries count as 1 request
Headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1699711200
Rate Limit Exceeded (429):
{
"error": {
"code": "RATE_LIMIT_EXCEEDED",
"message": "Organization rate limit exceeded",
"details": {
"limit": 100,
"period": "1 minute",
"reset_at": "2025-11-11T12:01:00Z"
},
"request_id": "orch_err789"
}
}
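X-RateLimit-Reset is a Unix timestamp in seconds, so a client can compute how long to wait before retrying a 429. A minimal sketch:

```typescript
// Milliseconds to wait until the rate-limit window resets.
// resetHeader is the X-RateLimit-Reset header value (Unix seconds).
function retryDelayMs(resetHeader: string, nowMs: number = Date.now()): number {
  const resetMs = Number(resetHeader) * 1000;
  if (!Number.isFinite(resetMs)) return 0; // malformed header: retry immediately
  return Math.max(0, resetMs - nowMs);
}
```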
Best Practices
1. Optimal Parallel Execution
// ✅ Good: 5 queries in parallel (matches 5 agents)
const response = await client.executeParallel([
{ query: 'Query 1', policy },
{ query: 'Query 2', policy },
{ query: 'Query 3', policy },
{ query: 'Query 4', policy },
{ query: 'Query 5', policy }
]);
// ❌ Suboptimal: 10 queries (only 5 agents available)
// Some queries will wait for agents to become available
const response = await client.executeParallel([
...tenQueries // Will execute in batches of 5
]);
Recommendation: Match parallel query count to available agents for best performance.
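When you have more queries than agents, one option is to batch them client-side so each batch saturates the agent pool exactly once. A sketch (the agent count of 5 matches the example above):

```typescript
// Split an array of queries into batches no larger than the agent pool.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Usage would then be, e.g., iterating `chunk(tenQueries, 5)` and calling executeParallel once per batch.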
2. Timeout Configuration
// Short queries: 10 seconds
const response = await client.executeParallel(queries, {
timeout_ms: 10000
});
// LLM queries: 30 seconds
const response = await client.executeParallel(llmQueries, {
timeout_ms: 30000
});
// Complex data queries: 60 seconds
const response = await client.executeParallel(complexQueries, {
timeout_ms: 60000
});
3. Error Handling
const response = await client.executeParallel(queries);
// Check each result
response.results.forEach((result, index) => {
if (result.status === 'error') {
console.error(`Query ${index} failed:`, result.error);
// Handle error for this specific query
} else {
console.log(`Query ${index} succeeded:`, result.result);
}
});
// Check overall success rate
const successRate = response.execution_summary.successful /
response.execution_summary.total_queries;
if (successRate < 0.8) {
console.warn('Success rate below 80% - investigate issues');
}
4. Agent Utilization Monitoring
// Get metrics after execution
const metrics = await client.getOrchestratorMetrics();
// Check agent balance
const utilization = metrics.agent_utilization;
const avgQueries = Object.values(utilization)
.reduce((sum, count) => sum + count, 0) / Object.keys(utilization).length;
Object.entries(utilization).forEach(([agent, queries]) => {
if (queries < avgQueries * 0.8) {
console.warn(`${agent} underutilized: ${queries} queries (avg: ${avgQueries})`);
}
});
Performance Optimization
Parallel Speedup Formula
Speedup = Sequential Time / Parallel Time
Example:
- 5 queries × 5 seconds each = 25 seconds (sequential)
- max(5s, 5s, 5s, 5s, 5s) = 5 seconds (parallel)
- Speedup = 25s / 5s = 5x
Theoretical vs Actual Speedup
| Queries | Agents | Theoretical | Actual | Efficiency |
|---|---|---|---|---|
| 2 | 5 | 2.0x | 1.9x | 95% |
| 5 | 5 | 5.0x | 4.5x | 90% |
| 10 | 5 | 2.0x | 1.8x | 90% |
| 20 | 5 | 4.0x | 3.5x | 88% |
Note: Overhead from orchestration reduces actual speedup by ~10%.
SDK Support
TypeScript
import { AxonFlow } from '@axonflow/sdk';
const client = new AxonFlow({
endpoint: process.env.AXONFLOW_ENDPOINT,
clientId: process.env.AXONFLOW_CLIENT_ID,
clientSecret: process.env.AXONFLOW_CLIENT_SECRET,
});
// Parallel execution
const response = await client.executeParallel([
{ query: 'Query 1', policy: policyContent },
{ query: 'Query 2', policy: policyContent }
]);
Go
import "github.com/getaxonflow/axonflow-sdk-go/v3"
client, _ := axonflow.NewClient(axonflow.AxonFlowConfig{
Endpoint: os.Getenv("AXONFLOW_ENDPOINT"),
ClientID: os.Getenv("AXONFLOW_CLIENT_ID"),
ClientSecret: os.Getenv("AXONFLOW_CLIENT_SECRET"),
})
// Parallel execution
responses, err := client.ExecuteParallel(ctx, []*axonflow.QueryRequest{
{Query: "Query 1", Policy: policyContent},
{Query: "Query 2", Policy: policyContent},
})
Monitoring
CloudWatch Metrics
Available Metrics:
- orchestrator.requests.total - Total requests
- orchestrator.requests.parallel - Parallel execution requests
- orchestrator.latency.p95 - 95th percentile latency
- orchestrator.errors.rate - Error rate
- orchestrator.agents.available - Available agent count
- orchestrator.speedup.average - Average parallel speedup
CloudWatch Logs
Log Group: /ecs/YOUR-STACK-NAME/orchestrator
Example Log Entry:
{
"timestamp": "2025-11-11T12:00:00Z",
"level": "info",
"message": "Parallel execution completed",
"queries": 5,
"successful": 5,
"failed": 0,
"total_time_ms": 5234,
"speedup": "4.5x",
"agent_utilization": {
"agent-1": 1,
"agent-2": 1,
"agent-3": 1,
"agent-4": 1,
"agent-5": 1
},
"request_id": "orch_abc123"
}
Additional Resources
- Agent Endpoints - Agent API reference
- Error Codes - Complete error reference
- Workflow Examples - MAP execution examples
- SDK Documentation - SDK usage
Support
Questions about Orchestrator API?
- Email: [email protected]
- Documentation: https://docs.getaxonflow.com
- API Status: https://status.getaxonflow.com
API Version: 1.0.0
