Observability Exporters

AxonFlow's decision tracer emits OpenTelemetry spans for every policy decision. Each span carries structured attributes -- verdict, stage, latency, policy IDs, org, and tenant -- that make governance decisions observable across your existing monitoring infrastructure.

The tracer ships spans to any OTLP/gRPC collector. This page covers how to route those spans to common backends and how to generate Prometheus RED metrics from them.

OTel is opt-in

AxonFlow runs fine without OTel. When AXONFLOW_OTEL_ENDPOINT is unset (the default), the agent uses a no-op tracer and emits nothing. All configurations on this page are additive overlays.

Span Attributes

Every axonflow.decision span carries these attributes:

Attribute	Type	Example
`decision.id`	string	`01J5K...` (ULID)
`decision.stage`	string	`llm`, `tool`, or `agent`
`decision.verdict`	string	`allow`, `deny`, or `needs_approval`
`decision.policy_ids`	string[]	`["p_pii_us", "p_sqli"]`
`decision.latency_ms`	int64	`7`
`decision.reasons`	string	`"clean"` or policy match detail
`org.id`	string	`acme-prod`
`tenant.id`	string	`tenant-rocket`

Environment Variables

Variable	Default	Description
`AXONFLOW_OTEL_ENDPOINT`	(empty)	OTLP/gRPC endpoint (e.g. `otel-collector:4317`). Empty disables tracing.
`AXONFLOW_OTEL_SERVICE_NAME`	`axonflow-agent`	`service.name` resource attribute for dashboard keying.
`AXONFLOW_OTEL_SAMPLE_RATE`	`1.0`	Head sampling ratio `[0.0, 1.0]`. Reduce to `0.1` in high-RPS environments.
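
As a rough illustration of how these three variables interact, the sketch below resolves them with the defaults from the table. It is not AxonFlow's actual loader: the `OtelSettings` class and `load_otel_settings` function are hypothetical names, and rejecting an out-of-range sample rate (rather than clamping it) is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OtelSettings:
    """Hypothetical container for the three documented variables."""
    endpoint: str        # empty string means tracing is disabled
    service_name: str
    sample_rate: float

    @property
    def enabled(self) -> bool:
        # Tracing is opt-in: an empty endpoint means nothing is emitted.
        return bool(self.endpoint)


def load_otel_settings(env: Optional[Mapping[str, str]] = None) -> OtelSettings:
    """Resolve the variables using the defaults from the table above."""
    env = os.environ if env is None else env
    endpoint = env.get("AXONFLOW_OTEL_ENDPOINT", "").strip()
    service_name = env.get("AXONFLOW_OTEL_SERVICE_NAME", "axonflow-agent")
    sample_rate = float(env.get("AXONFLOW_OTEL_SAMPLE_RATE", "1.0"))
    # Assumption: ratios outside [0.0, 1.0] are rejected rather than clamped.
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError(f"sample rate {sample_rate} is outside [0.0, 1.0]")
    return OtelSettings(endpoint, service_name, sample_rate)
```

Calling `load_otel_settings({})` yields a disabled configuration with the default service name, which mirrors the no-op behavior described earlier on this page.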

Jaeger (Built-in Overlay)

The repository ships a Jaeger overlay out of the box:

docker compose -f docker-compose.yml -f docker-compose.otel.yml up -d
open http://localhost:16686

This uses the OTel Collector to forward spans to Jaeger's OTLP receiver. See the existing OTel tracing setup for trace correlation with WCP workflows.

Datadog APM

Route decision spans to Datadog APM via the OTel Collector's Datadog exporter.

Quick Start

export DD_API_KEY=<your-datadog-api-key>
export DD_SITE=datadoghq.com   # or datadoghq.eu, us3.datadoghq.com, etc.

docker compose -f docker-compose.yml \
  -f docker-compose.otel-datadog.yml up -d

What Gets Exported

The Collector config at config/otel-collector-datadog.yaml maps all decision.*, org.id, and tenant.id span attributes to Datadog APM tags. In Datadog:

APM > Traces: filter by service:axonflow-agent, then facet on decision.verdict, decision.stage, or tenant.id.
Monitors: alert on decision.verdict:deny rate exceeding a threshold.
Dashboards: build widgets using the decision.* tags.

Collector Config

The config uses the attributes/datadog processor to ensure all AxonFlow-specific attributes are forwarded as APM tags. The span_name_as_resource_name: true setting maps the OTel span name (axonflow.decision) to the Datadog resource name, making it filterable in the trace explorer.

Requirements

A Datadog account with APM enabled.
DD_API_KEY set as an environment variable or in a .env file.
DD_SITE set to your Datadog region (defaults to datadoghq.com).

Grafana Tempo + Prometheus

Route traces to Grafana Tempo and generate RED (Rate, Error, Duration) metrics via the OTel Collector's spanmetrics connector.

Quick Start

docker compose -f docker-compose.yml \
  -f docker-compose.otel-grafana.yml up -d

open http://localhost:3000   # Grafana (admin/admin)

This brings up five services:

Service	Port	Purpose
OTel Collector	4317, 4318	Receives OTLP spans, exports to Tempo + generates metrics
Tempo	3200	Trace storage and query
Prometheus	9090	Scrapes spanmetrics from Collector on `:8889`
Grafana	3000	Dashboards for traces + metrics
AxonFlow Agent	8080	Sends spans to Collector

Decision Mode Dashboard

The repository ships a pre-provisioned Grafana dashboard at grafana/dashboards/decision-mode-overview.json with nine panels:

Decision Rate -- decisions per second over time.
Verdict Distribution -- donut chart of allow/deny/needs_approval.
Error Rate -- ratio of STATUS_CODE_ERROR spans to total.
Decision Latency (P50/P95/P99) -- histogram quantiles from duration_milliseconds_bucket.
Decisions by Stage -- breakdown by llm, tool, agent.
Policy Trigger Rate -- stacked bar chart by verdict.
Per-Tenant Decision Volume -- per-tenant.id rate.
Deny Rate by Tenant -- table of deny counts per tenant over the selected range.
Latency Heatmap -- heatmap of duration_milliseconds_bucket.

Template variables $org_id and $tenant_id filter all panels.
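
To make the latency panel less of a black box, here is a minimal sketch of how a quantile can be estimated from cumulative histogram buckets, in the spirit of PromQL's `histogram_quantile`. The bucket bounds are the ones documented for `duration_milliseconds_bucket`; the function name `estimate_quantile` and the sample counts in the usage note are made up for illustration, and the dashboard's exact queries may differ.

```python
import math
from typing import Sequence

# Upper bounds (ms) documented for duration_milliseconds_bucket, plus +Inf.
BOUNDS_MS = (1, 2, 5, 10, 25, 50, 100, 250, 500, 1000, math.inf)


def estimate_quantile(q: float, cumulative: Sequence[float],
                      bounds: Sequence[float] = BOUNDS_MS) -> float:
    """Estimate the q-quantile from cumulative bucket counts.

    cumulative[i] is the number of spans with duration <= bounds[i].
    Values are assumed to be spread evenly inside each bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    if len(cumulative) != len(bounds):
        raise ValueError("one cumulative count is required per bucket bound")
    total = cumulative[-1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    index = next(i for i, count in enumerate(cumulative) if count >= rank)
    if math.isinf(bounds[index]):
        # No finite upper bound: fall back to the highest finite bound.
        return float(bounds[index - 1])
    lower = bounds[index - 1] if index > 0 else 0.0
    below = cumulative[index - 1] if index > 0 else 0.0
    in_bucket = cumulative[index] - below
    if in_bucket == 0:
        return float(lower)
    # Linear interpolation inside the selected bucket.
    return lower + (bounds[index] - lower) * (rank - below) / in_bucket
```

With illustrative cumulative counts of `[0, 10, 50, 90, 98, 100, 100, 100, 100, 100, 100]`, the estimated P50 lands on the 5 ms bound and the P95 falls inside the 10 to 25 ms bucket. Because the estimate interpolates within a bucket, its precision is limited by the bucket width.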

Spanmetrics Details

The spanmetrics connector in config/otel-collector-grafana.yaml generates three metric families from axonflow.decision spans:

Metric	Type	Description
`calls_total`	counter	Total span count, labeled by `decision_verdict`, `decision_stage`, `org_id`, `tenant_id`
`duration_milliseconds_bucket`	histogram	Span duration distribution with buckets at 1, 2, 5, 10, 25, 50, 100, 250, 500, 1000 ms
`duration_milliseconds_sum` / `_count`	counter	Sum and count for average latency calculation

These are standard Prometheus metrics, queryable with PromQL and usable in any Prometheus-compatible alerting system.
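
As one example of such a query, the sketch below builds a PromQL expression for the share of decisions that were denied, using the `calls_total` metric and the `decision_verdict` and `tenant_id` labels listed above. The helper names are hypothetical. The URL helper targets Prometheus's instant-query endpoint (`/api/v1/query`) and assumes the Prometheus instance from the overlay on port 9090.

```python
from typing import Optional
from urllib.parse import urlencode


def deny_ratio_query(window: str = "5m", tenant_id: Optional[str] = None) -> str:
    """Build a PromQL expression for denied decisions / all decisions."""
    tenant = f',tenant_id="{tenant_id}"' if tenant_id else ""
    denied = f'sum(rate(calls_total{{decision_verdict="deny"{tenant}}}[{window}]))'
    # Only add a selector to the denominator when filtering by tenant.
    total_selector = f'{{tenant_id="{tenant_id}"}}' if tenant_id else ""
    total = f"sum(rate(calls_total{total_selector}[{window}]))"
    return f"{denied} / {total}"


def instant_query_url(base_url: str, promql: str) -> str:
    """URL for Prometheus's instant-query HTTP endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    query = deny_ratio_query("5m", "tenant-rocket")
    print(query)
    print(instant_query_url("http://localhost:9090", query))
```

The same expression can back an alert rule by comparing it against a threshold of your choosing.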

Using with Existing Prometheus

If you already run Prometheus, add the OTel Collector's metrics endpoint to your scrape config:

scrape_configs:
  - job_name: 'otel-spanmetrics'
    static_configs:
      - targets: ['otel-collector:8889']
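
Before relying on the scrape job, it can help to confirm that the endpoint actually serves the expected series. The sketch below sums `calls_total` samples per verdict from Prometheus text-format output. It is a simplified parser for illustration only (the function name is hypothetical, and it does not cover every corner of the exposition format), and the sample text in the usage note is invented.

```python
import re
from collections import Counter

# Matches "name{labels} value" or "name value"; any trailing timestamp is ignored.
_SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def calls_by_verdict(exposition: str) -> Counter:
    """Sum calls_total samples grouped by the decision_verdict label."""
    totals: Counter = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None or match.group("name") != "calls_total":
            continue
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        verdict = labels.get("decision_verdict", "unknown")
        totals[verdict] += float(match.group("value"))
    return totals
```

Feeding it the body of a request to the Collector's `:8889` metrics endpoint should show non-zero counts for the verdicts your traffic has produced. An empty result suggests no `axonflow.decision` spans have reached the Collector yet.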

LangSmith Trace Correlation

AxonFlow's decision tracer generates a W3C-compliant trace_id and returns it in Decision API responses when tracing is enabled. If you use LangSmith for LLM observability, you can correlate AxonFlow governance decisions with LangSmith runs by propagating the trace_id as LangSmith run metadata.

How It Works

Your application calls the Decision API (POST /api/v1/decide) before forwarding a request.
AxonFlow returns a trace_id in the response (32-character lowercase hex W3C format).
Your application passes that trace_id as metadata to LangSmith when starting the LLM run.
In LangSmith, filter runs on the trace_id metadata value (the examples below store it under the axonflow_trace_id key) to line up the LLM execution with the governance decision.

Get the trace_id

The Decision API response includes trace_id directly:

curl -X POST http://localhost:8080/api/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "stage": "llm",
    "query": "What is the user SSN?",
    "target": "gpt-4o",
    "caller_id": "support-agent"
  }'

Response:

{
  "verdict": "deny",
  "decision_id": "01J5K...",
  "trace_id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "reasons": ["PII detection: SSN pattern matched"],
  "evaluated_policies": ["p_pii_us"],
  "stage": "llm"
}
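
Since the trace_id is only present when tracing is configured, it is worth checking the field before forwarding it. The sketch below validates the W3C trace-id shape (32 lowercase hex characters, not all zeros) and builds the metadata dictionary. Both helper names are hypothetical, and `axonflow_trace_id` is simply the metadata key used in the examples on this page.

```python
import re
from typing import Any, Dict, Mapping

_TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")


def is_valid_trace_id(value: Any) -> bool:
    """True for a 32-character lowercase hex string that is not all zeros."""
    return (
        isinstance(value, str)
        and _TRACE_ID_RE.fullmatch(value) is not None
        and value != "0" * 32  # the all-zero trace-id is invalid in W3C Trace Context
    )


def langsmith_metadata(decision: Mapping[str, Any]) -> Dict[str, str]:
    """Metadata to attach to a LangSmith run, or {} if no usable trace_id."""
    trace_id = decision.get("trace_id")
    if not is_valid_trace_id(trace_id):
        return {}
    return {"axonflow_trace_id": trace_id}
```

Returning an empty dictionary when the field is missing lets the same application code run whether or not AXONFLOW_OTEL_ENDPOINT is set.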

Pass trace_id to LangSmith (Python)

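import requests
from langsmith import traceable

query = "What is the user SSN?"

# Ask AxonFlow for a verdict before the prompt reaches the model.
resp = requests.post(
    "http://localhost:8080/api/v1/decide",
    json={"stage": "llm", "query": query, "target": "gpt-4o", "caller_id": "support-agent"},
    timeout=10,
)
resp.raise_for_status()
decision = resp.json()

@traceable
def run_llm(query: str):
    ...  # call your LLM provider here

if decision["verdict"] == "allow":
    # Attach the trace_id per call so each run carries its own decision's ID.
    run_llm(query, langsmith_extra={"metadata": {"axonflow_trace_id": decision["trace_id"]}})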

Pass trace_id to LangSmith (TypeScript)

const decision = await fetch("http://localhost:8080/api/v1/decide", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    stage: "llm",
    query: "What is the user SSN?",
    target: "gpt-4o",
    caller_id: "support-agent",
  }),
}).then((r) => r.json());

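// tracedLlmCall stands in for your own LangSmith-instrumented function.
// Only call the model when the decision allows it.
if (decision.verdict === "allow") {
  await tracedLlmCall("What is the user SSN?", {
    metadata: { axonflow_trace_id: decision.trace_id },
  });
}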

No code changes are needed on the AxonFlow side. The trace_id is emitted by the decision tracer and included in the Decision API response when AXONFLOW_OTEL_ENDPOINT is configured.
