Observability Exporters
AxonFlow's decision tracer emits OpenTelemetry spans for every policy decision. Each span carries structured attributes -- verdict, stage, latency, policy IDs, org, and tenant -- that make governance decisions observable across your existing monitoring infrastructure.
The tracer ships spans to any OTLP/gRPC collector. This page covers how to route those spans to common backends and how to generate Prometheus RED metrics from them.
AxonFlow runs fine without OTel. When AXONFLOW_OTEL_ENDPOINT is unset (the default), the agent uses a no-op tracer and emits nothing. All configurations on this page are additive overlays.
Span Attributes
Every axonflow.decision span carries these attributes:
| Attribute | Type | Example |
|---|---|---|
| decision.id | string | 01J5K... (ULID) |
| decision.stage | string | llm, tool, or agent |
| decision.verdict | string | allow, deny, or needs_approval |
| decision.policy_ids | string[] | ["p_pii_us", "p_sqli"] |
| decision.latency_ms | int64 | 7 |
| decision.reasons | string | "clean" or policy match detail |
| org.id | string | acme-prod |
| tenant.id | string | tenant-rocket |
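If you consume these spans in your own tooling, such as a custom Collector pipeline test or a script that reads exported traces, a small schema check keeps the contract explicit. The sketch below is a hypothetical helper, not part of AxonFlow: it encodes only the types and allowed values from the table and assumes the attributes arrive as a plain Python dict.

```python
from typing import Any, Mapping

# Allowed values as listed in the span attribute table
STAGES = {"llm", "tool", "agent"}
VERDICTS = {"allow", "deny", "needs_approval"}


def check_decision_attributes(attrs: Mapping[str, Any]) -> list[str]:
    """Return problems found in an axonflow.decision span's attributes (empty list if none)."""
    problems: list[str] = []
    for key in ("decision.id", "decision.reasons", "org.id", "tenant.id"):
        if not isinstance(attrs.get(key), str):
            problems.append(f"{key} must be a string")
    if attrs.get("decision.stage") not in STAGES:
        problems.append("decision.stage must be one of llm, tool, agent")
    if attrs.get("decision.verdict") not in VERDICTS:
        problems.append("decision.verdict must be one of allow, deny, needs_approval")
    ids = attrs.get("decision.policy_ids")
    if not (isinstance(ids, (list, tuple)) and all(isinstance(i, str) for i in ids)):
        problems.append("decision.policy_ids must be a list of strings")
    latency = attrs.get("decision.latency_ms")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(latency, int) or isinstance(latency, bool) or latency < 0:
        problems.append("decision.latency_ms must be a non-negative integer")
    return problems
```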
Environment Variables
| Variable | Default | Description |
|---|---|---|
| AXONFLOW_OTEL_ENDPOINT | (empty) | OTLP/gRPC endpoint (e.g. otel-collector:4317). Empty disables tracing. |
| AXONFLOW_OTEL_SERVICE_NAME | axonflow-agent | service.name resource attribute used to key dashboards. |
| AXONFLOW_OTEL_SAMPLE_RATE | 1.0 | Head sampling ratio in [0.0, 1.0]. Consider lowering it (e.g. to 0.1) in high-RPS environments; spanmetrics-derived counts then reflect only sampled spans. |
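The interaction between these variables fits in a few lines. This is a hypothetical sketch (load_otel_settings and OtelSettings are illustrative names, not AxonFlow code) that mirrors the documented defaults: an empty endpoint means tracing is off. Rejecting an out-of-range sample rate is an assumption; how the agent itself handles bad values isn't documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OtelSettings:
    endpoint: str
    service_name: str
    sample_rate: float


def load_otel_settings(env: Mapping[str, str] = os.environ) -> Optional[OtelSettings]:
    """Hypothetical mirror of the documented variables; None means tracing is off."""
    endpoint = env.get("AXONFLOW_OTEL_ENDPOINT", "").strip()
    if not endpoint:
        return None  # empty endpoint: no-op tracer, no spans emitted
    service_name = env.get("AXONFLOW_OTEL_SERVICE_NAME") or "axonflow-agent"
    sample_rate = float(env.get("AXONFLOW_OTEL_SAMPLE_RATE") or "1.0")
    if not 0.0 <= sample_rate <= 1.0:
        # Assumption: reject out-of-range ratios rather than clamping them
        raise ValueError(f"AXONFLOW_OTEL_SAMPLE_RATE must be in [0.0, 1.0], got {sample_rate}")
    return OtelSettings(endpoint, service_name, sample_rate)
```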
Jaeger (Built-in Overlay)
The repository ships a Jaeger overlay out of the box:
```bash
docker compose -f docker-compose.yml -f docker-compose.otel.yml up -d
open http://localhost:16686  # Jaeger UI
```
This uses the OTel Collector to forward spans to Jaeger's OTLP receiver. See the existing OTel tracing setup for trace correlation with WCP workflows.
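Besides the UI, you can pull decision traces programmatically. The sketch below assumes Jaeger's query HTTP endpoint /api/traces on the same port as the UI. That endpoint backs the Jaeger UI and is generally treated as internal rather than a stable public API, so the parameter names and the response shape (a data list of traces, each with spans carrying a tags list) are assumptions to verify against your Jaeger version. The helper names are hypothetical, and the service value must match AXONFLOW_OTEL_SERVICE_NAME if you changed it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

JAEGER_URL = "http://localhost:16686"  # Jaeger UI/query port from the overlay


def decision_search_url(verdict: str, service: str = "axonflow-agent", limit: int = 20) -> str:
    """Assumed /api/traces query for axonflow.decision spans with a given verdict."""
    params = {
        "service": service,
        "operation": "axonflow.decision",
        "tags": json.dumps({"decision.verdict": verdict}),  # JSON map of tag filters
        "limit": limit,
    }
    return f"{JAEGER_URL}/api/traces?{urlencode(params)}"


def decision_ids(payload: dict) -> list[str]:
    """Collect decision.id tag values from a /api/traces response body."""
    ids = []
    for trace in payload.get("data", []):
        for span in trace.get("spans", []):
            if span.get("operationName") != "axonflow.decision":
                continue
            ids.extend(t["value"] for t in span.get("tags", []) if t.get("key") == "decision.id")
    return ids


def fetch_decision_ids(verdict: str = "deny") -> list[str]:
    """Query a running Jaeger instance and return matching decision IDs."""
    with urlopen(decision_search_url(verdict), timeout=10) as resp:
        return decision_ids(json.load(resp))
```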
Datadog APM
Route decision spans to Datadog APM via the OTel Collector's Datadog exporter.
Quick Start
```bash
export DD_API_KEY=<your-datadog-api-key>
export DD_SITE=datadoghq.com  # or datadoghq.eu, us3.datadoghq.com, etc.
docker compose -f docker-compose.yml \
  -f docker-compose.otel-datadog.yml up -d
```
What Gets Exported
The Collector config at config/otel-collector-datadog.yaml maps all decision.*, org.id, and tenant.id span attributes to Datadog APM tags. In Datadog:
- APM > Traces: filter by service:axonflow-agent, then facet on decision.verdict, decision.stage, or tenant.id.
- Monitors: alert on the decision.verdict:deny rate exceeding a threshold.
- Dashboards: build widgets using the decision.* tags.
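Datadog evaluates the monitor rule itself; the Python sketch below only spells out what "deny rate exceeding a threshold" typically means, so you can choose sensible numbers. The 20% threshold and 50-decision minimum are hypothetical values, and deny_ratio and should_alert are illustrative names. The minimum-volume guard keeps a single deny in a quiet window from triggering an alert.

```python
from collections import Counter
from typing import Iterable


def deny_ratio(verdicts: Iterable[str]) -> float:
    """Fraction of decisions in a window whose verdict is deny."""
    counts = Counter(verdicts)
    total = sum(counts.values())
    return counts["deny"] / total if total else 0.0


def should_alert(verdicts: Iterable[str], threshold: float = 0.2, min_decisions: int = 50) -> bool:
    """Hypothetical monitor rule: alert when the deny ratio exceeds the threshold."""
    window = list(verdicts)
    if len(window) < min_decisions:
        return False  # too little traffic in the window to judge
    return deny_ratio(window) > threshold
```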
Collector Config
The config uses the attributes/datadog processor to ensure all AxonFlow-specific attributes are forwarded as APM tags. The span_name_as_resource_name: true setting maps the OTel span name (axonflow.decision) to the Datadog resource name, making it filterable in the trace explorer.
Requirements
- A Datadog account with APM enabled.
- DD_API_KEY set as an environment variable or in a .env file.
- DD_SITE set to your Datadog region (defaults to datadoghq.com).
Grafana Tempo + Prometheus
Route traces to Grafana Tempo and generate RED (Rate, Error, Duration) metrics via the OTel Collector's spanmetrics connector.
Quick Start
```bash
docker compose -f docker-compose.yml \
  -f docker-compose.otel-grafana.yml up -d
open http://localhost:3000  # Grafana (admin/admin)
```
This brings up five services:
| Service | Port | Purpose |
|---|---|---|
| OTel Collector | 4317, 4318 | Receives OTLP spans, exports to Tempo + generates metrics |
| Tempo | 3200 | Trace storage and query |
| Prometheus | 9090 | Scrapes spanmetrics from Collector on :8889 |
| Grafana | 3000 | Dashboards for traces + metrics |
| AxonFlow Agent | 8080 | Sends spans to Collector |
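To check that the Collector, Tempo, and Prometheus are wired up before pointing a real agent at them, you can post one synthetic axonflow.decision span to the Collector's OTLP/HTTP port (4318) using the OTLP JSON encoding. This is a hypothetical smoke test, not something the repository ships: the attribute values are made up, and the span will appear in Tempo and in the spanmetrics counters under a smoke-test org and tenant.

```python
import json
import os
import time
import urllib.request


def _str_attr(key: str, value: str) -> dict:
    return {"key": key, "value": {"stringValue": value}}


def decision_span_payload(verdict: str = "allow", stage: str = "llm", latency_ms: int = 7) -> dict:
    """Build an OTLP/JSON trace export request holding one synthetic decision span."""
    end = time.time_ns()
    start = end - latency_ms * 1_000_000
    attributes = [
        _str_attr("decision.id", "smoke-test"),
        _str_attr("decision.stage", stage),
        _str_attr("decision.verdict", verdict),
        {"key": "decision.policy_ids", "value": {"arrayValue": {"values": []}}},
        {"key": "decision.latency_ms", "value": {"intValue": str(latency_ms)}},  # int64 as string
        _str_attr("decision.reasons", "synthetic pipeline check"),
        _str_attr("org.id", "smoke-test"),
        _str_attr("tenant.id", "smoke-test"),
    ]
    return {
        "resourceSpans": [{
            "resource": {"attributes": [_str_attr("service.name", "axonflow-agent")]},
            "scopeSpans": [{
                "scope": {"name": "otlp-smoke-test"},
                "spans": [{
                    "traceId": os.urandom(16).hex(),  # 32 hex chars
                    "spanId": os.urandom(8).hex(),    # 16 hex chars
                    "name": "axonflow.decision",
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start),
                    "endTimeUnixNano": str(end),
                    "attributes": attributes,
                }],
            }],
        }],
    }


def send_span(payload: dict, endpoint: str = "http://localhost:4318/v1/traces") -> int:
    """POST the payload to the Collector's OTLP/HTTP receiver; returns the HTTP status."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

With the stack running, send_span(decision_span_payload(verdict="deny")) should return 200, after which the span should be searchable in Tempo by the name axonflow.decision.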
Decision Mode Dashboard
The repository ships a pre-provisioned Grafana dashboard at grafana/dashboards/decision-mode-overview.json with nine panels:
- Decision Rate -- decisions per second over time.
- Verdict Distribution -- donut chart of allow/deny/needs_approval.
- Error Rate -- ratio of STATUS_CODE_ERROR spans to total.
- Decision Latency (P50/P95/P99) -- histogram quantiles from duration_milliseconds_bucket.
- Decisions by Stage -- breakdown by llm, tool, agent.
- Policy Trigger Rate -- stacked bar chart by verdict.
- Per-Tenant Decision Volume -- decision rate per tenant.id.
- Deny Rate by Tenant -- table of deny counts per tenant over the selected range.
- Latency Heatmap -- heatmap of duration_milliseconds_bucket.
Template variables $org_id and $tenant_id filter all panels.
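The P50/P95/P99 panels rely on PromQL's histogram_quantile, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket where the target rank falls. The Python sketch below reproduces that idea for intuition. It assumes cumulative (upper_bound, count) pairs ending with +Inf and a lower bound of 0 for the first bucket; it is not Prometheus's implementation and may differ on edge cases.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs ending with +Inf."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)  # ascending upper bounds; +Inf sorts last
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan  # need at least one finite bucket plus le="+Inf"
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations
    rank = q * total
    # Index of the first finite bucket whose cumulative count reaches the rank
    idx = next((i for i, (_, c) in enumerate(buckets[:-1]) if c >= rank), len(buckets) - 1)
    if idx == len(buckets) - 1:
        return buckets[-2][0]  # rank is in the +Inf bucket: report the highest finite bound
    upper, count = buckets[idx]
    lower, below = buckets[idx - 1] if idx > 0 else (0.0, 0.0)
    if count == below:
        return lower  # empty bucket (only possible for q == 0)
    # Linear interpolation between the bucket's lower and upper bound
    return lower + (upper - lower) * (rank - below) / (count - below)
```

This also shows why accuracy depends on bucket boundaries: any P95 that lands between the 50 and 100 ms buckets is a straight-line estimate within that range.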
Spanmetrics Details
The spanmetrics connector in config/otel-collector-grafana.yaml generates two metric families from axonflow.decision spans, a call counter and a duration histogram, which appear in Prometheus as the following series:
| Metric | Type | Description |
|---|---|---|
| calls_total | counter | Total span count, labeled by decision_verdict, decision_stage, org_id, tenant_id |
| duration_milliseconds_bucket | histogram (bucket series) | Span duration distribution with buckets at 1, 2, 5, 10, 25, 50, 100, 250, 500, 1000 ms |
| duration_milliseconds_sum / duration_milliseconds_count | histogram (sum and count series) | Sum and count of span durations, for average latency (sum / count) |
These are standard Prometheus metrics, queryable with PromQL and usable in any Prometheus-compatible alerting system.
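As an example of querying these series directly, the sketch below builds a few PromQL expressions and an instant-query URL for the Prometheus HTTP API (/api/v1/query). It assumes the metric names and labels from the table, that duration_milliseconds_bucket carries the same org_id label as calls_total, and Prometheus at localhost:9090 as in the Grafana overlay; decision_queries and query_url are hypothetical helpers.

```python
from urllib.parse import urlencode


def decision_queries(window: str = "5m", org: str = "acme-prod") -> dict[str, str]:
    """Hypothetical helper: PromQL for a few RED-style views of AxonFlow decisions."""
    sel = f'org_id="{org}"'
    return {
        # decisions per second for one org
        "decision_rate": f"sum(rate(calls_total{{{sel}}}[{window}]))",
        # fraction of decisions that were denied
        "deny_ratio": (
            f'sum(rate(calls_total{{{sel},decision_verdict="deny"}}[{window}]))'
            f" / sum(rate(calls_total{{{sel}}}[{window}]))"
        ),
        # 95th percentile decision latency in milliseconds
        "p95_latency_ms": (
            "histogram_quantile(0.95, sum by (le) "
            f"(rate(duration_milliseconds_bucket{{{sel}}}[{window}])))"
        ),
    }


def query_url(prometheus: str, promql: str) -> str:
    """Instant-query URL for the Prometheus HTTP API."""
    return f"{prometheus}/api/v1/query?{urlencode({'query': promql})}"
```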
Using with Existing Prometheus
If you already run Prometheus, add the OTel Collector's metrics endpoint to your scrape config:
```yaml
scrape_configs:
  - job_name: 'otel-spanmetrics'
    static_configs:
      - targets: ['otel-collector:8889']
```
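After adding the scrape job, you can confirm the Collector endpoint is serving spanmetrics by reading it directly. This sketch (hypothetical helper names) parses the Prometheus text exposition format with a deliberately simple regex and totals calls_total per verdict. It assumes the unprefixed metric name from the spanmetrics table, so adjust the pattern if your connector config sets a namespace. Run fetch_exposition() from somewhere that can resolve otel-collector, such as another container on the same Compose network.

```python
import re
from collections import defaultdict
from urllib.request import urlopen

# One calls_total sample line: name{labels} value [timestamp]
SAMPLE = re.compile(r"^calls_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)")
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def calls_by_verdict(exposition: str) -> dict[str, float]:
    """Sum calls_total samples per decision_verdict from Prometheus text format."""
    totals: dict[str, float] = defaultdict(float)
    for line in exposition.splitlines():
        m = SAMPLE.match(line)
        if not m:
            continue  # comments, other metrics, blank lines
        labels = dict(LABEL.findall(m.group("labels")))
        totals[labels.get("decision_verdict", "")] += float(m.group("value"))
    return dict(totals)


def fetch_exposition(url: str = "http://otel-collector:8889/metrics") -> str:
    """Read the Collector's Prometheus exporter endpoint."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```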
LangSmith Trace Correlation
AxonFlow's decision tracer generates a W3C-compliant trace_id that is returned in Decision API responses when tracing is enabled (AXONFLOW_OTEL_ENDPOINT is set). If you use LangSmith for LLM observability, you can correlate AxonFlow governance decisions with LangSmith runs by attaching the trace_id to each LangSmith run as metadata.
How It Works
- Your application calls the Decision API (POST /api/v1/decide) before forwarding a request.
- AxonFlow returns a trace_id in the response (32-character lowercase hex, W3C format).
- Your application passes that trace_id to LangSmith as run metadata when starting the LLM run.
- In LangSmith, filter runs by that metadata value to find the LLM execution; the same trace_id locates the governance decision in your tracing backend (Jaeger, Tempo, or Datadog).
Get the trace_id
The Decision API response includes trace_id directly:
```bash
curl -X POST http://localhost:8080/api/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "stage": "llm",
    "query": "What is the user SSN?",
    "target": "gpt-4o",
    "caller_id": "support-agent"
  }'
```
Response:
```json
{
  "verdict": "deny",
  "decision_id": "01J5K...",
  "trace_id": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6",
  "reasons": ["PII detection: SSN pattern matched"],
  "evaluated_policies": ["p_pii_us"],
  "stage": "llm"
}
```
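Client code usually wants this response as a typed object rather than a raw dict. The sketch below is a hypothetical wrapper whose field names come from the example response. Which fields are always present isn't stated, so treating trace_id, reasons, and evaluated_policies as optional is an assumption. The allowed property fails closed: both deny and needs_approval block the call.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class Decision:
    """Typed view of a Decision API response (field names from the example response)."""
    verdict: str  # allow, deny, or needs_approval
    decision_id: str
    stage: str
    trace_id: str = ""  # assumed absent or empty when tracing is disabled
    reasons: list[str] = field(default_factory=list)
    evaluated_policies: list[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, body: Mapping[str, Any]) -> "Decision":
        return cls(
            verdict=body["verdict"],
            decision_id=body["decision_id"],
            stage=body["stage"],
            trace_id=body.get("trace_id") or "",
            reasons=list(body.get("reasons", [])),
            evaluated_policies=list(body.get("evaluated_policies", [])),
        )

    @property
    def allowed(self) -> bool:
        # Fail closed: only an explicit allow lets the request through
        return self.verdict == "allow"
```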
Pass trace_id to LangSmith (Python)
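```python
import requests
from langsmith import traceable

query = "What is the user SSN?"

# Ask AxonFlow for a verdict before forwarding the request
resp = requests.post(
    "http://localhost:8080/api/v1/decide",
    json={"stage": "llm", "query": query, "target": "gpt-4o", "caller_id": "support-agent"},
    timeout=10,
)
resp.raise_for_status()
decision = resp.json()
trace_id = decision.get("trace_id", "")  # present when AXONFLOW_OTEL_ENDPOINT is set

# Attach the AxonFlow trace_id to the LangSmith run as metadata
@traceable(metadata={"axonflow_trace_id": trace_id})
def run_llm(query: str):
    ...  # call your LLM here

# Only forward the request if governance allows it
if decision["verdict"] == "allow":
    run_llm(query)
```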
Pass trace_id to LangSmith (TypeScript)
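```typescript
const query = "What is the user SSN?";

// Ask AxonFlow for a verdict before forwarding the request
const decision = await fetch("http://localhost:8080/api/v1/decide", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    stage: "llm",
    query,
    target: "gpt-4o",
    caller_id: "support-agent",
  }),
}).then((r) => {
  if (!r.ok) throw new Error(`decide failed: HTTP ${r.status}`);
  return r.json();
});

// tracedLlmCall stands in for your own LangSmith-instrumented function;
// pass trace_id through to it as run metadata
if (decision.verdict === "allow") {
  await tracedLlmCall(query, {
    metadata: { axonflow_trace_id: decision.trace_id },
  });
}
```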
No code changes are needed on the AxonFlow side. The trace_id is emitted by the decision tracer and included in the Decision API response when AXONFLOW_OTEL_ENDPOINT is configured.
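Because trace_id is only meaningful when AXONFLOW_OTEL_ENDPOINT is configured, client code may want to guard against a missing or placeholder value before tagging LangSmith runs. The helper below is a hypothetical sketch: it checks the W3C trace-id format (32 lowercase hex characters, with the all-zero value reserved as invalid) and returns metadata only for a usable id. Whether a tracing-disabled agent omits the field, returns an empty string, or returns zeros isn't stated here, so the sketch handles all three.

```python
import re

# W3C Trace Context trace-id: 32 lowercase hex characters, not all zeros
_TRACE_ID = re.compile(r"[0-9a-f]{32}")


def axonflow_trace_metadata(decision: dict) -> dict[str, str]:
    """LangSmith run metadata for a Decision API response, or {} if it has no usable trace_id."""
    trace_id = decision.get("trace_id") or ""
    if not _TRACE_ID.fullmatch(trace_id) or trace_id == "0" * 32:
        return {}
    return {"axonflow_trace_id": trace_id}
```

For example, @traceable(metadata=axonflow_trace_metadata(decision)) leaves the run untagged instead of attaching a meaningless id.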
