Load Testing Methodology

AxonFlow includes a custom load testing framework designed to validate system performance under realistic production conditions. This guide covers our load testing methodology and principles.

Tools

AxonFlow supports load testing with several industry-standard tools:

| Tool | Best For | Installation |
|------|----------|--------------|
| k6 (recommended) | Scriptable load tests, CI integration | `brew install k6` or k6.io |
| wrk | Simple HTTP benchmarking | `brew install wrk` |
| hey | Quick one-liner load tests | `brew install hey` |
| AxonFlow internal harness | Sustained-load and staging validation used by the platform team | `ee/platform/load-testing` in the main platform repo |

Recommendation: Use k6 for scripted, reproducible public/community load tests. Use the internal harness only if you are working directly in the main platform repository and understand its environment assumptions.

Target Metrics

These are the target performance numbers that AxonFlow should meet under load:

| Metric | Target | Degraded | Critical |
|--------|--------|----------|----------|
| P50 latency | < 3ms | 3-5ms | > 5ms |
| P95 latency | < 10ms | 10-20ms | > 20ms |
| P99 latency | < 25ms | 25-50ms | > 50ms |
| Throughput | > 1,000 RPS per node | 500-1,000 RPS | < 500 RPS |
| Error rate | 0% | < 0.1% | > 0.1% |
| Policy block rate | ~20% (depends on test mix) | Varies | 0% (policy inactive) |
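As a sketch, the latency and error-rate targets above can be encoded directly as k6 thresholds so a run fails when they are missed. `http_req_failed` is k6's built-in error-rate metric; the ~20% block-rate check would need a custom metric per deployment, so it is omitted here:

```javascript
export const options = {
  thresholds: {
    // Latency targets from the table above (milliseconds).
    http_req_duration: ['p(50)<3', 'p(95)<10', 'p(99)<25'],
    // Error-rate target: fewer than 0.1% failed requests.
    http_req_failed: ['rate<0.001'],
  },
};
```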

Example k6 Load Test

The following k6 script targets the current Agent request entrypoint, POST /api/request, with a simple ramp-up pattern. It is intentionally community-friendly and uses skip_llm: true so you can measure governance and routing overhead without paying provider latency.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const latency = new Trend('axonflow_latency');

export const options = {
  stages: [
    { duration: '30s', target: 10 },  // ramp to 10 VUs (warmup)
    { duration: '1m', target: 50 },   // ramp to 50 VUs
    { duration: '2m', target: 100 },  // push to 100 VUs
    { duration: '30s', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<10', 'p(99)<25'],
    errors: ['rate<0.001'],
  },
};

const AGENT_URL = __ENV.AXONFLOW_ENDPOINT || 'http://localhost:8080';
const CLIENT_ID = __ENV.AXONFLOW_CLIENT_ID || 'community';

export default function () {
  const res = http.post(
    `${AGENT_URL}/api/request`,
    JSON.stringify({
      query: 'What is 2+2?',
      client_id: CLIENT_ID,
      request_type: 'chat',
      user_token: 'load-test-user',
      skip_llm: true,
    }),
    {
      headers: {
        'Content-Type': 'application/json',
      },
    }
  );

  check(res, {
    'status is 200': (r) => r.status === 200,
    'latency < 10ms': (r) => r.timings.duration < 10,
  });

  errorRate.add(res.status !== 200);
  latency.add(res.timings.duration);

  sleep(0.1);
}
```

Note that `stages` targets virtual users (VUs), not RPS; with a 100ms sleep per iteration, effective RPS is roughly 10x the VU count.

Run with:

```shell
k6 run --env AXONFLOW_ENDPOINT=http://localhost:8080 load-test.js
```

Baseline Results

Reference baseline numbers should be generated in your own environment and kept with the workload definition you use. The platform ships internal sustained-load harnesses under ee/platform/load-testing, but those are environment-specific and should not be copied blindly into public capacity commitments.

Capture at least:

  • target RPS versus achieved RPS
  • P50, P95, and P99 latency
  • success, blocked, and unexpected-response counts
  • connector and database saturation indicators when applicable
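One lightweight way to keep a baseline alongside the workload definition is a small record plus a regression check against later runs. This is a hypothetical sketch (field names and the 10% tolerance are illustrative, not part of the AxonFlow tooling):

```javascript
// Hypothetical baseline record kept next to the workload definition.
const baseline = {
  targetRps: 100,
  achievedRps: 98.4,
  p50Ms: 2.4,
  p95Ms: 4.8,
  p99Ms: 8.2,
};

// Flag a regression when a current run's tail latency exceeds the
// baseline by more than the given tolerance (10% by default).
function hasRegressed(baseline, current, tolerance = 0.1) {
  return (
    current.p95Ms > baseline.p95Ms * (1 + tolerance) ||
    current.p99Ms > baseline.p99Ms * (1 + tolerance)
  );
}
```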

Design Principles

Even Distribution

Our load generator uses a ticker-based approach rather than burst patterns:

```text
Burst Pattern (Wrong):        Even Distribution (Correct):
─────────────────────         ──────────────────────────
│████████░░░░░░░░░░│          │█░█░█░█░█░█░█░█░█░█░█░█░│
│░░░░░░░░████████░░│          │█░█░█░█░█░█░█░█░█░█░█░█░│
│░░░░░░░░░░░░░░████│          │█░█░█░█░█░█░█░█░█░█░█░█░│
─────────────────────         ──────────────────────────
Unrealistic spikes            Real client behavior
```

Why it matters: Burst patterns can mask performance issues that only appear under sustained load. Even distribution simulates actual production traffic patterns.
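If you are using k6 rather than the internal harness, even distribution can be approximated with the `constant-arrival-rate` executor, which starts iterations at a fixed rate regardless of how long each request takes. A sketch (VU counts are illustrative; tune them to your environment):

```javascript
export const options = {
  scenarios: {
    steady_load: {
      executor: 'constant-arrival-rate',
      rate: 100,            // iterations started per timeUnit
      timeUnit: '1s',       // i.e. 100 iterations/second
      duration: '2m',
      preAllocatedVUs: 50,  // VUs kept warm for the run
      maxVUs: 200,          // headroom if responses slow down
    },
  },
};
```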

Percentile Accuracy

We track latency percentiles using exact calculations over the recorded samples rather than approximations:

| Percentile | Meaning |
|------------|---------|
| P50 | Median: 50% of requests complete faster |
| P95 | 95th percentile: 1 in 20 requests is slower |
| P99 | 99th percentile: 1 in 100 requests is slower |

Why percentiles matter:

  • Averages hide outliers
  • P95/P99 reveal tail latency issues
  • SLAs are typically defined using percentiles
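To make "exact" concrete, here is the nearest-rank percentile method over raw samples. This is a minimal sketch, not the harness's actual implementation:

```javascript
// Nearest-rank percentile over raw samples: sort, then index.
// Exact for the recorded sample set, unlike an average or an estimate.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// 100 latencies: 1ms .. 100ms
const latencies = Array.from({ length: 100 }, (_, i) => i + 1);
console.log(percentile(latencies, 50)); // 50
console.log(percentile(latencies, 95)); // 95
console.log(percentile(latencies, 99)); // 99
```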

Realistic Client Behavior

Our load generator simulates real production clients:

  • Connection pooling: Reuses connections like production clients
  • HTTP/2: Modern protocol with multiplexing
  • TLS 1.3: Full encryption overhead included
  • Keep-alive: Long-lived connections
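If you reproduce this behavior in k6, the relevant knobs look roughly like the following (k6 reuses connections per VU by default, and negotiates HTTP/2 automatically over TLS when the server supports it; treat this as a sketch):

```javascript
export const options = {
  // Default is false: connections are reused within a VU,
  // like a pooled production client. Set true to disable.
  noConnectionReuse: false,
  // Pin TLS 1.3 so measured latency includes realistic handshake cost.
  tlsVersion: { min: 'tls1.3', max: 'tls1.3' },
};
```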

Test Categories

Load tests are organized into categories that validate different behaviors:

1. Normal Queries

Standard requests that should succeed:

Category: normal
Expected: Success (200 OK)
Purpose: Validate happy path performance

2. Security Violation Tests

Requests that should be rejected by security policies:

Category: security
Expected: Rejection (403 Forbidden)
Purpose: Validate security rules are enforced

3. Policy Violation Tests

Requests that violate governance policies:

Category: policy
Expected: Blocking (varies by policy)
Purpose: Validate policy engine correctness

4. LLM Integration Tests

Requests that involve LLM providers:

Category: llm
Expected: Success (200 OK)
Purpose: Validate LLM routing and response handling
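The categories above can be checked mechanically by mapping each one to its acceptable outcomes. This is a hypothetical helper (the 422 entry for policy blocks is an assumption, since the expected status varies by policy; adjust per deployment):

```javascript
// Hypothetical mapping from test category to acceptable HTTP statuses.
const EXPECTED_STATUS = {
  normal: [200],
  security: [403],
  policy: [403, 422], // assumption: varies by policy
  llm: [200],
};

// True when the observed status is acceptable for the category.
function isExpectedOutcome(category, status) {
  return (EXPECTED_STATUS[category] || []).includes(status);
}
```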

Metrics Collection

Prometheus Integration

Load-test results can be exported to Prometheus-compatible metrics or written to JSON for Grafana dashboards, depending on the harness you use:

```text
# Latency quantiles (Prometheus summary-style)
load_test_latency_ms{quantile="0.5"} 2.4
load_test_latency_ms{quantile="0.95"} 4.8
load_test_latency_ms{quantile="0.99"} 8.2

# Request counters
load_test_requests_total{status="success"} 15000
load_test_requests_total{status="blocked"} 3500
load_test_requests_total{status="error"} 0
```
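A minimal sketch of emitting lines in this Prometheus text exposition format from raw counters (names and labels mirror the example above; this is not the harness's actual exporter):

```javascript
// Render one metric line in the Prometheus text exposition format:
// name{label="value",...} value
function formatMetric(name, labels, value) {
  const labelStr = Object.entries(labels)
    .map(([k, v]) => `${k}="${v}"`)
    .join(',');
  return `${name}{${labelStr}} ${value}`;
}

console.log(formatMetric('load_test_requests_total', { status: 'success' }, 15000));
// load_test_requests_total{status="success"} 15000
```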

Labels and Dimensions

Results are tagged with context:

| Label | Purpose |
|-------|---------|
| `client` | Client identifier |
| `test_type` | Test category (normal, security, etc.) |
| `environment` | Target environment |
| `provider` | LLM provider (if applicable) |

Running Load Tests

Prerequisites

  • Access to target environment
  • Valid authentication credentials if your environment enforces them
  • Prometheus Pushgateway if you want to export ad-hoc run results

Basic Execution

```shell
# Run the k6 script against local community mode
k6 run --env AXONFLOW_ENDPOINT=http://localhost:8080 --env AXONFLOW_CLIENT_ID=community load-test.js
```

Test Parameters

If you use the internal harness under ee/platform/load-testing, read the source and wrapper scripts first. That tooling expects a specific environment shape and is better treated as a platform-team tool than as a generic end-user CLI.

Results Interpretation

Healthy Results

✅ P50: <5ms (excellent)
✅ P95: <10ms (within target)
✅ Error Rate: 0%
✅ Blocked Rate: ~20% (policy working)

Warning Signs

⚠️ P95 > 20ms: Investigate latency issues
⚠️ Error Rate > 0.1%: Check system logs
⚠️ P50 increasing: Possible degradation
⚠️ Blocked Rate 0%: Policy might not be active

Best Practices

1. Start Small

Begin with low RPS and increase gradually:

```text
Stage 1: 10 RPS for 30s  (warmup)
Stage 2: 50 RPS for 60s  (baseline)
Stage 3: 100 RPS for 60s (target load)
Stage 4: 200 RPS for 60s (stress test)
```

2. Test Staging First

Always validate changes in staging before production:

1. Deploy to staging
2. Run load tests
3. Review metrics
4. If passing, deploy to production
5. Run lighter validation tests

3. Monitor During Tests

Watch these metrics during load tests:

  • CPU and memory utilization
  • Database connection pool
  • Error rates and logs
  • Latency percentiles

4. Clean Up After Tests

  • Scale down test infrastructure
  • Archive test results
  • Document any anomalies

Next Steps