Performance Testing Architecture
AxonFlow's performance testing infrastructure uses a three-layer design that balances continuous monitoring with cost efficiency. This guide explains the architecture and how each layer contributes to system reliability.
Three-Layer Design
┌─────────────────────────────────────────────────────────────┐
│ LAYER 1: HEARTBEAT │
│ ───────────────────────────────────────────────────────── │
│ • Continuous health monitoring │
│ • Keeps dashboards active │
│ • Minimal resource usage │
│ • Alternates between test modes │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LAYER 2: SCHEDULED │
│ ───────────────────────────────────────────────────────── │
│ • Hourly: Baseline verification │
│ • Daily: Extended stress testing │
│ • Weekly: Comprehensive benchmark suite │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LAYER 3: MANUAL │
│ ───────────────────────────────────────────────────────── │
│ • Ad-hoc testing for specific scenarios │
│ • Custom RPS and duration │
│ • Pre-deployment validation │
│ • Incident investigation │
└─────────────────────────────────────────────────────────────┘
Layer 1: Heartbeat
The heartbeat layer provides continuous, lightweight monitoring.
Purpose
- Keep observability dashboards populated with data
- Detect system availability issues quickly
- Keep cost minimal (a single request every few seconds)
- Confirm basic connectivity and authentication
Behavior
Mode A: Policy-only requests (no LLM costs)
↓
[Wait interval]
↓
Mode B: Health check requests
↓
[Wait interval]
↓
(repeat)
Characteristics
| Aspect | Value |
|---|---|
| Frequency | Every few seconds |
| Request Volume | Minimal |
| Cost Impact | Negligible |
| Data Retention | Real-time only |
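The alternating Mode A / Mode B behavior above can be sketched as a simple loop. This is illustrative only: `send_policy_request`, `send_health_check`, and the interval are assumptions, not AxonFlow's actual implementation.

```python
import itertools
import time

def send_policy_request():
    """Placeholder: a policy-only request that never reaches the LLM."""
    return "ok"

def send_health_check():
    """Placeholder: a basic health-check request."""
    return "ok"

def heartbeat(interval_seconds=5, max_cycles=None):
    """Alternate between Mode A (policy-only) and Mode B (health check)."""
    modes = itertools.cycle([send_policy_request, send_health_check])
    cycles = 0
    results = []
    for mode in modes:
        results.append(mode())
        cycles += 1
        if max_cycles is not None and cycles >= max_cycles:
            return results
        time.sleep(interval_seconds)

# Two cycles: one Mode A request, then one Mode B request
print(heartbeat(interval_seconds=0, max_cycles=2))  # → ['ok', 'ok']
```

In production this loop would run forever (`max_cycles=None`); the cap exists only so the sketch terminates.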
Layer 2: Scheduled Tests
Scheduled tests run at fixed intervals to establish performance baselines.
Hourly Tests
Purpose: Verify the system maintains baseline performance throughout the day.
Characteristics:
- Short duration
- Moderate request volume
- Captures hourly trends
- Alerts on regression
Daily Tests
Purpose: Extended stress testing during low-traffic windows.
Characteristics:
- Longer duration
- Higher request volume
- Tests sustained load capability
- Generates daily benchmark data
Weekly Tests
Purpose: Comprehensive benchmark suite for trend analysis.
Characteristics:
- Full test coverage
- All test categories
- Historical comparison
- Capacity planning data
Scheduling Strategy
Hour 00 01 02 03 04 05 06 07 08 09 10 11 12 ...
│ │ │ │ │ │ │ │ │ │ │ │ │
Hourly ● ● ● ● ● ● ● ● ● ● ● ● ● ...
Daily ● (low traffic window)
Weekly ● (once per week)
Layer 3: Manual Tests
Manual tests are triggered on-demand for specific purposes.
Use Cases
1. Pre-Deployment Validation
- Run before production deployments
- Verify new code doesn't regress performance
- Gate for release approval
2. Incident Investigation
- Reproduce reported performance issues
- Validate fixes before deployment
- Compare before/after behavior
3. Capacity Planning
- Test higher-than-normal loads
- Plan for growth scenarios
- Validate scaling configurations
4. Custom Scenarios
- Test specific request patterns
- Validate new features under load
- Security testing validation
Execution
# Pre-deployment validation
./perf-test --mode pre-deploy --environment staging
# Capacity test
./perf-test --mode stress --rps 500 --duration 10m
# Custom scenario
./perf-test --mode manual --category policy --rps 100
Cost Optimization
Performance testing can generate significant costs if not managed carefully.
Policy-Only Mode
Most tests run in "policy-only" mode:
Standard Request:
Client → Agent → Orchestrator → LLM Provider → Response
Cost: $0.001-0.01 per request (LLM tokens)
Policy-Only Request:
Client → Agent → Policy Engine → Response
Cost: $0 (no LLM calls)
When to use policy-only:
- Heartbeat monitoring
- Policy performance validation
- Most scheduled tests
When to use full requests:
- LLM integration testing
- End-to-end validation
- Weekly benchmarks (limited scope)
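The cost difference compounds quickly at heartbeat frequency. A rough back-of-envelope comparison, using the $0.001–$0.01 per-request range quoted above (the 5-second interval is an assumption):

```python
def daily_llm_cost(requests_per_day, cost_per_request):
    """Estimate daily LLM spend at a given request volume."""
    return requests_per_day * cost_per_request

# A heartbeat firing every 5 seconds produces 17,280 requests/day.
requests = 24 * 60 * 60 // 5

low = daily_llm_cost(requests, 0.001)   # low end of the quoted range
high = daily_llm_cost(requests, 0.01)   # high end of the quoted range

print(f"{requests} requests/day -> ${low:.2f} to ${high:.2f} per day in LLM tokens")
print("policy-only mode -> $0.00 per day")
```

Even at the low end, routing heartbeat traffic through the LLM would cost tens of dollars a day for zero additional signal, which is why policy-only mode is the default.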
Environment Strategy
Development:
└── Local Docker (free)
Staging:
└── All scheduled tests run here
└── Pre-deployment validation
└── Cost: Included in staging infrastructure
Production:
└── Heartbeat only (minimal)
└── Post-deployment smoke tests
└── No scheduled load tests
Metrics and Observability
Prometheus Metrics
Performance tests export comprehensive metrics:
# Latency percentiles
perf_test_latency_p50_ms
perf_test_latency_p95_ms
perf_test_latency_p99_ms
# Request rates
perf_test_requests_total{status="success"}
perf_test_requests_total{status="blocked"}
perf_test_requests_total{status="error"}
# Test metadata
perf_test_info{layer="scheduled", interval="hourly"}
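A minimal sketch of producing these metrics in Prometheus exposition format, using only the standard library. The percentile math and the `export_metrics` helper are illustrative assumptions; a real exporter would use a Prometheus client library.

```python
import statistics

def export_metrics(latencies_ms, outcomes):
    """Render test results in Prometheus exposition format (simplified)."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99
    q = statistics.quantiles(latencies_ms, n=100)
    lines = [
        f"perf_test_latency_p50_ms {q[49]}",
        f"perf_test_latency_p95_ms {q[94]}",
        f"perf_test_latency_p99_ms {q[98]}",
    ]
    for status, count in sorted(outcomes.items()):
        lines.append(f'perf_test_requests_total{{status="{status}"}} {count}')
    return "\n".join(lines)

print(export_metrics(list(range(1, 101)),
                     {"success": 980, "blocked": 15, "error": 5}))
```

The `status` label splits one counter family into success/blocked/error series, which is what lets dashboards graph error rate as a ratio of two series rather than tracking a separate metric.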
Grafana Dashboards
Standard dashboards include:
1. Real-Time Performance
- Live latency percentiles
- Request rate graphs
- Error rate monitoring
2. Historical Trends
- Daily/weekly comparisons
- Regression detection
- Capacity utilization
3. Test Execution
- Scheduled test status
- Manual test results
- Alert history
Alerting
Automated Alerts
| Condition | Severity | Action |
|---|---|---|
| P99 > threshold | Warning | Investigate |
| Error rate > 0.1% | Critical | Page on-call |
| Heartbeat failure | Critical | Auto-escalate |
| Daily test failure | Warning | Review before deploy |
Alert Routing
Layer 1 (Heartbeat) failures → Immediate page
Layer 2 (Scheduled) failures → Slack notification
Layer 3 (Manual) failures → Test runner notified
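The routing table above is small enough to encode as a lookup. A sketch, assuming hypothetical channel names ("pagerduty", "slack", "test-runner") rather than AxonFlow's actual integrations:

```python
def route_alert(layer):
    """Map a failed test's layer to its notification channel (sketch)."""
    routes = {
        1: "pagerduty",    # heartbeat failures page immediately
        2: "slack",        # scheduled-test failures go to Slack
        3: "test-runner",  # manual-test failures notify whoever ran the test
    }
    channel = routes.get(layer)
    if channel is None:
        raise ValueError(f"unknown layer: {layer}")
    return channel

print(route_alert(1))  # → pagerduty
```

Keeping the mapping explicit (and failing loudly on an unknown layer) avoids alerts silently falling through to no channel at all.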
Best Practices
1. Start with Staging
Never run load tests against production without explicit approval:
✅ Staging: Full test suite
✅ Production: Heartbeat + smoke tests only
❌ Production: Scheduled load tests
2. Review Before Scaling
Before increasing test intensity:
- Verify system can handle current load
- Check infrastructure costs
- Confirm no external rate limits
3. Maintain Historical Data
Keep test results for trend analysis:
- Minimum 90 days of daily data
- Minimum 1 year of weekly data
- Archive before deleting
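The retention windows above can be enforced with a small pruning pass. This is a sketch under assumed data shapes: results are `(run_date, tier)` tuples, and anything past its tier's window is handed back for archiving rather than deleted in place.

```python
from datetime import date, timedelta

def prune(results, today, daily_days=90, weekly_days=365):
    """Split results into (keep, archive) by per-tier retention windows.

    Daily results are kept for 90 days and weekly results for 1 year,
    matching the minimums above; unknown tiers are kept indefinitely.
    """
    limits = {
        "daily": timedelta(days=daily_days),
        "weekly": timedelta(days=weekly_days),
    }
    keep, archive = [], []
    for run_date, tier in results:
        if today - run_date <= limits.get(tier, timedelta.max):
            keep.append((run_date, tier))
        else:
            archive.append((run_date, tier))
    return keep, archive

keep, archive = prune(
    [(date(2024, 5, 1), "daily"), (date(2023, 1, 1), "weekly")],
    today=date(2024, 6, 1),
)
print(keep)     # recent daily result survives
print(archive)  # year-old weekly result goes to the archive
```

Returning the expired results instead of discarding them matches the "archive before deleting" rule: the caller decides where they go.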
4. Document Anomalies
When tests show unexpected results:
- Capture full metrics
- Note environmental factors
- Document investigation findings
- Update runbooks if needed
Next Steps
- Load Testing Methodology - Detailed load testing guide
- Testing Overview - Complete testing pyramid