# Load Testing
k6 scripts for smoke, sustained load, and stress testing — targeting the governance pipeline, dashboard, and escalation endpoints. Integrated into the deployment pipeline as a gate between staging and production.
## Three Test Profiles

### Smoke

1 virtual user, 1 iteration. Validates that every endpoint responds correctly — health, pipeline (all verdict paths), shadow mode, dashboard, and escalation. Any failure aborts the run. Runs first in CI.

### Load

Three concurrent scenarios: pipeline submissions ramping to 8 req/s, dashboard polling with 10 virtual users, and escalation queries with 2 virtual users. Measures p95 latency under sustained, realistic traffic for 7.5 minutes.

### Stress

Ramps pipeline traffic to 50 req/s (100 VUs) and dashboard to 80 VUs. Finds the breaking point, measures when 429s appear, and verifies the system recovers after load drops. Expects degradation but not collapse.
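The stress profile's shape could be expressed as k6 scenario options roughly like this. The exact stage breakdown, ramp timings, and scenario names are illustrative assumptions; the actual configuration lives in `tests/load/stress.js`:

```javascript
// Illustrative sketch of the stress profile as k6 scenario options.
// In a real k6 script this would be `export const options = { ... }`.
const options = {
  scenarios: {
    pipeline_stress: {
      executor: 'ramping-arrival-rate', // open model: target is req/s
      startRate: 5,
      timeUnit: '1s',
      preAllocatedVUs: 100, // matches the 100-VU ceiling above
      stages: [
        { target: 25, duration: '2m' }, // ramp toward the breaking point
        { target: 50, duration: '3m' }, // hold peak: 50 req/s
        { target: 0, duration: '1m' },  // drop load, verify recovery
      ],
    },
    dashboard_stress: {
      executor: 'ramping-vus', // closed model: target is concurrent VUs
      startVUs: 10,
      stages: [
        { target: 80, duration: '3m' }, // ramp dashboard pollers to 80 VUs
        { target: 0, duration: '1m' },
      ],
    },
  },
};
```

The `ramping-arrival-rate` executor drives a fixed request rate regardless of how slow responses get, which is what exposes the breaking point; a VU-based executor would slow down with the server.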
## Running Locally
Install k6, then start the API in open mode:
```bash
# Start API (no auth, no rate limiting)
GAAS_OPEN_MODE=true GAAS_RATE_LIMIT_ENABLED=false \
  uvicorn packages.api.main:app

# Run smoke test
k6 run tests/load/smoke.js

# Run load test with JSON export
k6 run tests/load/load.js --out json=results.json

# Run stress test
k6 run tests/load/stress.js
```
Override the target URL for remote environments:
```bash
K6_BASE_URL=https://staging.example.com K6_API_KEY=gsk_... \
  k6 run tests/load/smoke.js
```
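Inside a k6 script, these overrides arrive on the global `__ENV` object. A minimal sketch of the fallback logic, written as a plain function so it can be exercised anywhere (the `Authorization: Bearer` header name is an assumption; check the API's actual auth scheme):

```javascript
// Build the target base URL and request headers from environment
// variables. In k6 you would call this as targetConfig(__ENV).
function targetConfig(env) {
  const baseUrl = env.K6_BASE_URL || 'http://localhost:8000';
  const headers = { 'Content-Type': 'application/json' };
  if (env.K6_API_KEY) {
    // Assumed header scheme for authenticated environments.
    headers['Authorization'] = `Bearer ${env.K6_API_KEY}`;
  }
  return { baseUrl, headers };
}
```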
## Payload Mix
The load and stress tests select intents from a weighted random pool that exercises all verdict paths through the governance pipeline:
| Scenario | Pipeline Path | Weight |
|---|---|---|
| Clean low-risk | Full pipeline → APPROVE | 50% |
| PCI regulated data | Policy fast-fail → BLOCK | 15% |
| High financial ($75k) | Triggers deliberation → ESCALATE | 15% |
| Minor data (COPPA) | Policy fast-fail → BLOCK | 10% |
| Audit tampering | Policy fast-fail → BLOCK | 10% |
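The weighted selection can be sketched as follows. The pool entries mirror the table above, but the entry names are illustrative and the real payload bodies live in the load and stress scripts:

```javascript
// Weighted pool matching the payload-mix table (payloads abbreviated).
const INTENT_POOL = [
  { weight: 50, name: 'clean_low_risk' },
  { weight: 15, name: 'pci_regulated_data' },
  { weight: 15, name: 'high_financial' },
  { weight: 10, name: 'minor_data_coppa' },
  { weight: 10, name: 'audit_tampering' },
];

// Pick one entry; `rand` is a number in [0, 1), e.g. Math.random().
function pickIntent(pool, rand) {
  const total = pool.reduce((sum, entry) => sum + entry.weight, 0);
  let roll = rand * total;
  for (const entry of pool) {
    roll -= entry.weight;
    if (roll < 0) return entry;
  }
  return pool[pool.length - 1]; // guard against floating-point edge cases
}
```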
## Pass/Fail Thresholds
| Endpoint Group | Metric | Threshold |
|---|---|---|
| Pipeline (live & shadow) | p95 latency | < 2,000ms |
| Dashboard | p95 latency | < 500ms |
| Escalation | p95 latency | < 500ms |
| Health / Metrics | p95 latency | < 200ms |
| Global | Error rate | < 5% |
| Stress (pipeline) | Success rate | ≥ 70% |
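In k6, this table translates into threshold expressions on tagged sub-metrics, roughly as below. The `group` tag name is an assumption; it must match however the actual scripts tag each endpoint group:

```javascript
// Sketch of the pass/fail table in k6's thresholds format.
// A failed threshold makes `k6 run` exit non-zero, which is what
// lets CI use these tests as a gate.
const thresholds = {
  'http_req_duration{group:pipeline}': ['p(95)<2000'],
  'http_req_duration{group:dashboard}': ['p(95)<500'],
  'http_req_duration{group:escalation}': ['p(95)<500'],
  'http_req_duration{group:health}': ['p(95)<200'],
  http_req_failed: ['rate<0.05'], // global error rate under 5%
};
```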
## Custom Metrics
Beyond standard k6 HTTP metrics, the load tests track governance-specific measurements:
| Metric | Type | Source |
|---|---|---|
| `gaas_pipeline_latency_ms` | Trend | Extracted from the `X-GaaS-Pipeline-Latency-Ms` response header — server-side pipeline execution time |
| `gaas_verdicts_approve` | Rate | Fraction of decisions that result in `approve` or `approve_modified` |
| `gaas_error_rate` | Rate | Application-level errors (non-200 on expected-success requests) |
| `gaas_rate_limit_429s` | Counter | Rate-limit rejections during stress tests |
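Feeding the Trend metric means parsing the latency header out of each response. A minimal sketch of that parsing step, kept as plain JS so it runs outside k6 (in a k6 script the result would be passed to `new Trend('gaas_pipeline_latency_ms').add(...)`):

```javascript
// Extract the server-side pipeline latency from a response's headers.
// Returns null when the header is missing or not a number, so a bad
// response never pollutes the Trend with NaN samples.
function pipelineLatencyMs(headers) {
  const raw = headers['X-GaaS-Pipeline-Latency-Ms'];
  const value = Number.parseFloat(raw);
  return Number.isFinite(value) ? value : null;
}
```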
## CI Integration
Load tests run automatically in the deployment pipeline after staging deploy and before production. The GitHub Actions workflow:
- Pulls the freshly built API container image from GHCR
- Starts it with `GAAS_OPEN_MODE=true` and `GAAS_RATE_LIMIT_ENABLED=false`
- Waits for the health endpoint to respond
- Runs smoke tests (gate) then load tests
- Uploads JSON results as build artifacts (30-day retention)
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `K6_BASE_URL` | `http://localhost:8000` | Target API base URL |
| `K6_API_KEY` | (empty) | API key for authenticated environments |
| `GAAS_OPEN_MODE` | `false` | Set to `true` on the API server to skip authentication |
| `GAAS_RATE_LIMIT_ENABLED` | `true` | Set to `false` on the API server to disable rate limiting |
## Related Pages
- Observability & Alerting — Prometheus metrics that k6 results correlate with
- Intent Declaration API — the pipeline endpoints under test
- Conversational Dashboard — the dashboard endpoints under test