Load Testing

k6 scripts for smoke, sustained load, and stress testing — targeting the governance pipeline, dashboard, and escalation endpoints. Integrated into the deployment pipeline as a gate between staging and production.

Three Test Profiles

1. Smoke

1 virtual user, 1 iteration. Validates every endpoint responds correctly: health, pipeline (all verdict paths), shadow mode, dashboard, and escalation. Any failure aborts. Runs first in CI.

2. Load

Three concurrent scenarios: pipeline submissions ramping to 8 req/s, dashboard polling with 10 virtual users, and escalation queries at 2 virtual users. Measures p95 latency under sustained realistic traffic for 7.5 minutes.

3. Stress

Ramps pipeline traffic to 50 req/s (100 VUs) and dashboard to 80 VUs. Finds the breaking point, measures when 429s appear, and verifies the system recovers after load drops. Expects degradation but not collapse.
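
As a rough illustration of how the smoke and load profiles could be expressed as k6 options, here is a minimal sketch. Only the 1 VU / 1 iteration smoke setup, the 8 req/s peak, the 10 and 2 VU counts, and the 7.5-minute total come from the descriptions above. The stage lengths, VU pool sizes, and the `exec` function names are assumptions made for illustration, and the actual scripts may be organized differently.

```javascript
// Sketch only: in a real k6 script each object would be exported from its
// own file as `export const options = ...`.

// Smoke: 1 VU, 1 iteration, abort as soon as any check fails.
const smokeOptions = {
  vus: 1,
  iterations: 1,
  thresholds: {
    checks: [{ threshold: "rate==1.0", abortOnFail: true }],
  },
};

// Load: three scenarios running concurrently for 7.5 minutes (450 s).
// Stage lengths and VU pool sizes below are assumed values.
const loadOptions = {
  scenarios: {
    pipeline: {
      executor: "ramping-arrival-rate",
      startRate: 1,
      timeUnit: "1s",
      preAllocatedVUs: 20,
      maxVUs: 50,
      stages: [
        { target: 8, duration: "90s" },  // ramp up to 8 req/s
        { target: 8, duration: "330s" }, // hold at 8 req/s
        { target: 0, duration: "30s" },  // ramp down
      ],
      exec: "submitIntent", // hypothetical exported function name
    },
    dashboard: {
      executor: "constant-vus",
      vus: 10,
      duration: "450s",
      exec: "pollDashboard", // hypothetical
    },
    escalation: {
      executor: "constant-vus",
      vus: 2,
      duration: "450s",
      exec: "queryEscalations", // hypothetical
    },
  },
};

// Helper: total length in seconds of a staged scenario whose durations
// are all written in whole seconds (as above).
function totalSeconds(stages) {
  return stages.reduce((sum, s) => sum + parseInt(s.duration, 10), 0);
}
```

Using an arrival-rate executor for the pipeline keeps the request rate fixed even when responses slow down, which matches a target stated in req/s. The VU-based executors fit the dashboard and escalation scenarios, whose targets are stated in virtual users.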


Running Locally

Install k6, then start the API in open mode:

# Start API (no auth, no rate limiting)
GAAS_OPEN_MODE=true GAAS_RATE_LIMIT_ENABLED=false \
  uvicorn packages.api.main:app

# Run smoke test
k6 run tests/load/smoke.js

# Run load test with JSON export
k6 run tests/load/load.js --out json=results.json

# Run stress test
k6 run tests/load/stress.js

Override the target URL for remote environments:

K6_BASE_URL=https://staging.example.com K6_API_KEY=gsk_... \
  k6 run tests/load/smoke.js

Payload Mix

The load and stress tests select intents from a weighted random pool that exercises all verdict paths through the governance pipeline:

| Scenario | Pipeline Path | Weight |
|---|---|---|
| Clean low-risk | Full pipeline → APPROVE | 50% |
| PCI regulated data | Policy fast-fail → BLOCK | 15% |
| High financial ($75k) | Triggers deliberation → ESCALATE | 15% |
| Minor data (COPPA) | Policy fast-fail → BLOCK | 10% |
| Audit tampering | Policy fast-fail → BLOCK | 10% |

Deliberation coverage. The high-financial scenario triggers multi-agent deliberation — the most expensive code path and the primary target for latency measurement.
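
One way to implement the weighted selection is a cumulative-weight roll, sketched below. The weights and expected verdicts come from the table above. The scenario keys and the `pickScenario` helper are hypothetical names, and the real scripts may build their pool differently.

```javascript
// Weights mirror the payload mix table; `name` values are illustrative keys.
const PAYLOAD_MIX = [
  { name: "clean_low_risk", expected: "APPROVE", weight: 50 },
  { name: "pci_regulated_data", expected: "BLOCK", weight: 15 },
  { name: "high_financial", expected: "ESCALATE", weight: 15 },
  { name: "minor_data_coppa", expected: "BLOCK", weight: 10 },
  { name: "audit_tampering", expected: "BLOCK", weight: 10 },
];

// Pick one entry with probability proportional to its weight.
// `rand` is injectable so the selection can be tested deterministically.
function pickScenario(mix, rand = Math.random) {
  const total = mix.reduce((sum, entry) => sum + entry.weight, 0);
  let roll = rand() * total;
  for (const entry of mix) {
    roll -= entry.weight;
    if (roll < 0) return entry;
  }
  // Guard against floating-point edge cases at the top of the range.
  return mix[mix.length - 1];
}
```

Inside a k6 iteration this would be called once per request, for example `const scenario = pickScenario(PAYLOAD_MIX);`, with the matching intent payload then posted to the pipeline endpoint.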

Pass/Fail Thresholds

| Endpoint Group | Metric | Threshold |
|---|---|---|
| Pipeline (live & shadow) | p95 latency | < 2,000 ms |
| Dashboard | p95 latency | < 500 ms |
| Escalation | p95 latency | < 500 ms |
| Health / Metrics | p95 latency | < 200 ms |
| Global | Error rate | < 5% |
| Stress (pipeline) | Success rate | ≥ 70% |
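
These limits map naturally onto k6 threshold expressions. The sketch below assumes requests are tagged with a custom `endpoint_group` tag and that the global error rate is checked against k6's built-in `http_req_failed` metric. Both are assumptions, since the tag names and metric choices in the actual scripts are not shown here.

```javascript
// p95 limits in milliseconds, taken from the thresholds table.
const P95_LIMITS_MS = {
  pipeline: 2000,
  dashboard: 500,
  escalation: 500,
  health: 200,
};

// Build a k6 `thresholds` object from the limits above.
// `endpoint_group` is a hypothetical custom tag set on each request.
function buildThresholds(limits) {
  const thresholds = {
    http_req_failed: ["rate<0.05"], // global error rate below 5%
  };
  for (const [group, ms] of Object.entries(limits)) {
    thresholds[`http_req_duration{endpoint_group:${group}}`] = [`p(95)<${ms}`];
  }
  return thresholds;
}

// Stress profile: tolerate degradation, but require at least 70% of
// pipeline checks to pass (tag name again assumed).
const stressThresholds = {
  "checks{endpoint_group:pipeline}": ["rate>=0.70"],
};
```

In a k6 script the result would be assigned as `thresholds: buildThresholds(P95_LIMITS_MS)` inside the exported options. k6 exits with a non-zero status when any threshold fails, which is what lets a CI job act as a gate.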

Custom Metrics

Beyond standard k6 HTTP metrics, the load tests track governance-specific measurements:

| Metric | Type | Source |
|---|---|---|
| gaas_pipeline_latency_ms | Trend | Extracted from the X-GaaS-Pipeline-Latency-Ms response header (server-side pipeline execution time) |
| gaas_verdicts_approve | Rate | Fraction of decisions that result in approve or approve_modified |
| gaas_error_rate | Rate | Application-level errors (non-200 on expected-success requests) |
| gaas_rate_limit_429s | Counter | Rate-limit rejections during stress tests |
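
The sketch below shows one way a response could be turned into updates for these four metrics. It is written against plain objects with an `add()` method, so it runs outside k6. In a k6 script the sinks would be `Trend`, `Rate`, and `Counter` instances from `k6/metrics`. The `verdict` field name in the response body and the choice to keep 429s out of the error rate are assumptions, not confirmed behavior of the actual scripts.

```javascript
// Case-insensitive header lookup: the casing k6 reports for a header can
// differ from what the server sent, so avoid relying on an exact key.
function getHeader(headers, name) {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers || {})) {
    if (key.toLowerCase() === wanted) return value;
  }
  return undefined;
}

// Update the governance metrics from one pipeline response.
// `metrics` holds four sinks, each exposing add(value).
function recordResponse(res, metrics) {
  // Server-side execution time reported by the API, if present.
  const latencyMs = parseFloat(getHeader(res.headers, "X-GaaS-Pipeline-Latency-Ms"));
  if (!Number.isNaN(latencyMs)) metrics.pipelineLatency.add(latencyMs);

  // Assumption: rate-limit rejections are counted separately and do not
  // contribute to the application error rate.
  if (res.status === 429) {
    metrics.rateLimit429s.add(1);
    return;
  }

  const ok = res.status === 200;
  metrics.errorRate.add(!ok);

  if (ok) {
    // `verdict` is a hypothetical field name in the JSON body.
    const verdict = JSON.parse(res.body).verdict;
    metrics.verdictsApprove.add(verdict === "approve" || verdict === "approve_modified");
  }
}
```

Measuring the header value in addition to k6's own `http_req_duration` separates pipeline execution time from network and queueing overhead.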

CI Integration

Load tests run automatically in the GitHub Actions deployment workflow, after the staging deploy and before the production deploy.

Rate limiting disabled in CI. Load tests measure pipeline performance, not rate limiter behavior. The stress test can be run separately with rate limiting enabled to validate backpressure handling.

Environment Variables

| Variable | Default | Description |
|---|---|---|
| K6_BASE_URL | http://localhost:8000 | Target API base URL |
| K6_API_KEY | (empty) | API key for authenticated environments |
| GAAS_OPEN_MODE | false | Set to true on the API server to skip authentication |
| GAAS_RATE_LIMIT_ENABLED | true | Set to false on the API server to disable rate limiting |
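
For illustration, a script might resolve the two k6-side variables as sketched below. In k6, environment variables are exposed through the global `__ENV` object, which would be passed in as `env`. The `X-API-Key` header name is an assumption. The header the API actually expects is not stated here.

```javascript
// Resolve target URL and request headers from environment variables,
// applying the defaults listed in the table above.
function resolveConfig(env) {
  // Strip trailing slashes so paths can be appended as `${baseUrl}/path`.
  const baseUrl = (env.K6_BASE_URL || "http://localhost:8000").replace(/\/+$/, "");
  const apiKey = env.K6_API_KEY || "";

  const headers = { "Content-Type": "application/json" };
  if (apiKey) {
    headers["X-API-Key"] = apiKey; // hypothetical header name
  }
  return { baseUrl, headers };
}
```

In a k6 script this would be called once at the top level, for example `const config = resolveConfig(__ENV);`. Leaving the key empty matches the open-mode local setup, where the server skips authentication.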
