Load Testing

k6 scripts for smoke, sustained load, and stress testing — targeting the governance pipeline, dashboard, and escalation endpoints. Integrated into the deployment pipeline as a gate between staging and production.

Three Test Profiles

Smoke

1 virtual user, 1 iteration. Validates that every endpoint responds correctly: health, pipeline (all verdict paths), shadow mode, dashboard, and escalation. Any failure aborts the run. Runs first in CI.
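As a rough illustration of the smoke profile, the options could look something like the sketch below. The variable name `smokeOptions` and the choice of gating on the built-in `checks` metric are assumptions, not taken from the actual `smoke.js`.

```javascript
// Sketch of smoke-profile options (assumed shape, not the real smoke.js).
// In a k6 script this object would be exported as:
//   export const options = smokeOptions;
const smokeOptions = {
  vus: 1,        // one virtual user
  iterations: 1, // a single pass over every endpoint
  thresholds: {
    // Assumed gate: every check() must pass, and the run aborts
    // as soon as the threshold is evaluated as failed.
    checks: [{ threshold: 'rate==1.0', abortOnFail: true }],
  },
};
```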

Load

Three concurrent scenarios: pipeline submissions ramping to 8 req/s, dashboard polling with 10 virtual users, and escalation queries with 2 virtual users. Measures p95 latency under sustained, realistic traffic for 7.5 minutes.
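One way the three scenarios might be declared is sketched here. The stage breakdown, the VU pool sizes, and the `exec` function names are all hypothetical; only the 8 req/s peak, the 10 and 2 virtual users, and the 7.5-minute total come from the description above.

```javascript
// Sketch of load-profile scenarios (assumed layout, not the real load.js).
// In a k6 script this would be exported as `export const options = loadOptions;`.
const loadOptions = {
  scenarios: {
    pipeline: {
      executor: 'ramping-arrival-rate', // open model: drives requests per second
      startRate: 1,
      timeUnit: '1s',
      preAllocatedVUs: 20, // assumed pool size
      maxVUs: 50,          // assumed ceiling
      stages: [
        { target: 8, duration: '90s' },  // ramp up to 8 req/s
        { target: 8, duration: '300s' }, // hold
        { target: 0, duration: '60s' },  // ramp down
      ],
      exec: 'submitIntent', // hypothetical exported function name
    },
    dashboard: { executor: 'constant-vus', vus: 10, duration: '450s', exec: 'pollDashboard' },
    escalation: { executor: 'constant-vus', vus: 2, duration: '450s', exec: 'queryEscalations' },
  },
};

// Sums stage durations written as whole seconds (e.g. '90s').
function totalSeconds(stages) {
  return stages.reduce((sum, stage) => sum + parseInt(stage.duration, 10), 0);
}
```

With these assumed stages, `totalSeconds(loadOptions.scenarios.pipeline.stages)` comes to 450 seconds, matching the 7.5-minute run.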

Stress

Ramps pipeline traffic to 50 req/s (100 VUs) and dashboard traffic to 80 VUs. Finds the breaking point, records when HTTP 429 responses appear (which requires rate limiting to be enabled on the API), and verifies that the system recovers after load drops. Expects degradation but not collapse.
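A stress profile along these lines could be sketched as follows. The stage timings, starting rates, and `exec` names are assumptions; the low-rate tail is one possible way to observe recovery after the peak.

```javascript
// Sketch of stress-profile scenarios (assumed layout, not the real stress.js).
const stressOptions = {
  scenarios: {
    pipeline_stress: {
      executor: 'ramping-arrival-rate',
      startRate: 5,
      timeUnit: '1s',
      preAllocatedVUs: 50, // assumed
      maxVUs: 100,         // VU ceiling from the profile description
      stages: [
        { target: 50, duration: '2m' }, // ramp to 50 req/s
        { target: 50, duration: '2m' }, // hold at peak
        { target: 5, duration: '1m' },  // drop the load
        { target: 5, duration: '2m' },  // recovery window at low rate
      ],
      exec: 'submitIntent', // hypothetical function name
    },
    dashboard_stress: {
      executor: 'ramping-vus',
      startVUs: 5,
      stages: [
        { target: 80, duration: '2m' },
        { target: 80, duration: '2m' },
        { target: 5, duration: '3m' },
      ],
      exec: 'pollDashboard', // hypothetical function name
    },
  },
};

// Highest target reached across a list of stages.
function peakTarget(stages) {
  return Math.max(...stages.map((stage) => stage.target));
}
```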

Running Locally

Install k6, then start the API in open mode:

# Start API (no auth, no rate limiting)
GAAS_OPEN_MODE=true GAAS_RATE_LIMIT_ENABLED=false \
  uvicorn packages.api.main:app

# Run smoke test
k6 run tests/load/smoke.js

# Run load test with JSON export
k6 run tests/load/load.js --out json=results.json

# Run stress test
k6 run tests/load/stress.js

Override the target URL for remote environments:

K6_BASE_URL=https://staging.example.com K6_API_KEY=gsk_... \
  k6 run tests/load/smoke.js
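Inside a script, these overrides would typically be read from k6's `__ENV` global. The helper below is a minimal sketch of that idea; the function name `buildTarget` and the `X-API-Key` header name are hypothetical, since the source does not say how the key is sent.

```javascript
// Resolves the target URL and request params from an environment object.
// In a k6 script, pass the built-in __ENV global: buildTarget(__ENV).
function buildTarget(env) {
  // Fall back to the documented default and strip trailing slashes.
  const baseUrl = (env.K6_BASE_URL || 'http://localhost:8000').replace(/\/+$/, '');
  const apiKey = env.K6_API_KEY || '';

  const headers = { 'Content-Type': 'application/json' };
  if (apiKey) {
    headers['X-API-Key'] = apiKey; // hypothetical header name
  }
  return { baseUrl, params: { headers } };
}
```

The returned `params` object can be passed as the last argument to k6's `http.get` or `http.post`, so open-mode and authenticated runs share the same request code.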

Payload Mix

The load and stress tests select intents from a weighted random pool that exercises all verdict paths through the governance pipeline:

Scenario	Pipeline Path	Weight
Clean low-risk	Full pipeline → APPROVE	50%
PCI regulated data	Policy fast-fail → BLOCK	15%
High financial ($75k)	Triggers deliberation → ESCALATE	15%
Minor data (COPPA)	Policy fast-fail → BLOCK	10%
Audit tampering	Policy fast-fail → BLOCK	10%

Deliberation coverage. The high-financial scenario triggers multi-agent deliberation — the most expensive code path and the primary target for latency measurement.
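The weighted selection in the table above can be sketched as a cumulative-weight pick. The scenario identifiers and the `pickScenario` name are illustrative; only the weights come from the table.

```javascript
// Weighted pool mirroring the payload mix table (names are illustrative).
const SCENARIOS = [
  { name: 'clean_low_risk', weight: 50 },
  { name: 'pci_regulated_data', weight: 15 },
  { name: 'high_financial', weight: 15 },
  { name: 'minor_data_coppa', weight: 10 },
  { name: 'audit_tampering', weight: 10 },
];

// Picks a scenario name; `rand` is a number in [0, 1) and defaults to
// Math.random() so tests can pass a fixed value.
function pickScenario(rand = Math.random()) {
  const total = SCENARIOS.reduce((sum, s) => sum + s.weight, 0);
  let roll = rand * total;
  for (const scenario of SCENARIOS) {
    if (roll < scenario.weight) return scenario.name;
    roll -= scenario.weight; // move on to the next weight band
  }
  // Guard against floating-point edge cases at the upper bound.
  return SCENARIOS[SCENARIOS.length - 1].name;
}
```

Each iteration would call `pickScenario()` and build the matching intent payload, so roughly half of all submissions take the clean approve path while the rest exercise block and escalate paths.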

Pass/Fail Thresholds

Endpoint Group	Metric	Threshold
Pipeline (live & shadow)	p95 latency	< 2,000ms
Dashboard	p95 latency	< 500ms
Escalation	p95 latency	< 500ms
Health / Metrics	p95 latency	< 200ms
Global	Error rate	< 5%
Stress (pipeline)	Success rate	≥ 70%
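In k6, limits like these are usually expressed as `thresholds` on built-in metrics, filtered by tag. The sketch below assumes a custom `endpoint` tag on each request and uses `http_req_failed` for the global error rate; the real scripts may instead key off the custom metrics or use different tag names.

```javascript
// Sketch of thresholds matching the table (tag name `endpoint` is assumed).
const thresholds = {
  'http_req_duration{endpoint:pipeline}': ['p(95)<2000'], // live and shadow
  'http_req_duration{endpoint:dashboard}': ['p(95)<500'],
  'http_req_duration{endpoint:escalation}': ['p(95)<500'],
  'http_req_duration{endpoint:health}': ['p(95)<200'],
  http_req_failed: ['rate<0.05'], // global error rate below 5%
};

// Stress runs only: at least 70% of pipeline checks must still pass.
const stressThresholds = {
  'checks{endpoint:pipeline}': ['rate>=0.70'],
};
```

For the tagged thresholds to collect data, each request has to carry the tag, for example by passing `{ tags: { endpoint: 'dashboard' } }` in the request params.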

Custom Metrics

Beyond standard k6 HTTP metrics, the load tests track governance-specific measurements:

Metric	Type	Source
`gaas_pipeline_latency_ms`	Trend	Extracted from `X-GaaS-Pipeline-Latency-Ms` response header — server-side pipeline execution time
`gaas_verdicts_approve`	Rate	Fraction of decisions that result in approve or approve_modified
`gaas_error_rate`	Rate	Application-level errors (non-200 on expected-success requests)
`gaas_rate_limit_429s`	Counter	Rate-limit rejections during stress tests
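A possible way to feed these metrics from a pipeline response is sketched below. The `metrics` object is assumed to hold instances created with k6's `Trend`, `Rate`, and `Counter` classes from `k6/metrics`; the `verdict` field in the response body and the function names are hypothetical. The header lookup is case-insensitive because k6 may report header names in a different letter case than the server sent.

```javascript
// Case-insensitive header lookup.
function getHeader(headers, name) {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers || {})) {
    if (key.toLowerCase() === wanted) return headers[key];
  }
  return undefined;
}

// Records one pipeline response into the custom metrics.
// This sketch treats every request passed in as "expected success".
function recordPipelineResponse(res, metrics) {
  const raw = getHeader(res.headers, 'X-GaaS-Pipeline-Latency-Ms');
  const latency = Number(raw);
  if (raw !== undefined && raw !== '' && Number.isFinite(latency)) {
    metrics.pipelineLatency.add(latency); // gaas_pipeline_latency_ms (Trend)
  }

  metrics.errorRate.add(res.status !== 200); // gaas_error_rate (Rate)
  if (res.status === 429) {
    metrics.rateLimited.add(1); // gaas_rate_limit_429s (Counter)
  }

  if (res.status === 200) {
    // Assumed body shape: { "verdict": "approve" | "approve_modified" | ... }
    const verdict = String(res.json().verdict || '').toLowerCase();
    const approved = verdict === 'approve' || verdict === 'approve_modified';
    metrics.approveRate.add(approved); // gaas_verdicts_approve (Rate)
  }
}
```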

CI Integration

Load tests run automatically in the deployment pipeline, after the staging deploy and before production. The GitHub Actions workflow:

1. Pulls the freshly built API container image from GHCR.
2. Starts it with GAAS_OPEN_MODE=true and GAAS_RATE_LIMIT_ENABLED=false.
3. Waits for the health endpoint to respond.
4. Runs smoke tests (gate), then load tests.
5. Uploads JSON results as build artifacts (30-day retention).
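The "wait for health" step is essentially a bounded polling loop. The actual workflow most likely does this in shell; the sketch below shows the same idea in JavaScript for Node 18+, with a hypothetical `waitForHealth` helper and an injected probe so that the health URL stays an assumption of the caller.

```javascript
// Polls `probe` until it resolves to a truthy value or attempts run out.
// Returns the attempt number that succeeded; throws if none did.
async function waitForHealth(probe, { attempts = 30, delayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await probe()) return attempt;
    } catch (err) {
      // Connection errors are expected while the container is still booting.
    }
    if (attempt < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`API not healthy after ${attempts} attempts`);
}

// Example probe (the /health path is an assumption):
//   waitForHealth(async () => (await fetch('http://localhost:8000/health')).ok);
```

Bounding the number of attempts keeps a container that never starts from stalling the pipeline; the job fails fast and the smoke gate never runs.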

Rate limiting disabled in CI. Load tests measure pipeline performance, not rate limiter behavior. The stress test can be run separately with rate limiting enabled to validate backpressure handling.

Environment Variables

Variable	Default	Description
`K6_BASE_URL`	`http://localhost:8000`	Target API base URL
`K6_API_KEY`	(empty)	API key for authenticated environments
`GAAS_OPEN_MODE`	`false`	Set to `true` on the API server to skip authentication
`GAAS_RATE_LIMIT_ENABLED`	`true`	Set to `false` on the API server to disable rate limiting

Related

Observability & Alerting: Prometheus metrics that k6 results correlate with
Intent Declaration API: the pipeline endpoints under test
Conversational Dashboard: the dashboard endpoints under test