Observability & Alerting
Structured logging, Prometheus metrics, distributed tracing, and threshold-based anomaly alerting — built into the governance pipeline.
Three Layers
GaaS observability is designed around three layers that work independently. You can run with just structured logs, add Prometheus scraping, and optionally enable OpenTelemetry tracing — each layer requires zero code changes.
Structured Logging
JSON-formatted structured logs via structlog. Every log entry carries a request_id for correlation across the pipeline. Set GAAS_LOG_LEVEL to control verbosity.
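Because every entry carries a request_id, a log consumer can reassemble the full trail of one request. The following is a minimal sketch of that correlation step on the consumer side, not GaaS code. Only request_id is documented here; the other field names in the sample lines (level, event, stage) are illustrative assumptions.

```python
import json
from collections import defaultdict


def group_by_request_id(lines):
    """Group JSON log lines by their request_id field.

    Lines that are not valid JSON objects, or that lack a request_id,
    are skipped.
    """
    grouped = defaultdict(list)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and "request_id" in entry:
            grouped[entry["request_id"]].append(entry)
    return dict(grouped)


# Hypothetical log lines: only request_id is a documented field.
sample_lines = [
    '{"request_id": "req-1", "level": "info", "event": "pipeline_started"}',
    '{"request_id": "req-2", "level": "info", "event": "pipeline_started"}',
    '{"request_id": "req-1", "level": "warning", "event": "stage_slow", "stage": "policy"}',
    "not json",
]

by_request = group_by_request_id(sample_lines)
```

Entries keep their original order within each request, so by_request["req-1"] reads as a timeline for that request.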
Prometheus Metrics
Scrape GET /metrics for HTTP request counts, latency histograms, pipeline stage durations, decision verdicts, error rates, circuit breaker state, and more. No authentication required on this endpoint.
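If you want to inspect the endpoint without a Prometheus server, the text exposition format is simple enough to parse by hand. The sketch below assumes the standard Prometheus text format; base_url is a placeholder for your deployment, and the label values in the sample ("allow", "full") are made up for illustration. Escaped characters inside label values are left as-is.

```python
import re
import urllib.request

SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples.

    Comment lines (# HELP, # TYPE) and blank lines are ignored, as are
    optional trailing timestamps.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def scrape(base_url):
    """Fetch and parse GET /metrics from a running instance."""
    url = base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical scrape output used to exercise the parser offline.
sample_text = """\
# TYPE gaas_active_requests gauge
gaas_active_requests 3
# TYPE gaas_decisions_total counter
gaas_decisions_total{verdict="allow",pipeline_mode="full"} 120
"""

samples = parse_metrics(sample_text)
```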
Distributed Tracing
Optional OpenTelemetry integration. Set GAAS_TRACING_ENABLED=true and GAAS_OTEL_ENDPOINT to send spans for every pipeline stage to your collector. Zero overhead when disabled.
Anomaly Alerting
Beyond passive metrics, GaaS actively monitors operational health and fires alerts when thresholds are exceeded. The anomaly monitor evaluates metrics on every governance decision and through a periodic background check, dispatching events through the notification engine.
Event Types
| Event | Trigger | Severity |
|---|---|---|
| circuit_breaker_tripped | Learning calibration circuit breaker freezes due to quality degradation | Critical |
| critical_escalation | Escalation created with risk score ≥ 0.7 | Critical |
| high_error_rate | More than 10 application errors in a 5-minute window | Critical |
| block_rate_anomaly | Block rate exceeds 50% over recent decisions (min 10 sample size) | Warning |
| pipeline_degraded | Any single pipeline execution exceeds 5,000ms | Warning |
| rate_limit_spike | More than 50 rate-limit rejections in a 5-minute window | Warning |
| escalation_queue_growth | Pending escalation count ≥ 10 and growing (checked every 60s) | Warning |
A notification rule with cooldown_minutes: 15 fires at most once every 15 minutes for the same event type, preventing alert storms during sustained anomalies.
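To make the interaction between a windowed threshold and a cooldown concrete, here is a small simulation. It is a sketch under stated assumptions, not the anomaly monitor's implementation: the class names WindowCounter and Cooldown are hypothetical, and time is passed in as plain seconds so the example stays deterministic.

```python
from collections import deque


class WindowCounter:
    """Counts events inside a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, now):
        self._timestamps.append(now)

    def count(self, now):
        # Drop timestamps that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


class Cooldown:
    """Allows one alert per event type per cooldown period."""

    def __init__(self, cooldown_minutes):
        self.cooldown_seconds = cooldown_minutes * 60
        self._last_fired = {}

    def should_fire(self, event_type, now):
        last = self._last_fired.get(event_type)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_fired[event_type] = now
        return True


def simulate_error_burst():
    """Feed one error per second for 20 minutes; return when alerts fired."""
    errors = WindowCounter(window_seconds=300)  # 5-minute window
    cooldown = Cooldown(cooldown_minutes=15)
    fired_at = []
    for now in range(0, 1200):
        errors.record(now)
        # high_error_rate: more than 10 errors in the window
        if errors.count(now) > 10 and cooldown.should_fire("high_error_rate", now):
            fired_at.append(now)
    return fired_at
```

In this simulation the threshold is crossed continuously from the eleventh error onward, yet only two alerts fire across the 20 minutes: one at second 10 and one 15 minutes later.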
Notification Channels
Alert events are dispatched through configurable channels. Each notification rule specifies a channel and recipients.
| Channel | Configuration | Recipient Format |
|---|---|---|
| Webhook | No env vars needed | URL; receives a JSON POST with title, message, metadata |
| Email | GAAS_SMTP_HOST, GAAS_SMTP_PORT, GAAS_SMTP_USER, GAAS_SMTP_PASS, GAAS_SMTP_FROM | Email address |
| PagerDuty | GAAS_PAGERDUTY_ROUTING_KEY | PagerDuty service key |
Channels degrade gracefully — if SMTP is not configured, email rules silently skip. The same applies to PagerDuty. Webhook requires no server-side configuration.
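The skip behaviour can be pictured as a filter applied before dispatch. This is a sketch only: the documentation names the variables but not the exact check, so treating GAAS_SMTP_HOST and GAAS_PAGERDUTY_ROUTING_KEY as the deciding variables is an assumption, as are the channel identifiers "email" and "pagerduty" ("webhook" appears in the rule example on this page).

```python
import os

# Assumption: a channel counts as configured when these variables are set.
REQUIRED_ENV = {
    "webhook": [],
    "email": ["GAAS_SMTP_HOST"],
    "pagerduty": ["GAAS_PAGERDUTY_ROUTING_KEY"],
}


def channel_is_configured(channel, env=None):
    """Return True if every variable the channel needs has a value."""
    env = os.environ if env is None else env
    required = REQUIRED_ENV.get(channel)
    if required is None:
        return False  # unknown channel name
    return all(env.get(name) for name in required)


def select_deliverable(rules, env=None):
    """Return the rules whose channel is configured; the rest are skipped."""
    return [rule for rule in rules if channel_is_configured(rule["channel"], env)]
```

With only SMTP configured, a webhook rule and an email rule would be kept while a PagerDuty rule would be dropped without raising an error.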
Notification Rules
Rules are managed via the dashboard API. Each rule matches an event type (exact or wildcard) to a channel and recipient list.
POST /v1/dashboard/notifications/rules
Content-Type: application/json

{
  "name": "Ops team: all critical events",
  "trigger": "*_critical*",
  "channel": "webhook",
  "recipients": ["https://hooks.slack.com/services/T00/B00/xxx"],
  "cooldown_minutes": 15
}
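The same kind of request can be built from Python with the standard library. This sketch constructs the request without sending it; the base URL and webhook recipient are placeholders, and any authentication headers your deployment requires are not covered here and would need to be added.

```python
import json
import urllib.request


def build_rule_request(base_url, rule):
    """Build (but do not send) the POST request that creates a rule."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/dashboard/notifications/rules",
        data=json.dumps(rule).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: substitute your own host and webhook URL.
rule = {
    "name": "Circuit breaker alerts",
    "trigger": "circuit_*",
    "channel": "webhook",
    "recipients": ["https://hooks.example.com/alerts"],
    "cooldown_minutes": 15,
}

request = build_rule_request("https://gaas.example.com", rule)
```

Passing the request to urllib.request.urlopen(request) would send it.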
Trigger patterns:
- circuit_breaker_tripped: exact match
- circuit_*: wildcard prefix (matches any event starting with circuit_)
- *: catch-all, matches every event
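The three pattern forms above can be expressed with a single matcher that treats * as "any run of characters" and everything else literally. This is an illustrative sketch; the function name trigger_matches is hypothetical and the real matcher may differ in edge cases.

```python
import re


def trigger_matches(pattern, event_type):
    """Return True if a rule's trigger pattern matches an event type.

    Only '*' is treated as a wildcard; all other characters match literally.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, event_type) is not None


events = ["circuit_breaker_tripped", "critical_escalation", "rate_limit_spike"]
matched_by_prefix = [e for e in events if trigger_matches("circuit_*", e)]
matched_by_catch_all = [e for e in events if trigger_matches("*", e)]
```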
Prometheus Metrics Reference
| Metric | Type | Labels |
|---|---|---|
| gaas_http_requests_total | Counter | method, path_template, status_code |
| gaas_http_request_duration_seconds | Histogram | method, path_template |
| gaas_active_requests | Gauge | none |
| gaas_pipeline_stage_duration_seconds | Histogram | stage |
| gaas_pipeline_total_duration_seconds | Histogram | none |
| gaas_decisions_total | Counter | verdict, pipeline_mode |
| gaas_errors_total | Counter | error_code |
| gaas_rate_limit_hits_total | Counter | scope |
| gaas_circuit_breaker_state | Gauge | breaker_name (0=normal, 1=frozen) |
| gaas_escalations_created_total | Counter | none |
| gaas_deliberation_triggered_total | Counter | none |
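As one example of using these labels, the share of requests that ended in a 5xx response can be derived from gaas_http_requests_total. The sketch below works on already-parsed samples as (name, labels, value) tuples; the sample values and the path_template "/v1/intents" are invented for illustration. Counters are cumulative, so for a rate over a period you would subtract two scrapes first.

```python
def server_error_ratio(samples):
    """Share of gaas_http_requests_total carrying a 5xx status_code."""
    total = 0.0
    errors = 0.0
    for name, labels, value in samples:
        if name != "gaas_http_requests_total":
            continue
        total += value
        if labels.get("status_code", "").startswith("5"):
            errors += value
    return errors / total if total else 0.0


# Hypothetical parsed samples.
parsed = [
    ("gaas_http_requests_total",
     {"method": "POST", "path_template": "/v1/intents", "status_code": "200"}, 90.0),
    ("gaas_http_requests_total",
     {"method": "POST", "path_template": "/v1/intents", "status_code": "500"}, 10.0),
    ("gaas_active_requests", {}, 3.0),
]

ratio = server_error_ratio(parsed)
```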
Environment Variables
| Variable | Default | Description |
|---|---|---|
| GAAS_LOG_LEVEL | info | Structlog level (debug, info, warning, error) |
| GAAS_TRACING_ENABLED | false | Enable OpenTelemetry tracing |
| GAAS_OTEL_ENDPOINT | localhost:4317 | OTLP gRPC collector endpoint |
| GAAS_SMTP_HOST | (none) | SMTP server for email notifications |
| GAAS_SMTP_PORT | 587 | SMTP port |
| GAAS_SMTP_USER | (none) | SMTP username |
| GAAS_SMTP_PASS | (none) | SMTP password |
| GAAS_SMTP_FROM | (none) | From address for email alerts |
| GAAS_PAGERDUTY_ROUTING_KEY | (none) | PagerDuty Events API v2 routing key |
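For deployment scripts or tests it can help to mirror these variables and defaults in one place. The following is a sketch under assumptions: the ObservabilitySettings class is hypothetical, and treating only the literal string "true" (in any letter case) as enabling tracing is a guess about how the flag is parsed.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObservabilitySettings:
    log_level: str
    tracing_enabled: bool
    otel_endpoint: str
    smtp_host: Optional[str]
    smtp_port: int
    smtp_user: Optional[str]
    smtp_pass: Optional[str]
    smtp_from: Optional[str]
    pagerduty_routing_key: Optional[str]

    @classmethod
    def from_env(cls, env=None):
        """Read the documented variables, applying the documented defaults."""
        env = os.environ if env is None else env
        return cls(
            log_level=env.get("GAAS_LOG_LEVEL", "info"),
            # Assumption: only "true" (case-insensitive) enables tracing.
            tracing_enabled=env.get("GAAS_TRACING_ENABLED", "false").lower() == "true",
            otel_endpoint=env.get("GAAS_OTEL_ENDPOINT", "localhost:4317"),
            smtp_host=env.get("GAAS_SMTP_HOST"),
            smtp_port=int(env.get("GAAS_SMTP_PORT", "587")),
            smtp_user=env.get("GAAS_SMTP_USER"),
            smtp_pass=env.get("GAAS_SMTP_PASS"),
            smtp_from=env.get("GAAS_SMTP_FROM"),
            pagerduty_routing_key=env.get("GAAS_PAGERDUTY_ROUTING_KEY"),
        )


defaults = ObservabilitySettings.from_env({})
```

Passing a plain dict instead of os.environ keeps the loader easy to test.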
Related Pages
- Conversational Dashboard — manage notification rules and review alerts through natural language
- Getting Started — onboard, integrate, and go live
- Intent Declaration API — the pipeline that generates the metrics