Observability & Alerting

Structured logging, Prometheus metrics, distributed tracing, and threshold-based anomaly alerting — built into the governance pipeline.

Three Layers

GaaS observability is designed around three layers that work independently. You can run with just structured logs, add Prometheus scraping, and optionally enable OpenTelemetry tracing — each layer requires zero code changes.

1. Structured Logging

JSON-formatted structured logs via structlog. Every log entry carries a request_id for correlation across the pipeline. Set GAAS_LOG_LEVEL to control verbosity.
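Because every entry carries a request_id, logs can be correlated with a few lines of standard-library Python. The following is a minimal sketch, not part of GaaS itself: the `event` and `level` field names and the sample values are assumptions for illustration; only `request_id` is documented above.

```python
import json
from collections import defaultdict


def group_by_request_id(lines):
    """Group JSON log lines by their request_id field."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup banners
        if not isinstance(entry, dict):
            continue
        request_id = entry.get("request_id")
        if request_id is not None:
            groups[request_id].append(entry)
    return dict(groups)


# Hypothetical log lines; field names other than request_id are assumed.
sample = [
    '{"event": "pipeline_started", "level": "info", "request_id": "req-1"}',
    '{"event": "stage_completed", "level": "info", "request_id": "req-2"}',
    '{"event": "decision_made", "level": "info", "request_id": "req-1"}',
    "not json",
]

grouped = group_by_request_id(sample)
for request_id, entries in grouped.items():
    print(request_id, [entry["event"] for entry in entries])
```

In practice the same function could be fed lines from a log file or from a container log stream.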

2. Prometheus Metrics

Scrape GET /metrics for HTTP request counts, latency histograms, pipeline stage durations, decision verdicts, error rates, circuit breaker state, and more. No authentication required on this endpoint.
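For a quick check without a Prometheus server, the endpoint can be read directly. The sketch below assumes the endpoint returns the standard Prometheus text exposition format; the base URL and the label values in the sample (`allow`, `block`, `full`) are hypothetical. The parser is deliberately simple and is not a full implementation of the format.

```python
import re
import urllib.request

# Matches sample lines such as: name{label="v",other="w"} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(base_url):
    """Fetch the raw metrics text; base_url is deployment-specific."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output; label values are assumptions.
sample_text = """\
# TYPE gaas_decisions_total counter
gaas_decisions_total{verdict="allow",pipeline_mode="full"} 40
gaas_decisions_total{verdict="block",pipeline_mode="full"} 10
gaas_active_requests 2
"""

by_verdict = {}
for name, labels, value in parse_metrics(sample_text):
    if name == "gaas_decisions_total":
        verdict = labels["verdict"]
        by_verdict[verdict] = by_verdict.get(verdict, 0.0) + value
print(by_verdict)
```

Against a live instance, `parse_metrics(fetch_metrics("http://localhost:8000"))` would take the place of the sample text, with the host and port adjusted to your deployment.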

3. Distributed Tracing

Optional OpenTelemetry integration. Set GAAS_TRACING_ENABLED=true and GAAS_OTEL_ENDPOINT to send spans for every pipeline stage to your collector. Zero overhead when disabled.


Anomaly Alerting

Beyond passive metrics, GaaS actively monitors operational health and fires alerts when thresholds are exceeded. The anomaly monitor evaluates metrics on every governance decision and through a periodic background check, dispatching events through the notification engine.
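To make the idea of a windowed threshold concrete, here is a minimal sketch of a sliding-window counter of the kind that could back a rule such as high_error_rate (more than 10 errors in 5 minutes). The class name and structure are illustrative assumptions and do not describe the actual GaaS implementation.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events inside a trailing time window and checks a threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = deque()  # event timestamps in seconds, oldest first

    def record(self, now):
        """Record one event at time `now`; return True if over the threshold."""
        self._events.append(now)
        cutoff = now - self.window_seconds
        # Drop events that have fallen out of the window.
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
        return len(self._events) > self.threshold


# 11 errors within 11 seconds: only the 11th exceeds "more than 10".
errors = SlidingWindowCounter(window_seconds=300, threshold=10)
results = [errors.record(now=float(t)) for t in range(11)]
print(results)

# Much later, the earlier errors have expired, so one new error is not an anomaly.
later = errors.record(now=1000.0)
print(later)
```

Passing the timestamp in explicitly, rather than reading the clock inside the class, keeps the logic easy to test.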

Event Types

| Event | Trigger | Severity |
| --- | --- | --- |
| circuit_breaker_tripped | Learning calibration circuit breaker freezes due to quality degradation | Critical |
| critical_escalation | Escalation created with risk score ≥ 0.7 | Critical |
| high_error_rate | More than 10 application errors in a 5-minute window | Critical |
| block_rate_anomaly | Block rate exceeds 50% over recent decisions (min 10 sample size) | Warning |
| pipeline_degraded | Any single pipeline execution exceeds 5,000ms | Warning |
| rate_limit_spike | More than 50 rate-limit rejections in a 5-minute window | Warning |
| escalation_queue_growth | Pending escalation count ≥ 10 and growing (checked every 60s) | Warning |

Deduplication. The notification engine respects per-rule cooldown periods. A rule with cooldown_minutes: 15 will fire at most once every 15 minutes for the same event type, preventing alert storms during sustained anomalies.
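The cooldown behavior can be pictured as a small lookup keyed by rule and event type. This is a sketch under that assumption; `CooldownTracker` and its method are hypothetical names, and the real notification engine may track state differently.

```python
class CooldownTracker:
    """Suppresses repeat notifications for the same rule and event type."""

    def __init__(self):
        self._last_fired = {}  # (rule_name, event_type) -> timestamp in seconds

    def should_fire(self, rule_name, event_type, cooldown_minutes, now):
        """Return True and record the firing if the cooldown has elapsed."""
        key = (rule_name, event_type)
        last = self._last_fired.get(key)
        if last is not None and now - last < cooldown_minutes * 60:
            return False  # still inside the cooldown window
        self._last_fired[key] = now
        return True


tracker = CooldownTracker()
first = tracker.should_fire("ops", "high_error_rate", 15, now=0)       # fires
repeat = tracker.should_fire("ops", "high_error_rate", 15, now=600)    # 10 min later: suppressed
other = tracker.should_fire("ops", "rate_limit_spike", 15, now=600)    # different event type: fires
after = tracker.should_fire("ops", "high_error_rate", 15, now=900)     # 15 min later: fires again
print(first, repeat, other, after)
```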

Notification Channels

Alert events are dispatched through configurable channels. Each notification rule specifies a channel and recipients.

| Channel | Configuration | Recipient Format |
| --- | --- | --- |
| Webhook | No env vars needed | URL (receives a JSON POST with title, message, metadata) |
| Email | GAAS_SMTP_HOST, GAAS_SMTP_PORT, GAAS_SMTP_USER, GAAS_SMTP_PASS, GAAS_SMTP_FROM | Email address |
| PagerDuty | GAAS_PAGERDUTY_ROUTING_KEY | PagerDuty service key |

Channels degrade gracefully — if SMTP is not configured, email rules silently skip. The same applies to PagerDuty. Webhook requires no server-side configuration.
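On the receiving side, a webhook target only needs to accept a JSON POST. The following is a minimal receiver sketch using the standard library; it assumes the payload is a JSON object with the documented `title`, `message`, and `metadata` keys. The port, the response codes, and the handling of missing keys are choices made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(body: bytes) -> dict:
    """Decode a webhook body into title, message, and metadata."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return {
        "title": payload.get("title", ""),
        "message": payload.get("message", ""),
        "metadata": payload.get("metadata") or {},
    }


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError:  # also covers JSON and UTF-8 decode errors
            self.send_response(400)
            self.end_headers()
            return
        print(f"[alert] {alert['title']}: {alert['message']}")
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Run the receiver until interrupted (port is an arbitrary choice)."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()


# Hypothetical payload, shaped after the documented fields.
example = parse_alert(
    b'{"title": "high_error_rate", "message": "12 errors in 5 minutes", '
    b'"metadata": {"severity": "critical"}}'
)
print(example["title"], example["metadata"])
```

Calling `serve()` starts the listener; its URL is then what you would enter as a webhook recipient.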


Notification Rules

Rules are managed via the dashboard API. Each rule matches an event type (exact or wildcard) to a channel and recipient list.

POST /v1/dashboard/notifications/rules
Content-Type: application/json

{
  "name": "Ops team — all critical events",
  "trigger": "*_critical*",
  "channel": "webhook",
  "recipients": ["https://hooks.slack.com/services/T00/B00/xxx"],
  "cooldown_minutes": 15
}
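The same request can be issued from a script. The sketch below builds the POST with the standard library and leaves sending as a separate step. The base URL, the recipient URL, and the absence of authentication headers are assumptions; add whatever credentials your deployment requires through the `headers` argument.

```python
import json
import urllib.request


def build_rule_request(base_url, rule, headers=None):
    """Build (but do not send) the POST request that creates a notification rule."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/dashboard/notifications/rules",
        data=json.dumps(rule).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", **(headers or {})},
    )


# Field names follow the example above; the values are hypothetical.
rule = {
    "name": "Ops team: error rate alerts",
    "trigger": "high_error_rate",  # exact match on a documented event type
    "channel": "webhook",
    "recipients": ["https://example.com/hooks/gaas"],
    "cooldown_minutes": 15,
}

request = build_rule_request("http://localhost:8000", rule)
print(request.get_method(), request.full_url)

# To actually send it:
#   with urllib.request.urlopen(request, timeout=10) as resp:
#       print(resp.status)
```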

Trigger patterns:

Dashboard integration. You can also create and manage notification rules through the conversational dashboard using natural language.

Prometheus Metrics Reference

| Metric | Type | Labels |
| --- | --- | --- |
| gaas_http_requests_total | Counter | method, path_template, status_code |
| gaas_http_request_duration_seconds | Histogram | method, path_template |
| gaas_active_requests | Gauge | (none) |
| gaas_pipeline_stage_duration_seconds | Histogram | stage |
| gaas_pipeline_total_duration_seconds | Histogram | (none) |
| gaas_decisions_total | Counter | verdict, pipeline_mode |
| gaas_errors_total | Counter | error_code |
| gaas_rate_limit_hits_total | Counter | scope |
| gaas_circuit_breaker_state | Gauge | breaker_name (value: 0 = normal, 1 = frozen) |
| gaas_escalations_created_total | Counter | (none) |
| gaas_deliberation_triggered_total | Counter | (none) |
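Prometheus histograms are exposed as cumulative `_bucket` series with an `le` (upper bound) label, so a latency percentile can be estimated from a single scrape. The sketch below does this by linear interpolation inside the matching bucket, which is similar in spirit to PromQL's `histogram_quantile`. The bucket boundaries and counts are made up for illustration; the actual buckets GaaS uses are not documented here.

```python
def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    ending with the float('inf') bucket.
    """
    total = buckets[-1][1]
    if total == 0:
        return None  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # cannot interpolate into the open-ended bucket
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical buckets for gaas_pipeline_total_duration_seconds_bucket.
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (5.0, 100), (float("inf"), 100)]

p50 = estimate_quantile(0.50, buckets)
p95 = estimate_quantile(0.95, buckets)
print(p50, p95)
```

The estimate is only as fine-grained as the bucket layout, so treat it as an approximation.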

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| GAAS_LOG_LEVEL | info | Structlog level (debug, info, warning, error) |
| GAAS_TRACING_ENABLED | false | Enable OpenTelemetry tracing |
| GAAS_OTEL_ENDPOINT | localhost:4317 | OTLP gRPC collector endpoint |
| GAAS_SMTP_HOST | (none) | SMTP server for email notifications |
| GAAS_SMTP_PORT | 587 | SMTP port |
| GAAS_SMTP_USER | (none) | SMTP username |
| GAAS_SMTP_PASS | (none) | SMTP password |
| GAAS_SMTP_FROM | (none) | From address for email alerts |
| GAAS_PAGERDUTY_ROUTING_KEY | (none) | PagerDuty Events API v2 routing key |
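A preflight script can confirm which layers and channels a given environment would enable. The sketch below reads the variables listed above and applies the documented defaults. Treating email as enabled whenever GAAS_SMTP_HOST is set, and parsing GAAS_TRACING_ENABLED as the literal string `true`, are assumptions about how GaaS interprets these values.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ObservabilityConfig:
    log_level: str
    tracing_enabled: bool
    otel_endpoint: str
    email_enabled: bool
    pagerduty_enabled: bool


def load_config(env=None):
    """Read observability settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return ObservabilityConfig(
        log_level=env.get("GAAS_LOG_LEVEL", "info"),
        tracing_enabled=env.get("GAAS_TRACING_ENABLED", "false").strip().lower() == "true",
        otel_endpoint=env.get("GAAS_OTEL_ENDPOINT", "localhost:4317"),
        # Assumption: a channel counts as configured when its key variable is set.
        email_enabled=bool(env.get("GAAS_SMTP_HOST")),
        pagerduty_enabled=bool(env.get("GAAS_PAGERDUTY_ROUTING_KEY")),
    )


defaults = load_config({})
print(defaults)

# Hypothetical environment with tracing and email configured.
custom = load_config({
    "GAAS_LOG_LEVEL": "debug",
    "GAAS_TRACING_ENABLED": "true",
    "GAAS_OTEL_ENDPOINT": "otel-collector:4317",
    "GAAS_SMTP_HOST": "smtp.example.com",
})
print(custom)
```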
