Observability

PulseRoute exposes Prometheus metrics, ships a pre-built Grafana dashboard, and sends webhook alerts.

Prometheus Metrics

All metrics are exposed at GET /metrics/ (no authentication required).

HTTP Metrics

Metric	Type	Labels	Description
`pulseroute_requests_total`	Counter	method, path, status_code	Total HTTP requests
`pulseroute_request_duration_seconds`	Histogram	method, path	Request latency

Engine Metrics

Metric	Type	Labels	Description
`pulseroute_transactions_routed_total`	Counter		Routing decisions served
`pulseroute_transactions_reported_total`	Counter		Outcomes reported
`pulseroute_failover_events_total`	Counter	rule_id, from_processor, to_processor	Failover events

Processor Metrics

Metric	Type	Labels	Description
`pulseroute_processor_health_status`	Gauge	rule_id, processor_id	1=healthy, 0.5=degraded, 0=failed_over
`pulseroute_processor_success_rate`	Gauge	rule_id, processor_id	Current success rate (0-1)
`pulseroute_processor_latency_p95`	Gauge	rule_id, processor_id	P95 latency in ms
`pulseroute_processor_failure_probability`	Gauge	rule_id, processor_id	Tier 2 LSTM prediction (0-1)
`pulseroute_bandit_weight`	Gauge	rule_id, processor_id	Tier 3 traffic allocation (0-1)

System Metrics

Metric	Type	Description
`pulseroute_engine_tier`	Gauge	Current tier level (1, 2, or 3)
`pulseroute_redis_connected`	Gauge	Redis connectivity (0 or 1)
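
As a rough illustration of consuming these metrics outside Grafana, the sketch below parses a few lines of Prometheus text exposition format and maps `pulseroute_processor_health_status` values to the states listed in the Processor Metrics table. The sample text and the `parse_samples` and `health_label` helpers are hypothetical. Real output from `/metrics/` will contain more lines, and label values containing escaped quotes would need a fuller parser.

```python
import re

# Hypothetical sample of what GET /metrics/ might return (text exposition format).
SAMPLE = '''\
# TYPE pulseroute_processor_health_status gauge
pulseroute_processor_health_status{rule_id="us_cards",processor_id="stripe"} 0.5
pulseroute_processor_health_status{rule_id="us_cards",processor_id="adyen"} 1
pulseroute_engine_tier 2
'''

# metric_name, optional {labels}, then the sample value.
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric_name, labels_dict, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith('#'):
            continue
        m = LINE.match(line)
        if m:
            labels = dict(LABEL.findall(m.group('labels') or ''))
            samples.append((m.group('name'), labels, float(m.group('value'))))
    return samples


# Value-to-state mapping taken from the Processor Metrics table.
HEALTH = {1.0: 'healthy', 0.5: 'degraded', 0.0: 'failed_over'}


def health_label(value):
    return HEALTH.get(value, 'unknown')


for name, labels, value in parse_samples(SAMPLE):
    if name == 'pulseroute_processor_health_status':
        print(labels['processor_id'], health_label(value))
```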

Grafana Dashboard

A pre-built dashboard is auto-provisioned when you start the monitoring stack.

Setup

docker compose --profile monitoring up -d

Open Grafana at http://localhost:3001 and log in with username admin and password pulseroute.

Navigate to Dashboards > PulseRoute or go directly to: http://localhost:3001/d/pulseroute-main/pulseroute

Dashboard Sections

System Overview — stat panels showing engine tier, Redis status, total transactions, failover count, and request rate.

Processor Health — per-processor status indicators (healthy/degraded/failed), success rate time series, and P95 latency.

Tier 2: Failure Prediction — LSTM failure probability over time with threshold lines (0.3 warning, 0.6 critical), and current probability bar gauges.

Tier 3: Multi-Armed Bandit — traffic allocation stacked area chart, current traffic split donut, and failover events bar chart.

HTTP Performance — request rate by endpoint, latency percentiles (p50/p95/p99), and requests by status code.

Webhook Alerts

PulseRoute sends webhooks on failover and processor health changes (see Event Types). Register endpoints through the /v1/webhooks API.

Register a Webhook

curl -X POST http://localhost:8080/v1/webhooks \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-service.com/pulseroute-webhook",
    "event_types": [
      "failover.triggered",
      "failover.recovered",
      "processor.degraded",
      "processor.healthy"
    ],
    "secret": "your-hmac-secret"
  }'
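
The same registration can be issued from Python. The following is a minimal sketch using only the standard library. It mirrors the curl request above, and the helper name `build_register_request` is an illustrative choice, not part of PulseRoute. The response format is not documented here, so the sketch stops at building the request.

```python
import json
import urllib.request


def build_register_request(base_url, target_url, event_types, secret=None):
    """Build (but do not send) a POST /v1/webhooks request like the curl example."""
    body = {"url": target_url, "event_types": list(event_types)}
    if secret is not None:
        body["secret"] = secret  # enables the signature header on deliveries
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/webhooks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_register_request(
    "http://localhost:8080",
    "https://your-service.com/pulseroute-webhook",
    ["failover.triggered", "failover.recovered"],
    secret="your-hmac-secret",
)
# To actually send it: urllib.request.urlopen(req, timeout=10)
```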

Event Types

Event	Fired When
`failover.triggered`	Primary processor fails, traffic moves to secondary
`failover.recovered`	Primary recovers, traffic returns
`processor.degraded`	Processor error rate exceeds degradation threshold
`processor.healthy`	Processor returns to healthy status

Webhook Payload

{
  "event_id": "evt_abc123",
  "event_type": "failover.triggered",
  "timestamp": "2026-04-01T12:00:00",
  "data": {
    "rule_id": "us_cards",
    "from_processor": "stripe",
    "to_processor": "adyen",
    "reason": "Primary error rate 35.0% exceeds threshold 30.0%"
  }
}
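
A receiver might decode this payload into a small typed structure before acting on it. The sketch below assumes the four top-level fields shown above are always present. The `WebhookEvent` class and `parse_event` function are hypothetical names, and the contents of `data` for event types other than `failover.triggered` are not documented here.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WebhookEvent:
    event_id: str
    event_type: str
    timestamp: str  # kept as a string; the sample carries no UTC offset
    data: dict


def parse_event(raw: bytes) -> WebhookEvent:
    """Decode a delivery body; raises KeyError if a top-level field is missing."""
    doc = json.loads(raw)
    return WebhookEvent(
        event_id=doc["event_id"],
        event_type=doc["event_type"],
        timestamp=doc["timestamp"],
        data=doc.get("data", {}),
    )


raw = b'''{
  "event_id": "evt_abc123",
  "event_type": "failover.triggered",
  "timestamp": "2026-04-01T12:00:00",
  "data": {
    "rule_id": "us_cards",
    "from_processor": "stripe",
    "to_processor": "adyen",
    "reason": "Primary error rate 35.0% exceeds threshold 30.0%"
  }
}'''

event = parse_event(raw)
if event.event_type == "failover.triggered":
    d = event.data
    print(f'{d["rule_id"]}: {d["from_processor"]} -> {d["to_processor"]}')
```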

HMAC Verification

If you provide a secret when registering, every delivery includes an X-PulseRoute-Signature header:

X-PulseRoute-Signature: sha256=<hex-digest>

Verify it in your handler:

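Python:

import hashlib
import hmac

def verify(body: bytes, secret: str, signature: str) -> bool:
    # Recompute the HMAC-SHA256 of the raw request body with the shared secret.
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking timing information.
    return hmac.compare_digest(f"sha256={expected}", signature)

Node.js:

const crypto = require('crypto');

function verify(body, secret, signature) {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(body)
    .digest('hex');
  // Compare in constant time; timingSafeEqual throws on unequal lengths, so check first.
  const a = Buffer.from(`sha256=${expected}`);
  const b = Buffer.from(signature || '');
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}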
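
To exercise a handler locally, it can help to generate a signature the same way the sender presumably does. The sketch below assumes the digest is HMAC-SHA256 over the raw request body, as the verification snippets imply. `sign` is a hypothetical helper intended for tests, not a PulseRoute API.

```python
import hashlib
import hmac
import json


def sign(body: bytes, secret: str) -> str:
    """Produce a header value in the documented `sha256=<hex-digest>` form."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify(body: bytes, secret: str, signature: str) -> bool:
    """Constant-time check of a received signature against the raw body."""
    return hmac.compare_digest(sign(body, secret), signature)


body = b'{"event_id": "evt_abc123", "event_type": "failover.triggered"}'
header = sign(body, "your-hmac-secret")

print(verify(body, "your-hmac-secret", header))  # True
print(verify(body, "wrong-secret", header))      # False

# Re-serializing the parsed JSON changes the bytes, so the signature no longer matches.
reserialized = json.dumps(json.loads(body), separators=(",", ":")).encode()
print(verify(reserialized, "your-hmac-secret", header))  # False
```

The last check shows why verification should run on the bytes exactly as received: parsing and re-encoding the JSON can change whitespace and produce a different digest.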

Delivery Behavior

Retries: 3 attempts with exponential backoff (1s, 2s, 4s)
Timeout: 10 seconds per attempt
Parallel delivery: Multiple webhooks for the same event are delivered concurrently
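
Because deliveries are retried, a receiver may plausibly see the same event more than once, for example if it processed a delivery but responded after the 10-second timeout. One common defence is to deduplicate on `event_id`. The sketch below is an in-memory illustration only. `SeenEvents` and its capacity are hypothetical, and a multi-process deployment would need a shared store instead.

```python
from collections import OrderedDict


class SeenEvents:
    """Remember the most recent event IDs, evicting the oldest past `capacity`."""

    def __init__(self, capacity: int = 10_000):
        self.capacity = capacity
        self._ids = OrderedDict()

    def first_time(self, event_id: str) -> bool:
        """Return True the first time an ID is seen, False on repeats."""
        if event_id in self._ids:
            self._ids.move_to_end(event_id)  # refresh recency
            return False
        self._ids[event_id] = None
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # drop the oldest entry
        return True


seen = SeenEvents()
deliveries = ["evt_1", "evt_2", "evt_1"]  # evt_1 arrives twice
handled = [eid for eid in deliveries if seen.first_time(eid)]
print(handled)  # ['evt_1', 'evt_2']
```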

Integration Examples

Slack (via incoming webhook):

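Slack's incoming webhooks expect Slack's own message format, so register a middleware service as the PulseRoute webhook URL. The middleware transforms the PulseRoute payload into Slack's block format and then POSTs it to your Slack webhook URL.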
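
The transformation step in such a middleware might look like the sketch below. It assumes the `failover.triggered` payload shape shown under Webhook Payload and produces a Slack message with a single `section` block. The function name and the message wording are illustrative.

```python
def to_slack_message(event: dict) -> dict:
    """Convert a PulseRoute event into a Slack incoming-webhook message body."""
    data = event.get("data", {})
    lines = [f"*{event['event_type']}* at {event['timestamp']}"]
    if "from_processor" in data and "to_processor" in data:
        rule = data.get("rule_id", "?")
        lines.append(f"Rule `{rule}`: {data['from_processor']} -> {data['to_processor']}")
    if "reason" in data:
        lines.append(data["reason"])
    text = "\n".join(lines)
    return {
        "text": text,  # plain-text fallback used in notifications
        "blocks": [{"type": "section", "text": {"type": "mrkdwn", "text": text}}],
    }


sample_event = {
    "event_id": "evt_abc123",
    "event_type": "failover.triggered",
    "timestamp": "2026-04-01T12:00:00",
    "data": {
        "rule_id": "us_cards",
        "from_processor": "stripe",
        "to_processor": "adyen",
        "reason": "Primary error rate 35.0% exceeds threshold 30.0%",
    },
}
message = to_slack_message(sample_event)
print(message["text"])
```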

PagerDuty:

Point the webhook at PagerDuty's Events API v2 endpoint and map failover.triggered to a PagerDuty alert.
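
The Events API v2 expects events in its own JSON shape (`routing_key`, `event_action`, `dedup_key`, `payload`), so the mapping would typically go through a small translation step. The sketch below shows one possible mapping. Resolving the alert on `failover.recovered`, the dedup key format, and the `critical` severity are assumptions made for illustration, and `to_pagerduty_event` is a hypothetical helper.

```python
def to_pagerduty_event(event: dict, routing_key: str) -> dict:
    """Map a PulseRoute failover event to a PagerDuty Events API v2 body."""
    data = event.get("data", {})
    # Assumption: one alert per rule, so trigger and resolve share a dedup key.
    dedup_key = f"pulseroute-failover-{data.get('rule_id', 'unknown')}"
    if event["event_type"] == "failover.recovered":
        return {
            "routing_key": routing_key,
            "event_action": "resolve",
            "dedup_key": dedup_key,
        }
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": dedup_key,
        "payload": {
            "summary": data.get("reason", event["event_type"]),
            "source": "pulseroute",
            "severity": "critical",
            "custom_details": data,
        },
    }


triggered = to_pagerduty_event(
    {
        "event_type": "failover.triggered",
        "data": {
            "rule_id": "us_cards",
            "reason": "Primary error rate 35.0% exceeds threshold 30.0%",
        },
    },
    routing_key="YOUR_ROUTING_KEY",  # placeholder integration key
)
print(triggered["event_action"], triggered["dedup_key"])
```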

Custom monitoring:

# List registered webhooks
curl http://localhost:8080/v1/webhooks

# Delete a webhook
curl -X DELETE http://localhost:8080/v1/webhooks/{webhook_id}

Prometheus Scrape Config

PulseRoute includes a pre-configured prometheus.yml that works out of the box with Docker Compose. To add custom scrape targets or alerting rules, edit the file and restart Prometheus.