Observability¶
PulseRoute exposes its runtime state through three channels: Prometheus metrics, a pre-built Grafana dashboard, and webhook alerts.
Prometheus Metrics¶
All metrics are exposed at GET /metrics/ (no authentication required).
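For a quick check without a Prometheus server, the endpoint can be read directly. The following is a minimal sketch, not part of PulseRoute: it assumes the metrics endpoint is served on the same host and port as the API examples on this page (http://localhost:8080), and `parse_metrics` and `scrape` are hypothetical helper names. The parser is deliberately simplified and assumes no sample timestamps and no commas or escaped quotes inside label values.

```python
from urllib.request import urlopen

# Assumption: metrics share the host and port used by the API examples.
METRICS_URL = "http://localhost:8080/metrics/"


def parse_metrics(text):
    """Parse Prometheus text-format lines into (name, labels, value) tuples.

    Simplified: ignores comments, assumes no trailing timestamps and no
    commas or escaped quotes inside label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        if not series:
            continue  # not a "name value" line
        name, _, rest = series.partition("{")
        labels = {}
        for pair in rest.rstrip("}").split(","):
            if pair:
                key, _, val = pair.partition("=")
                labels[key.strip()] = val.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


def scrape(url=METRICS_URL):
    """Fetch the metrics endpoint and return parsed samples."""
    with urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `scrape()` against a running instance returns one tuple per sample; for anything beyond ad hoc inspection, a real Prometheus client or server is the better tool.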
HTTP Metrics¶
| Metric | Type | Labels | Description |
|---|---|---|---|
| pulseroute_requests_total | Counter | method, path, status_code | Total HTTP requests |
| pulseroute_request_duration_seconds | Histogram | method, path | Request latency |
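
As an illustration of how the status_code label can be used, the sketch below computes the share of 5xx responses from samples of pulseroute_requests_total. The function name and the input shape (a list of label-dict and value pairs) are assumptions made for this example. Because counters are cumulative, the result covers the whole process lifetime; a windowed rate needs the difference between two scrapes.

```python
def server_error_ratio(samples):
    """Return the fraction of requests whose status_code starts with '5'.

    `samples` is a list of (labels, value) pairs taken from
    pulseroute_requests_total, where labels is a dict.
    """
    total = sum(value for _, value in samples)
    errors = sum(
        value
        for labels, value in samples
        if labels.get("status_code", "").startswith("5")
    )
    # Avoid dividing by zero before any request has been served.
    return errors / total if total else 0.0
```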
Engine Metrics¶
| Metric | Type | Labels | Description |
|---|---|---|---|
| pulseroute_transactions_routed_total | Counter | | Routing decisions served |
| pulseroute_transactions_reported_total | Counter | | Outcomes reported |
| pulseroute_failover_events_total | Counter | rule_id, from_processor, to_processor | Failover events |
Processor Metrics¶
| Metric | Type | Labels | Description |
|---|---|---|---|
| pulseroute_processor_health_status | Gauge | rule_id, processor_id | 1=healthy, 0.5=degraded, 0=failed_over |
| pulseroute_processor_success_rate | Gauge | rule_id, processor_id | Current success rate (0-1) |
| pulseroute_processor_latency_p95 | Gauge | rule_id, processor_id | P95 latency in ms |
| pulseroute_processor_failure_probability | Gauge | rule_id, processor_id | Tier 2 LSTM prediction (0-1) |
| pulseroute_bandit_weight | Gauge | rule_id, processor_id | Tier 3 traffic allocation (0-1) |
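
The health gauge encodes state as a number, so consumers outside Grafana need to map it back to a name. A minimal sketch of that mapping follows; `health_label` is a hypothetical helper, and treating any other value as "unknown" is an assumption rather than documented behavior.

```python
# Values as documented for pulseroute_processor_health_status.
HEALTH_STATES = {1.0: "healthy", 0.5: "degraded", 0.0: "failed_over"}


def health_label(value):
    """Translate a health gauge value into its state name."""
    # Assumption: values outside the documented set are reported as unknown.
    return HEALTH_STATES.get(float(value), "unknown")
```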
System Metrics¶
| Metric | Type | Description |
|---|---|---|
| pulseroute_engine_tier | Gauge | Current tier level (1, 2, or 3) |
| pulseroute_redis_connected | Gauge | Redis connectivity (0 or 1) |
Grafana Dashboard¶
A pre-built dashboard is auto-provisioned when you start the monitoring stack.
Setup¶
Open Grafana at http://localhost:3001 and log in with username admin and password pulseroute.
Navigate to Dashboards > PulseRoute or go directly to:
http://localhost:3001/d/pulseroute-main/pulseroute
Dashboard Sections¶
- System Overview: stat panels showing engine tier, Redis status, total transactions, failover count, and request rate.
- Processor Health: per-processor status indicators (healthy/degraded/failed), success rate time series, and P95 latency.
- Tier 2: Failure Prediction: LSTM failure probability over time with threshold lines (0.3 warning, 0.6 critical), and current probability bar gauges.
- Tier 3: Multi-Armed Bandit: traffic allocation stacked area chart, current traffic split donut, and failover events bar chart.
- HTTP Performance: request rate by endpoint, latency percentiles (p50/p95/p99), and requests by status code.
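The warning and critical lines on the Tier 2 panel can be reproduced outside Grafana, for example in a script that reads pulseroute_processor_failure_probability. The sketch below assumes the thresholds are inclusive and that they are display thresholds only; the source does not say whether the engine itself acts on the same values. `failure_risk` is a hypothetical name.

```python
WARNING_THRESHOLD = 0.3   # dashboard warning line
CRITICAL_THRESHOLD = 0.6  # dashboard critical line


def failure_risk(probability):
    """Classify a failure probability using the dashboard thresholds."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumption: a value exactly on a threshold counts as crossing it.
    if probability >= CRITICAL_THRESHOLD:
        return "critical"
    if probability >= WARNING_THRESHOLD:
        return "warning"
    return "ok"
```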
Webhook Alerts¶
PulseRoute fires webhooks for critical events. Configure endpoints via the API.
Register a Webhook¶
curl -X POST http://localhost:8080/v1/webhooks \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-service.com/pulseroute-webhook",
    "event_types": [
      "failover.triggered",
      "failover.recovered",
      "processor.degraded",
      "processor.healthy"
    ],
    "secret": "your-hmac-secret"
  }'
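The same registration can be scripted. This is a minimal sketch using only the Python standard library; it builds the request shown in the curl example and leaves sending to the caller. `build_webhook_request` is a hypothetical helper, and the response format is not described in the source, so it is not parsed here.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "http://localhost:8080"  # same base URL as the curl example


def build_webhook_request(url, event_types, secret=None, api_base=API_BASE):
    """Build the POST /v1/webhooks request without sending it."""
    body = {"url": url, "event_types": list(event_types)}
    if secret is not None:
        body["secret"] = secret  # enables the signature header on deliveries
    return Request(
        f"{api_base}/v1/webhooks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register_webhook(url, event_types, secret=None):
    """Send the registration and return the raw response body as text."""
    req = build_webhook_request(url, event_types, secret)
    with urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8")
```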
Event Types¶
| Event | Fired When |
|---|---|
| failover.triggered | Primary processor fails, traffic moves to secondary |
| failover.recovered | Primary recovers, traffic returns |
| processor.degraded | Processor error rate exceeds degradation threshold |
| processor.healthy | Processor returns to healthy status |
Webhook Payload¶
{
  "event_id": "evt_abc123",
  "event_type": "failover.triggered",
  "timestamp": "2026-04-01T12:00:00",
  "data": {
    "rule_id": "us_cards",
    "from_processor": "stripe",
    "to_processor": "adyen",
    "reason": "Primary error rate 35.0% exceeds threshold 30.0%"
  }
}
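On the receiving side, the payload can be turned into a typed object before any routing logic runs. The sketch below is one possible shape, with `WebhookEvent` and `parse_event` as hypothetical names. The example timestamp carries no UTC offset, so it parses to a naive datetime; the source does not state which timezone it represents.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WebhookEvent:
    event_id: str
    event_type: str
    timestamp: datetime  # naive: the example payload has no UTC offset
    data: dict


def parse_event(raw_body):
    """Parse a webhook request body (str or bytes) into a WebhookEvent."""
    doc = json.loads(raw_body)
    return WebhookEvent(
        event_id=doc["event_id"],
        event_type=doc["event_type"],
        timestamp=datetime.fromisoformat(doc["timestamp"]),
        # Assumption: tolerate a missing data object instead of failing.
        data=doc.get("data", {}),
    )
```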
HMAC Verification¶
If you provide a secret when registering, every delivery includes an X-PulseRoute-Signature header.
Verify it in your handler before acting on the payload.
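The exact signature format is not shown on this page, so the following is a sketch under stated assumptions, not the documented scheme: it assumes the header carries the hex-encoded HMAC-SHA256 of the raw request body, keyed with the registered secret, and it tolerates an optional "sha256=" prefix. Confirm the algorithm and encoding against your PulseRoute version before relying on it. Both function names are hypothetical.

```python
import hashlib
import hmac


def expected_signature(secret, raw_body):
    """Hex HMAC-SHA256 of the raw body (assumed scheme, see note above)."""
    return hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret, raw_body, header_value):
    """Return True if the X-PulseRoute-Signature value matches the body."""
    if not header_value:
        return False  # missing header: reject
    received = header_value.strip().lower()
    # Assumption: accept both "<hex>" and "sha256=<hex>" forms.
    if received.startswith("sha256="):
        received = received[len("sha256="):]
    expected = expected_signature(secret, raw_body)
    # Constant-time comparison to avoid leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Compute the digest over the raw bytes exactly as received; re-serializing the parsed JSON can change whitespace or key order and break the comparison.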
Delivery Behavior¶
- Retries: 3 attempts with exponential backoff (1s, 2s, 4s)
- Timeout: 10 seconds per attempt
- Parallel delivery: Multiple webhooks for the same event are delivered concurrently
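Because deliveries are retried, a handler that processed an event but responded too slowly may plausibly receive it again. One common defense is to deduplicate on event_id. The sketch below assumes event_id stays the same across retries of one event, which the source does not confirm, and keeps state in memory only; `SeenEvents` is a hypothetical class.

```python
from collections import OrderedDict


class SeenEvents:
    """Bounded in-memory record of event IDs that were already handled."""

    def __init__(self, max_size=10_000):
        self._ids = OrderedDict()
        self._max_size = max_size

    def first_time(self, event_id):
        """Return True the first time an ID is seen, False on repeats."""
        if event_id in self._ids:
            return False
        self._ids[event_id] = True
        if len(self._ids) > self._max_size:
            self._ids.popitem(last=False)  # evict the oldest ID
        return True
```

A handler would call `first_time(event["event_id"])` and skip processing when it returns False. With several handler processes, a shared store would be needed in place of this in-memory structure.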
Integration Examples¶
Slack (via incoming webhook):
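Slack incoming webhooks only accept Slack's own message format, so do not register the Slack URL directly. Register a small middleware service as the PulseRoute webhook, and have it transform the PulseRoute payload into Slack's block format before posting to your Slack webhook URL.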
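The transformation step might look like the sketch below. It builds a Slack message with a plain-text fallback and one Block Kit section; the wording of the message and the function name `to_slack_message` are choices made for this example. The input is the parsed PulseRoute payload as a dict.

```python
def to_slack_message(event):
    """Convert a PulseRoute webhook payload (dict) into a Slack message."""
    data = event.get("data", {})
    summary = f"PulseRoute {event['event_type']} ({data.get('rule_id', 'unknown rule')})"
    lines = [f"*{summary}*"]
    # Failover events carry both processors; other event types may not.
    if "from_processor" in data and "to_processor" in data:
        lines.append(f"{data['from_processor']} -> {data['to_processor']}")
    if "reason" in data:
        lines.append(data["reason"])
    return {
        "text": summary,  # fallback text for notifications
        "blocks": [
            {
                "type": "section",
                "text": {"type": "mrkdwn", "text": "\n".join(lines)},
            }
        ],
    }
```

The middleware would then POST the returned dict as JSON to the Slack webhook URL.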
PagerDuty:
Point the webhook at PagerDuty's Events API v2 endpoint and map failover.triggered to a PagerDuty alert.
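PagerDuty's Events API v2 expects its own event envelope, so the mapping needs a translation step. The sketch below shows one way to build that envelope. Mapping failover.recovered to a resolve action, ignoring the processor events, the dedup key format, and the "critical" severity are all assumptions made for this example; `routing_key` is the integration key from your PagerDuty service.

```python
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

# Assumption: recovery resolves the alert that the failover opened.
EVENT_ACTIONS = {"failover.triggered": "trigger", "failover.recovered": "resolve"}


def to_pagerduty_event(event, routing_key):
    """Build an Events API v2 body, or None for events that are not mapped."""
    action = EVENT_ACTIONS.get(event["event_type"])
    if action is None:
        return None
    data = event.get("data", {})
    rule_id = data.get("rule_id", "unknown")
    body = {
        "routing_key": routing_key,
        "event_action": action,
        # Same key for trigger and resolve so PagerDuty pairs them.
        "dedup_key": f"pulseroute-failover-{rule_id}",
    }
    if action == "trigger":
        body["payload"] = {
            "summary": data.get("reason", f"Failover triggered for {rule_id}"),
            "source": "pulseroute",
            "severity": "critical",
            "custom_details": data,
        }
    return body
```

The resulting dict is POSTed as JSON to `PAGERDUTY_EVENTS_URL`.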
Custom monitoring:
# List registered webhooks
curl http://localhost:8080/v1/webhooks
# Delete a webhook
curl -X DELETE http://localhost:8080/v1/webhooks/{webhook_id}
Prometheus Scrape Config¶
PulseRoute includes a pre-configured prometheus.yml that works out of the box with Docker Compose. To add custom scrape targets or alerting rules, edit the file and restart Prometheus.