Observability

PulseRoute exposes Prometheus metrics, ships a pre-built Grafana dashboard, and sends webhook alerts.

Prometheus Metrics

All metrics are exposed at GET /metrics/ (no authentication required).

HTTP Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| pulseroute_requests_total | Counter | method, path, status_code | Total HTTP requests |
| pulseroute_request_duration_seconds | Histogram | method, path | Request latency (seconds) |
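As a quick sanity check, the text exposition output can be parsed with the standard library alone. The sketch below sums pulseroute_requests_total per status_code label. The sample text, its values, and the /v1/route path are made up for illustration, and the parser is deliberately simple: it ignores timestamps and does not handle escaped quotes inside label values.

```python
import re
from collections import defaultdict

# Illustrative sample of what GET /metrics/ might return; values and the
# /v1/route path are hypothetical.
SAMPLE = '''\
# HELP pulseroute_requests_total Total HTTP requests
# TYPE pulseroute_requests_total counter
pulseroute_requests_total{method="POST",path="/v1/route",status_code="200"} 120
pulseroute_requests_total{method="POST",path="/v1/route",status_code="500"} 3
pulseroute_requests_total{method="GET",path="/v1/webhooks",status_code="200"} 7
'''

LINE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def requests_by_status(text: str) -> dict:
    """Sum pulseroute_requests_total samples per status_code label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) != "pulseroute_requests_total":
            continue
        labels = dict(LABEL_RE.findall(match.group(2)))
        totals[labels.get("status_code", "unknown")] += float(match.group(3))
    return dict(totals)


print(requests_by_status(SAMPLE))
```

For anything beyond a spot check, a real Prometheus client or PromQL query is the better tool; this only shows how the labels in the table map onto the scraped lines.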

Engine Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| pulseroute_transactions_routed_total | Counter | (none) | Routing decisions served |
| pulseroute_transactions_reported_total | Counter | (none) | Outcomes reported |
| pulseroute_failover_events_total | Counter | rule_id, from_processor, to_processor | Failover events |

Processor Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| pulseroute_processor_health_status | Gauge | rule_id, processor_id | 1=healthy, 0.5=degraded, 0=failed_over |
| pulseroute_processor_success_rate | Gauge | rule_id, processor_id | Current success rate (0-1) |
| pulseroute_processor_latency_p95 | Gauge | rule_id, processor_id | P95 latency in ms |
| pulseroute_processor_failure_probability | Gauge | rule_id, processor_id | Tier 2 LSTM prediction (0-1) |
| pulseroute_bandit_weight | Gauge | rule_id, processor_id | Tier 3 traffic allocation (0-1) |

System Metrics

| Metric | Type | Description |
|---|---|---|
| pulseroute_engine_tier | Gauge | Current tier level (1, 2, or 3) |
| pulseroute_redis_connected | Gauge | Redis connectivity (0 or 1) |

Grafana Dashboard

A pre-built dashboard is auto-provisioned when you start the monitoring stack.

Setup

docker compose --profile monitoring up -d

Open Grafana at http://localhost:3001 and log in with admin / pulseroute.

Navigate to Dashboards > PulseRoute or go directly to: http://localhost:3001/d/pulseroute-main/pulseroute

Dashboard Sections

System Overview — stat panels showing engine tier, Redis status, total transactions, failover count, and request rate.

Processor Health — per-processor status indicators (healthy/degraded/failed), success rate time series, and P95 latency.

Tier 2: Failure Prediction — LSTM failure probability over time with threshold lines (0.3 warning, 0.6 critical), and current probability bar gauges.
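The threshold lines can be mirrored in your own tooling, for example when post-processing scraped values of pulseroute_processor_failure_probability. This is a minimal sketch; treating the thresholds as inclusive (>=) is an assumption, since the dashboard description does not say which side of the line a value of exactly 0.3 or 0.6 falls on.

```python
WARNING_THRESHOLD = 0.3   # dashboard warning line
CRITICAL_THRESHOLD = 0.6  # dashboard critical line


def classify(probability: float) -> str:
    """Bucket a failure probability into ok / warning / critical.

    Thresholds are treated as inclusive, which is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= CRITICAL_THRESHOLD:
        return "critical"
    if probability >= WARNING_THRESHOLD:
        return "warning"
    return "ok"


for value in (0.12, 0.45, 0.8):
    print(value, classify(value))
```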

Tier 3: Multi-Armed Bandit — traffic allocation stacked area chart, current traffic split donut, and failover events bar chart.

HTTP Performance — request rate by endpoint, latency percentiles (p50/p95/p99), and requests by status code.
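Latency percentiles such as p50/p95/p99 are typically estimated from histogram buckets rather than measured exactly. The sketch below shows the usual linear-interpolation estimate over cumulative buckets, similar in spirit to PromQL's histogram_quantile; it is not taken from the dashboard definition, and the bucket boundaries and counts are invented for illustration.

```python
import math


def quantile_from_buckets(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # cannot interpolate into the +Inf bucket
            in_bucket = count - lower_count
            if in_bucket == 0:
                return upper_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets: (upper bound in seconds, observations <= bound)
BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]

for q in (0.5, 0.95, 0.99):
    print(f"p{int(q * 100)} ~ {quantile_from_buckets(q, BUCKETS):.3f}s")
```

The estimate is only as precise as the bucket layout: a percentile that lands in a wide bucket can be off by most of that bucket's width.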

Webhook Alerts

PulseRoute fires webhooks for critical events. Configure endpoints via the API.

Register a Webhook

curl -X POST http://localhost:8080/v1/webhooks \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-service.com/pulseroute-webhook",
    "event_types": [
      "failover.triggered",
      "failover.recovered",
      "processor.degraded",
      "processor.healthy"
    ],
    "secret": "your-hmac-secret"
  }'
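The same registration can be scripted. The sketch below builds the equivalent request with Python's standard library; the helper name build_register_request is hypothetical, and the response format is not documented here, so the example stops short of interpreting it.

```python
import json
import urllib.request
from typing import Optional


def build_register_request(
    base_url: str,
    target_url: str,
    event_types: list,
    secret: Optional[str] = None,
) -> urllib.request.Request:
    """Build the POST /v1/webhooks request shown in the curl example."""
    body = {"url": target_url, "event_types": event_types}
    if secret is not None:
        body["secret"] = secret  # enables the HMAC signature header
    return urllib.request.Request(
        f"{base_url}/v1/webhooks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_register_request(
    "http://localhost:8080",
    "https://your-service.com/pulseroute-webhook",
    ["failover.triggered", "failover.recovered"],
    secret="your-hmac-secret",
)
print(request.get_method(), request.full_url)

# To actually send it against a running instance:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```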

Event Types

| Event | Fired When |
|---|---|
| failover.triggered | Primary processor fails, traffic moves to secondary |
| failover.recovered | Primary recovers, traffic returns |
| processor.degraded | Processor error rate exceeds degradation threshold |
| processor.healthy | Processor returns to healthy status |

Webhook Payload

{
  "event_id": "evt_abc123",
  "event_type": "failover.triggered",
  "timestamp": "2026-04-01T12:00:00",
  "data": {
    "rule_id": "us_cards",
    "from_processor": "stripe",
    "to_processor": "adyen",
    "reason": "Primary error rate 35.0% exceeds threshold 30.0%"
  }
}
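A receiving service might parse this payload into a small typed structure before acting on it. The following is a sketch based only on the fields in the example above; the WebhookEvent class and parse_event function are hypothetical names. The example timestamp carries no UTC offset, so the parsed value is a naive datetime.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WebhookEvent:
    event_id: str
    event_type: str
    timestamp: datetime
    data: dict


def parse_event(raw: bytes) -> WebhookEvent:
    """Parse a webhook request body into a WebhookEvent."""
    doc = json.loads(raw)
    return WebhookEvent(
        event_id=doc["event_id"],
        event_type=doc["event_type"],
        timestamp=datetime.fromisoformat(doc["timestamp"]),
        data=doc.get("data", {}),
    )


RAW = b'''{
  "event_id": "evt_abc123",
  "event_type": "failover.triggered",
  "timestamp": "2026-04-01T12:00:00",
  "data": {"rule_id": "us_cards", "from_processor": "stripe",
           "to_processor": "adyen",
           "reason": "Primary error rate 35.0% exceeds threshold 30.0%"}
}'''

event = parse_event(RAW)
print(event.event_type, event.data["from_processor"], "->", event.data["to_processor"])
```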

HMAC Verification

If you provide a secret when registering, every delivery includes an X-PulseRoute-Signature header:

X-PulseRoute-Signature: sha256=<hex-digest>

Verify it in your handler:

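Python:

import hashlib
import hmac

def verify(body: bytes, secret: str, signature: str) -> bool:
    # Recompute the digest over the raw request body and compare in constant time.
    expected = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)

Node.js:

const crypto = require('crypto');

function verify(body, secret, signature) {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(body)
    .digest('hex');
  // Compare in constant time; timingSafeEqual throws if the lengths differ.
  const a = Buffer.from(`sha256=${expected}`);
  const b = Buffer.from(signature);
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}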
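To exercise a handler locally without waiting for a real event, you can produce a signature the same way the verification code recomputes it. This is a sketch for testing only: the sign helper is hypothetical and simply mirrors the header format shown above. Note that the digest covers the raw body bytes, so verify against the bytes as received rather than a re-serialized JSON object.

```python
import hashlib
import hmac


def sign(body: bytes, secret: str) -> str:
    """Produce a header value in the sha256=<hex-digest> format."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify(body: bytes, secret: str, signature: str) -> bool:
    """Constant-time check of a received signature header."""
    return hmac.compare_digest(sign(body, secret), signature)


body = b'{"event_id": "evt_abc123", "event_type": "failover.triggered"}'
header = sign(body, "your-hmac-secret")

print(verify(body, "your-hmac-secret", header))         # matching body and secret
print(verify(body + b" ", "your-hmac-secret", header))  # body altered in transit
print(verify(body, "wrong-secret", header))             # wrong secret
```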

Delivery Behavior

  • Retries: 3 attempts with exponential backoff (1s, 2s, 4s)
  • Timeout: 10 seconds per attempt
  • Parallel delivery: Multiple webhooks for the same event are delivered concurrently
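The retry schedule can be pictured with a short sketch. This is not PulseRoute's implementation; it assumes one reading of the list above, namely an initial attempt followed by up to three retries that wait 1s, 2s, and 4s respectively. The source does not say whether the first attempt counts among the three, and the deliver and backoff_delays names are hypothetical.

```python
import time
from typing import Callable


def backoff_delays(retries: int = 3, base: float = 1.0) -> list:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * 2 ** i for i in range(retries)]


def deliver(
    send: Callable[[], bool],
    retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it succeeds or the retry budget is used up."""
    if send():
        return True
    for delay in backoff_delays(retries):
        sleep(delay)  # wait before the next retry
        if send():
            return True
    return False


# Simulate an endpoint that fails twice and then succeeds, recording the
# waits instead of actually sleeping.
outcomes = iter([False, False, True])
waits = []
delivered = deliver(lambda: next(outcomes), sleep=waits.append)
print(delivered, waits)
```

Because deliveries can be retried, a receiver may see the same event more than once if its first response was slow or lost; handlers are easier to operate when processing an event twice is harmless.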

Integration Examples

Slack (via incoming webhook):

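Slack incoming webhooks expect Slack's own message format, so PulseRoute cannot post to them directly. Register a small middleware service as the PulseRoute webhook URL; it transforms the PulseRoute payload into Slack's block format and forwards the result to your Slack incoming webhook URL.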
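A minimal sketch of the transformation such a middleware might perform is shown below. The to_slack_message function is a hypothetical name and the message layout is just one option; it uses a plain "text" fallback plus a single mrkdwn section block.

```python
def to_slack_message(event: dict) -> dict:
    """Convert a PulseRoute webhook payload into a Slack message body."""
    data = event.get("data", {})
    summary = (
        f"PulseRoute {event['event_type']} "
        f"for rule {data.get('rule_id', 'unknown')}"
    )
    # One bolded key per line, e.g. "*from_processor*: stripe"
    details = "\n".join(f"*{key}*: {value}" for key, value in data.items())
    return {
        "text": summary,  # fallback text used in notifications
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"{summary}\n{details}" if details else summary,
                },
            }
        ],
    }


SAMPLE_EVENT = {
    "event_id": "evt_abc123",
    "event_type": "failover.triggered",
    "timestamp": "2026-04-01T12:00:00",
    "data": {
        "rule_id": "us_cards",
        "from_processor": "stripe",
        "to_processor": "adyen",
        "reason": "Primary error rate 35.0% exceeds threshold 30.0%",
    },
}

message = to_slack_message(SAMPLE_EVENT)
print(message["text"])
```

The middleware would then POST this dictionary as JSON to the Slack incoming webhook URL, ideally after verifying the X-PulseRoute-Signature header.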

PagerDuty:

Point the webhook at PagerDuty's Events API v2 endpoint and map failover.triggered to a PagerDuty alert.
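The Events API v2 expects its own event schema (routing_key, event_action, and a payload with summary, source, and severity), so the mapping needs a translation step somewhere between PulseRoute and PagerDuty. The sketch below shows one possible mapping. Resolving the alert on failover.recovered, the dedup_key scheme, the "critical" severity, and the to_pagerduty_event name are all assumptions, not documented PulseRoute behavior.

```python
from typing import Optional

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def to_pagerduty_event(event: dict, routing_key: str) -> Optional[dict]:
    """Map a PulseRoute failover event to an Events API v2 body.

    Returns None for event types this sketch does not forward.
    """
    data = event.get("data", {})
    # Same key for trigger and resolve so PagerDuty can pair them (assumption).
    dedup_key = f"pulseroute-{data.get('rule_id', 'unknown')}"
    if event["event_type"] == "failover.triggered":
        return {
            "routing_key": routing_key,
            "event_action": "trigger",
            "dedup_key": dedup_key,
            "payload": {
                "summary": data.get("reason", "PulseRoute failover triggered"),
                "source": "pulseroute",
                "severity": "critical",
            },
        }
    if event["event_type"] == "failover.recovered":
        return {
            "routing_key": routing_key,
            "event_action": "resolve",
            "dedup_key": dedup_key,
        }
    return None


triggered = to_pagerduty_event(
    {
        "event_type": "failover.triggered",
        "data": {
            "rule_id": "us_cards",
            "reason": "Primary error rate 35.0% exceeds threshold 30.0%",
        },
    },
    routing_key="YOUR_INTEGRATION_KEY",  # placeholder, not a real key
)
print(triggered["event_action"], triggered["dedup_key"])
```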

Custom monitoring:

# List registered webhooks
curl http://localhost:8080/v1/webhooks

# Delete a webhook
curl -X DELETE http://localhost:8080/v1/webhooks/{webhook_id}

Prometheus Scrape Config

PulseRoute includes a pre-configured prometheus.yml that works out of the box with Docker Compose. To add custom scrape targets or alerting rules, edit the file and restart Prometheus.