openclaw - 💡(How to fix) Fix Discussion: Pulse — Central Health Dashboard [1 participants]

openclaw/openclaw#52704 (fetched 2026-04-08 01:20:04)

Discussion: Pulse — Central Health Dashboard

Repository: New (pulse) or part of OpenClaw core
Issue Type: Discussion
Priority: P2 (Observability stack)
Related: OpenClaw #41924 (agent health monitoring), optimization-brainstorm-v30

Goal

Provide a single-pane-of-glass view into the health of all OpenClaw components:

  • Gateway process (CPU, memory, uptime)
  • Agent memory usage, token budget, active tasks
  • Cron job status (last run, success/failure)
  • Disk space, database sizes
  • External service connectivity (GitHub, RSS feeds, APIs)

Architecture

Data Collection

Agent-side (OpenClaw core):

  • Expose /metrics endpoint (Prometheus text format)
  • Metrics:
    • openclaw_agent_memory_bytes{type="workspace"}
    • openclaw_agent_tokens_total, openclaw_agent_tokens_used
    • openclaw_tasks_active, openclaw_tasks_completed_total
    • openclaw_cron_jobs{failed="bool"}
    • openclaw_gateway_up{status="bool"}

System-side: node-exporter already provides CPU, memory, and disk metrics; Prometheus only needs to scrape it.

Push vs Pull: Use Prometheus pull from /metrics endpoint. Simpler.

Storage & Visualization

  • Prometheus: scrape /metrics every 15s, store TSDB
  • Grafana: pre-built dashboard:
    • Agent memory gauge, token usage sparkline
    • Task queue depth
    • Up/down status for gateway, cron
    • Alerts table
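
For the pre-built dashboard to work out of the box, Grafana needs a Prometheus data source. One way to do that is Grafana's file-based provisioning. The fragment below is a minimal sketch; the file location inside the repo and the data source name are assumptions, and the URL relies on the `prometheus` service name and port from the Compose file in this proposal.

```yaml
# Hypothetical location: grafana/provisioning/datasources/prometheus.yml,
# mounted into the container under /etc/grafana/provisioning/datasources/
apiVersion: 1
datasources:
  - name: Prometheus            # display name, an arbitrary choice
    type: prometheus
    access: proxy               # Grafana's backend queries Prometheus
    url: http://prometheus:9090 # Compose service name and port
    isDefault: true
```

The exported dashboard JSON could be provisioned the same way through a dashboards provider file, so that a fresh `grafana_data` volume starts with the panels already in place.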

Alerting

  • Rules (evaluated by Prometheus, routed by Alertmanager):
    • openclaw_agent_memory_bytes > 0.9 * budget → warn
    • openclaw_tasks_active > 10 → info (busy)
    • openclaw_cron_jobs{failed="true"} == 1 → critical
    • up{job="openclaw_gateway"} == 0 → critical
  • Notifications: Discord webhook (since we already have Discord integration)
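
Three of the rules above translate directly into a Prometheus rule file. The sketch below is one possible rendering; the group name, alert names, and `for` durations are assumptions not given in the proposal. The memory rule is left out because `budget` is not yet defined as a metric or constant.

```yaml
groups:
  - name: openclaw_pulse        # hypothetical group name
    rules:
      - alert: OpenClawBusy
        expr: openclaw_tasks_active > 10
        for: 5m                 # assumed; avoids firing on short bursts
        labels:
          severity: info
        annotations:
          summary: More than 10 active OpenClaw tasks
      - alert: OpenClawCronFailed
        expr: openclaw_cron_jobs{failed="true"} == 1
        labels:
          severity: critical
        annotations:
          summary: An OpenClaw cron job reported a failure
      - alert: OpenClawGatewayDown
        expr: up{job="openclaw_gateway"} == 0
        for: 1m                 # assumed; tolerates a few missed scrapes
        labels:
          severity: critical
        annotations:
          summary: OpenClaw gateway scrape target is down
```

The last rule only works if the scrape job is actually named `openclaw_gateway`, since `up` is labeled with the job name from the scrape config.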

Deployment

Quick Start (Docker Compose)

version: '3'
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prom_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"

  node-exporter:
    image: prom/node-exporter
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'

volumes:
  prom_data:
  grafana_data:

Add OpenClaw agent to Prometheus scrape config.
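
A minimal `prometheus.yml` for this setup might look like the sketch below. The 15s interval and the `openclaw_gateway` job name come from this proposal; the agent's host and port are hypothetical placeholders, since the proposal does not say where `/metrics` will listen.

```yaml
global:
  scrape_interval: 15s          # matches the interval proposed above

scrape_configs:
  - job_name: openclaw_gateway  # name used by the up{job=...} alert
    metrics_path: /metrics
    static_configs:
      # Hypothetical address: the agent running on the Docker host.
      # Replace host and port with wherever the agent really listens.
      - targets: ['host.docker.internal:9464']

  - job_name: node
    static_configs:
      # node-exporter's default port, reached via the Compose service name
      - targets: ['node-exporter:9100']
```

Because Prometheus runs in a container here, `localhost` inside that container is not the host. On Linux, `host.docker.internal` typically needs an `extra_hosts` entry mapping it to `host-gateway`; running the agent as another Compose service is an alternative.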

Dashboard Panels

  • Top row: Status summary (Gateway up? Agent healthy? Cron failed?)
  • Middle:
    • Memory usage (workspace size vs budget)
    • Token budget utilization (percentage)
    • Active tasks count
    • Queue depth
  • Bottom:
    • Cron job history (last 24h) with success/fail markers
    • External API latency (GitHub, RSS feed responses)
    • Disk usage (memory/, price_data/, etc.)

Alternatives

  • Netdata: easier setup, less flexible; good for personal use. Could use instead of Prom+Grafana.
  • Simple HTML + cron: Too basic; no alerting.
  • Sentry: For exceptions only, not system metrics.

Decision

Since node-exporter is likely already running, Prometheus + Grafana is the natural choice. If the stack proves too heavy, switch to Netdata later.

Estimated setup: 2 hours (including adding /metrics to OpenClaw agent).

Open Questions

  • Should we include QuantPipe build size, DB size? Yes, as separate panels.
  • Should we expose internal agent errors as metrics? Yes, counter openclaw_agent_errors_total{type="..."}.
  • How to secure /metrics endpoint? Restrict to localhost; Prometheus scrapes via localhost. If remote, add basic auth.
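
For the remote case, the scrape job can send basic-auth credentials. The sketch below assumes a hypothetical username, secret file path, and hostname; the agent itself would still have to enforce the credentials, since Prometheus only sends them.

```yaml
scrape_configs:
  - job_name: openclaw_gateway
    scheme: https               # basic auth over plain HTTP would leak the password
    metrics_path: /metrics
    basic_auth:
      username: pulse           # hypothetical account name
      # Hypothetical path; keeps the secret out of prometheus.yml
      password_file: /etc/prometheus/secrets/openclaw_metrics_password
    static_configs:
      - targets: ['openclaw.example.internal:9464']  # placeholder host and port
```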

Next Steps

  1. Add /metrics endpoint to OpenClaw agent (if not present)
  2. Define initial metric set (as above)
  3. Create docker-compose.yml in infra/pulse/
  4. Build Grafana dashboard (export as JSON)
  5. Configure Alertmanager → Discord webhook
  6. Document deployment steps
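
Step 5 needs an Alertmanager instance, which the Quick Start Compose file does not define. A service along these lines could be added under `services:`; the local file name `alertmanager.yml` is an assumption.

```yaml
services:
  alertmanager:
    image: prom/alertmanager
    volumes:
      # Assumed local file holding the receiver and route definitions
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
    ports:
      - "9093:9093"             # Alertmanager's default port
```

Prometheus would also need a `rule_files` entry for the alert rules and an `alerting.alertmanagers` target pointing at `alertmanager:9093` in `prometheus.yml`, plus a volume mount for the rule file.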

Let's implement this after ToolResultCompactor lands, since that will manage the memory budget automatically.

extent analysis

Fix Plan

To implement the Pulse — Central Health Dashboard, follow these steps:

  1. Add /metrics endpoint to OpenClaw agent:

    • Expose the endpoint to return metrics in Prometheus text format.
    • Include metrics such as openclaw_agent_memory_bytes, openclaw_agent_tokens_total, openclaw_tasks_active, etc.
  2. Define initial metric set:

    • Create a list of required metrics, including openclaw_agent_memory_bytes, openclaw_agent_tokens_total, openclaw_tasks_active, etc.
    • Use these metrics to monitor the health of OpenClaw components.
  3. Create docker-compose.yml:

    • Define services for Prometheus, Grafana, and node-exporter.
    • Configure volumes and ports for each service.

  4. Build Grafana dashboard:

    • Create a new dashboard with panels for metrics such as agent memory usage, token budget utilization, active tasks count, etc.
    • Export the dashboard as JSON.
  5. Configure Alertmanager:

    • Define alerting rules for metrics such as openclaw_agent_memory_bytes, openclaw_tasks_active, etc.
    • Configure notifications to send alerts to a Discord webhook.

Example alerting rule:

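groups:
  - name: openclaw_alerts
    rules:
      - alert: OpenClawAgentMemoryHigh
        # NOTE: `budget` is a placeholder from the proposal, not an existing series.
        # Replace it with a literal byte count or a budget metric before loading.
        expr: openclaw_agent_memory_bytes > 0.9 * budget
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: OpenClaw agent memory usage is high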
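
The Discord side of the notification path could be configured in `alertmanager.yml` roughly as follows. This is a sketch that assumes an Alertmanager release with the built-in Discord receiver (`discord_configs`, available since 0.25); the receiver name, grouping, and webhook URL are placeholders.

```yaml
route:
  receiver: discord             # send everything to the single receiver below
  group_by: ['alertname']       # assumed grouping; one message per alert name

receivers:
  - name: discord
    discord_configs:
      # Placeholder: paste the real webhook URL from the Discord channel settings
      - webhook_url: 'https://discord.com/api/webhooks/<id>/<token>'
```

A single catch-all route keeps the first version simple; routing by the `severity` label can be added later if info-level alerts prove too noisy for the channel.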

Verification

To verify that the fix worked:

  1. Check Prometheus metrics:
    • Access the Prometheus
