openclaw - 💡(How to fix) Fix Discussion: Pulse — Central Health Dashboard [1 participants]

openclaw · 2026-03-23 07:34:48


GitHub stats

openclaw/openclaw#52704 · Fetched 2026-04-08 01:20:04

Author: DockeGumi

Participants: DockeGumi


Discussion: Pulse — Central Health Dashboard

Repository: New (pulse) or part of OpenClaw core
Issue Type: Discussion
Priority: P2 (observability stack)
Related: OpenClaw #41924 (agent health monitoring), optimization-brainstorm-v30

Goal

Provide a single-pane-of-glass view of the health of all OpenClaw components:

Gateway process (CPU, memory, uptime)
Agent memory usage, token budget, active tasks
Cron job status (last run, success/failure)
Disk space, database sizes
External service connectivity (GitHub, RSS feeds, APIs)

Architecture

Data Collection

Agent-side (OpenClaw core):

Expose a /metrics endpoint (Prometheus text exposition format)
Metrics:
- openclaw_agent_memory_bytes{type="workspace"}
- openclaw_agent_tokens_total, openclaw_agent_tokens_used
- openclaw_tasks_active, openclaw_tasks_completed_total
- openclaw_cron_jobs{failed="bool"}
- openclaw_gateway_up{status="bool"}
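
A minimal sketch of the agent-side endpoint, assuming the agent can run Python and using only the standard library (a production agent would more likely use an official Prometheus client library). The port 9464, the get_snapshot() helper, and its sample values are hypothetical; label schemes are simplified, and the token and cron metrics are left out for brevity.

```python
"""Sketch of an OpenClaw /metrics endpoint using only the standard library.

Hypothetical: the port (9464), get_snapshot(), and its placeholder values.
"""
from http.server import BaseHTTPRequestHandler, HTTPServer


def _format_labels(labels: dict) -> str:
    """Render {k="v",...}, escaping backslash, double quote, and newline."""
    if not labels:
        return ""
    parts = []
    for key, value in sorted(labels.items()):
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        parts.append(f'{key}="{escaped}"')
    return "{" + ",".join(parts) + "}"


def render_metrics(samples) -> str:
    """samples: iterable of (name, type, labels, value); output groups samples per metric."""
    grouped = {}  # dicts keep insertion order, so metrics appear in first-seen order
    for name, mtype, labels, value in samples:
        grouped.setdefault(name, (mtype, []))[1].append((labels, value))
    lines = []
    for name, (mtype, rows) in grouped.items():
        lines.append(f"# TYPE {name} {mtype}")
        for labels, value in rows:
            lines.append(f"{name}{_format_labels(labels)} {value}")
    return "\n".join(lines) + "\n"


def get_snapshot():
    # Hypothetical placeholder values; real code would read live agent state.
    return [
        ("openclaw_agent_memory_bytes", "gauge", {"type": "workspace"}, 123456789),
        ("openclaw_tasks_active", "gauge", {}, 3),
        ("openclaw_tasks_completed_total", "counter", {}, 42),
        ("openclaw_gateway_up", "gauge", {}, 1),
    ]


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(get_snapshot()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Localhost only, in line with the plan to keep /metrics off the network.
    HTTPServer(("127.0.0.1", 9464), MetricsHandler).serve_forever()
```

Grouping samples by metric name keeps each `# TYPE` line ahead of all of that metric's samples, which the text format expects.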

System-side: node-exporter already provides CPU, memory, and disk metrics; Prometheus just scrapes it.

Push vs. pull: use Prometheus pull scraping of the /metrics endpoint, which is simpler.
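
With pull, a scrape is just an HTTP GET that returns the text format. The sketch below is a much-simplified client-side view of what Prometheus does, handy as a smoke test for the endpoint. The URL and port are hypothetical, and the parser ignores timestamps, does not unescape label values, and does not handle `}` inside a label value.

```python
"""Minimal pull-scrape sketch: GET /metrics and parse the text format.

Simplifications: timestamps are ignored, label values are not unescaped,
and '}' inside a label value is not supported.
"""
import re
from urllib.request import urlopen

_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> dict:
    """Map (metric name, sorted label pairs) -> float value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, label_blob, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(label_blob or "")))
        samples[(name, labels)] = float(value)
    return samples


def scrape(url: str = "http://127.0.0.1:9464/metrics", timeout: float = 5.0) -> dict:
    # Hypothetical URL; Prometheus itself performs this GET every scrape interval.
    with urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For example, `scrape()[("openclaw_tasks_active", ())]` would return the current active-task count, provided the endpoint is running.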

Storage & Visualization

Prometheus: scrape /metrics every 15s and store the samples in its TSDB
Grafana: a pre-built dashboard with:
- Agent memory gauge, token usage sparkline
- Task queue depth
- Up/down status for gateway, cron
- Alerts table

Alerting

Rules (Prometheus alerting rules, with notifications routed by Alertmanager):
- openclaw_agent_memory_bytes > 0.9 * budget → warn
- openclaw_tasks_active > 10 → info (busy)
- openclaw_cron_jobs{failed="true"} == 1 → critical
- up{job="openclaw_gateway"} == 0 → critical
Notifications: Discord webhook (since we already have Discord integration)
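
To pin down the thresholds before encoding them as Prometheus rules, the four rules can be written as plain logic. This is only an illustration: Prometheus evaluates the real rules, `for:` durations are not modeled, and `memory_budget_bytes` stands in for the undefined `budget` term (a hypothetical value, not an existing metric).

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    memory_bytes: float
    memory_budget_bytes: float  # hypothetical: "budget" is not yet a defined metric
    tasks_active: int
    cron_failed: int  # value of openclaw_cron_jobs{failed="true"}
    gateway_up: int  # value of up{job="openclaw_gateway"}


def evaluate(s: Snapshot) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for every rule that fires."""
    alerts = []
    if s.memory_bytes > 0.9 * s.memory_budget_bytes:
        alerts.append(("warning", "agent memory above 90% of budget"))
    if s.tasks_active > 10:
        alerts.append(("info", "agent busy"))
    if s.cron_failed == 1:
        alerts.append(("critical", "cron job failed"))
    if s.gateway_up == 0:
        alerts.append(("critical", "gateway down"))
    return alerts
```

All comparisons are strict, matching the `>` in the rules, so exactly 90% of budget or exactly 10 active tasks does not fire.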

Deployment

Quick Start (Docker Compose)

version: '3'
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prom_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"

  node-exporter:
    image: prom/node-exporter
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'

volumes:
  prom_data:
  grafana_data:

Add the OpenClaw agent as a scrape target in prometheus.yml.
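
A minimal prometheus.yml matching the Compose file, written out by a small Python helper so the examples stay in one language. The job name `openclaw_gateway` matches the `up{job="openclaw_gateway"}` alert rule; the agent address `host.docker.internal:9464` is a hypothetical placeholder (on Linux it also needs an `extra_hosts` entry mapping `host.docker.internal` to `host-gateway`). Note that a containerized Prometheus cannot reach a service bound only to the host's 127.0.0.1, so localhost-only binding means either running Prometheus with `network_mode: host` and adjusting the targets, or binding the agent to an address the container can reach.

```python
from pathlib import Path

# Hypothetical target: the OpenClaw /metrics endpoint on the Docker host, port 9464.
AGENT_TARGET = "host.docker.internal:9464"

PROMETHEUS_YML = f"""\
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: openclaw_gateway
    metrics_path: /metrics
    static_configs:
      - targets: ["{AGENT_TARGET}"]
  - job_name: node
    static_configs:
      - targets: ["node-exporter:9100"]
"""


def write_config(path: str = "prometheus.yml") -> Path:
    """Write the config next to docker-compose.yml, where the volume mount expects it."""
    out = Path(path)
    out.write_text(PROMETHEUS_YML, encoding="utf-8")
    return out


if __name__ == "__main__":
    print(f"wrote {write_config()}")
```

The `node-exporter:9100` target relies on Compose's service-name DNS and node-exporter's default port.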

Dashboard Panels

Top row: Status summary (Gateway up? Agent healthy? Cron failed?)
Middle:
- Memory usage (workspace size vs budget)
- Token budget utilization (percentage)
- Active tasks count
- Queue depth
Bottom:
- Cron job history (last 24h) with success/fail markers
- External API latency (GitHub, RSS feed responses)
- Disk usage (memory/, price_data/, etc.)
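
One way to keep the dashboard in version control (the "export as JSON" step) is to generate it. A minimal sketch of part of this layout as Grafana dashboard JSON; it assumes Prometheus is Grafana's default data source and that `openclaw_agent_tokens_total` holds the token budget. Panel titles, sizes, and the selection of panels are illustrative.

```python
import json


def panel(pid, title, expr, kind, x, y, w, h, unit=None):
    """Build one panel; x/y/w/h use Grafana's 24-column grid."""
    p = {
        "id": pid,
        "type": kind,
        "title": title,
        "gridPos": {"x": x, "y": y, "w": w, "h": h},
        "targets": [{"expr": expr, "refId": "A"}],
    }
    if unit:
        p["fieldConfig"] = {"defaults": {"unit": unit}, "overrides": []}
    return p


def build_dashboard() -> dict:
    panels = [
        # Top row: status summary
        panel(1, "Gateway up", 'up{job="openclaw_gateway"}', "stat", 0, 0, 12, 4),
        panel(2, "Failed cron jobs", 'openclaw_cron_jobs{failed="true"}', "stat", 12, 0, 12, 4),
        # Middle row: agent resources
        panel(3, "Workspace memory", 'openclaw_agent_memory_bytes{type="workspace"}',
              "gauge", 0, 4, 8, 6, "bytes"),
        panel(4, "Token budget used",
              "100 * openclaw_agent_tokens_used / openclaw_agent_tokens_total",
              "gauge", 8, 4, 8, 6, "percent"),
        panel(5, "Active tasks", "openclaw_tasks_active", "timeseries", 16, 4, 8, 6),
        # Bottom row: root filesystem usage from node-exporter
        panel(6, "Root disk used",
              '100 * (1 - node_filesystem_avail_bytes{mountpoint="/"}'
              ' / node_filesystem_size_bytes{mountpoint="/"})',
              "timeseries", 0, 10, 24, 6, "percent"),
    ]
    return {"title": "Pulse", "time": {"from": "now-24h", "to": "now"},
            "refresh": "30s", "panels": panels}


if __name__ == "__main__":
    print(json.dumps(build_dashboard(), indent=2))
```

The printed JSON can be imported through Grafana's dashboard import and regenerated whenever panels change.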

Alternatives

Netdata: easier to set up but less flexible; good for personal use, and could replace Prometheus + Grafana.
Simple HTML page + cron: too basic, with no alerting.
Sentry: covers exceptions only, not system metrics.

Decision

Since node-exporter is likely already running, Prometheus + Grafana is the natural choice. If it proves too heavy, switch to Netdata later.

Estimated setup: 2 hours (including adding /metrics to the OpenClaw agent).

Open Questions

Should we include QuantPipe build size, DB size? Yes, as separate panels.
Should we expose internal agent errors as metrics? Yes, counter openclaw_agent_errors_total{type="..."}.
How should the /metrics endpoint be secured? Bind it to localhost so Prometheus scrapes it locally; if remote scraping is needed, add basic auth.
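
For the remote case, a sketch of the Basic auth check the endpoint could apply before serving metrics. The credentials are illustrative. Basic auth only base64-encodes credentials, so remote scraping should also use TLS; on the Prometheus side, a scrape job can supply credentials through its `basic_auth` block.

```python
import base64
import hmac


def check_basic_auth(header, user: str, password: str) -> bool:
    """Validate an HTTP 'Authorization: Basic ...' header value."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error subclasses ValueError
        return False
    supplied_user, sep, supplied_pw = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids early-exit timing differences; '&' evaluates both checks.
    return (hmac.compare_digest(supplied_user.encode(), user.encode())
            & hmac.compare_digest(supplied_pw.encode(), password.encode()))
```

Inside a request handler, a failed check would typically answer 401 with a `WWW-Authenticate: Basic` header instead of the metrics body.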

Next Steps

Add a /metrics endpoint to the OpenClaw agent (if not already present)
Define initial metric set (as above)
Create docker-compose.yml in infra/pulse/
Build Grafana dashboard (export as JSON)
Configure Alertmanager → Discord webhook
Document deployment steps

Let's implement this after ToolResultCompactor, to manage the memory budget automatically.

extent analysis

Fix Plan

To implement the Pulse — Central Health Dashboard, follow these steps:

Add a /metrics endpoint to the OpenClaw agent:
- Expose the endpoint so it returns metrics in the Prometheus text format.
- Include metrics such as openclaw_agent_memory_bytes, openclaw_agent_tokens_total, openclaw_tasks_active, etc.
Define the initial metric set:
- List the required metrics, including openclaw_agent_memory_bytes, openclaw_agent_tokens_total, openclaw_tasks_active, etc.
- Use these metrics to monitor the health of OpenClaw components.
Create docker-compose.yml:
- Define services for Prometheus, Grafana, and node-exporter.
- Configure volumes and ports for each service.

Build the Grafana dashboard:
- Create a dashboard with panels for agent memory usage, token budget utilization, active task count, etc.
- Export the dashboard as JSON.
Configure alerting:
- Define Prometheus alerting rules for metrics such as openclaw_agent_memory_bytes, openclaw_tasks_active, etc.
- Configure Alertmanager to send notifications to a Discord webhook.
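
Recent Alertmanager releases include a native Discord receiver (`discord_configs`), which is the simplest route if the deployed version has it. If not, a small relay can translate Alertmanager's generic webhook payload into a Discord message. The sketch below shows only the translation and the POST; the message format is illustrative and the webhook URL is expected to come from configuration.

```python
import json
from urllib.request import Request, urlopen

DISCORD_LIMIT = 2000  # Discord rejects message content longer than 2000 characters


def to_discord_payload(am_payload: dict) -> dict:
    """Turn an Alertmanager webhook body into a Discord {"content": ...} message."""
    lines = []
    for alert in am_payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", am_payload.get("status", "unknown")).upper()
        name = labels.get("alertname", "unnamed")
        severity = labels.get("severity", "none")
        summary = annotations.get("summary", "")
        lines.append(f"[{status}] {name} ({severity}) {summary}".rstrip())
    content = "\n".join(lines) or "(no alerts)"
    if len(content) > DISCORD_LIMIT:
        content = content[: DISCORD_LIMIT - 1] + "…"
    return {"content": content}


def post_to_discord(webhook_url: str, payload: dict) -> int:
    req = Request(webhook_url, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=10) as resp:
        return resp.status
```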

Example alerting rule:

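groups:
- name: openclaw_alerts
  rules:
  - alert: OpenClawAgentMemoryHigh
    # "budget" is not a metric yet; this assumes a hypothetical
    # openclaw_agent_memory_budget_bytes gauge exported by the agent.
    expr: openclaw_agent_memory_bytes{type="workspace"} > ignoring(type) (0.9 * openclaw_agent_memory_budget_bytes)
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: OpenClaw agent memory usage is high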

Verification

To verify that the fix worked:

Check Prometheus metrics:
- Access the Prometheus
