openclaw - 💡(How to fix) Fix Fleet/dashboard: alerts not auto-resolved on recovery, causing contradictory online status + active offline alert [1 comment, 2 participants]

openclaw/openclaw#51034 (fetched 2026-04-08 01:05:16)
Problem

The fleet/dashboard view shows contradictory state:

  • Status: Online (current heartbeat shows gateway is alive)
  • Alerts: "Gateway Offline" (active alert, never cleared)

This is a zombie alert — raised correctly when the gateway went offline, but never auto-resolved when it came back up.

Root Cause

Alerts are created on state-change (gateway goes offline → alert raised) but not cleared on recovery (gateway comes back online → alert stays active). Status and alerts are sourced from different signals with no reconciliation.
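One way to picture the missing reconciliation is a pass that compares the current status signal against the active alerts and resolves any alert whose condition no longer holds. The sketch below is illustrative only: the `reconcile` function, the `statuses` mapping, the gateway ids, and the alert dictionary keys are all assumptions, not the project's actual data model.

```python
from datetime import datetime, timezone


def reconcile(statuses, alerts, now=None):
    """Resolve active offline alerts whose gateway currently reports online.

    statuses: dict of gateway id -> "online" / "offline" (hypothetical shape)
    alerts:   list of dicts with "gateway", "type", "status" (hypothetical shape)
    Returns the alerts that this pass resolved.
    """
    now = now or datetime.now(timezone.utc)
    resolved = []
    for alert in alerts:
        stale = (
            alert["type"] == "Gateway Offline"
            and alert["status"] == "active"
            and statuses.get(alert["gateway"]) == "online"
        )
        if stale:
            # The alert's condition is no longer true, so close it out.
            alert["status"] = "recovered"
            alert["resolved_at"] = now
            resolved.append(alert)
    return resolved


alerts = [
    {"gateway": "gw-1", "type": "Gateway Offline", "status": "active"},
    {"gateway": "gw-2", "type": "Gateway Offline", "status": "active"},
]
statuses = {"gw-1": "online", "gw-2": "offline"}
resolved = reconcile(statuses, alerts)
print([a["gateway"] for a in resolved])  # ['gw-1']
```

Running such a pass whenever a heartbeat arrives, or on a timer, would keep the two signals from drifting apart even if a recovery event is missed.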

Expected Behavior

When a gateway comes back online, the corresponding "Gateway Offline" alert should:

  1. Auto-resolve and move to a Recovered / Recent Incidents section, or
  2. Update in-place with timestamps: Gateway Offline — triggered 08:11, resolved 08:14

An active alert should only appear if the condition is currently true.
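As a rough illustration of the in-place option, an alert record could carry both timestamps and be rendered from them. The field names (`triggered_at`, `resolved_at`), the helper names, the label separator, and the date used below are hypothetical choices for this sketch; only the times 08:11 and 08:14 come from the example above.

```python
from datetime import datetime


def is_active(alert):
    """An alert counts as active only while it has no resolution time."""
    return alert.get("resolved_at") is None


def format_alert(alert):
    """Render an alert as a one-line label with its trigger/resolve times."""
    triggered = alert["triggered_at"].strftime("%H:%M")
    label = f"{alert['type']}: triggered {triggered}"
    if not is_active(alert):
        resolved = alert["resolved_at"].strftime("%H:%M")
        label += f", resolved {resolved}"
    return label


alert = {
    "type": "Gateway Offline",
    "triggered_at": datetime(2026, 1, 1, 8, 11),  # arbitrary illustrative date
    "resolved_at": None,
}
print(format_alert(alert))  # Gateway Offline: triggered 08:11

alert["resolved_at"] = datetime(2026, 1, 1, 8, 14)
print(format_alert(alert))  # Gateway Offline: triggered 08:11, resolved 08:14
```

Deriving "active" from the absence of a resolution time, rather than storing it as a separate flag, leaves one fewer field that can fall out of sync.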

What it should NOT do

Show Status: Online and Alert: Gateway Offline (active) simultaneously — that's contradictory and confusing.

Related

  • #51028 (session ordering / activity clarity)
  • #51030 (human-readable sub-agent labels)

extent analysis

Fix Plan

To resolve the zombie alert issue, we need to implement alert auto-resolution when the gateway comes back online.

Step-by-Step Solution:

  1. Modify the alert creation logic: Update the alert system to listen for both offline and online events.
  2. Add a recovery condition: When a gateway comes back online, check if there's an active "Gateway Offline" alert and auto-resolve it.
  3. Update the alert status: Change the alert status to "Recovered" and add timestamps for when the alert was triggered and resolved.

Example Code (Python):

from datetime import datetime, timezone

alerts = []  # in-memory alert store; a real system would persist this


def create_alert(gateway_status):
    """Raise or auto-resolve the "Gateway Offline" alert on a status change."""
    now = datetime.now(timezone.utc)
    active = [a for a in alerts
              if a["type"] == "Gateway Offline" and a["status"] == "active"]
    if gateway_status == "offline":
        # Raise a new alert only if none is already active (avoids duplicates)
        if not active:
            alerts.append({"status": "active", "type": "Gateway Offline",
                           "triggered_at": now, "resolved_at": None})
    elif gateway_status == "online":
        # Auto-resolve every active "Gateway Offline" alert, not just the first
        for alert in active:
            alert["status"] = "recovered"
            alert["resolved_at"] = now


def update_dashboard(gateway_status, alerts):
    """Return what the dashboard should display for the current state."""
    active_alerts = [a for a in alerts if a["status"] == "active"]
    recovered_alerts = [a for a in alerts if a["status"] == "recovered"]
    return {"status": gateway_status, "active": active_alerts,
            "recovered": recovered_alerts}

Verification

To verify the fix, follow these steps:

  • Trigger a "Gateway Offline" alert by taking a gateway offline.
  • Verify the alert is displayed on the dashboard.
  • Bring the gateway back online and verify the alert is auto-resolved and moved to the "Recovered / Recent Incidents" section or updated in-place with timestamps.
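The manual steps above could also be captured as an automated check. The sketch below runs against a small in-memory stand-in; `AlertTracker` and its methods are hypothetical and exist only so the test is self-contained, so a real test would drive the project's actual alert code instead.

```python
from datetime import datetime, timezone


class AlertTracker:
    """Hypothetical stand-in for the real alert system, for illustration only."""

    def __init__(self):
        self.alerts = []

    def on_status(self, status):
        now = datetime.now(timezone.utc)
        active = self.by_status("active")
        if status == "offline" and not active:
            self.alerts.append({"type": "Gateway Offline", "status": "active",
                                "triggered_at": now, "resolved_at": None})
        elif status == "online":
            for alert in active:
                alert["status"] = "recovered"
                alert["resolved_at"] = now

    def by_status(self, status):
        return [a for a in self.alerts if a["status"] == status]


def test_offline_alert_auto_resolves():
    tracker = AlertTracker()

    # Step 1-2: take the gateway offline and expect one active alert.
    tracker.on_status("offline")
    assert len(tracker.by_status("active")) == 1

    # Step 3: bring it back and expect the alert to be resolved, not active.
    tracker.on_status("online")
    assert tracker.by_status("active") == []
    recovered = tracker.by_status("recovered")
    assert len(recovered) == 1
    assert recovered[0]["resolved_at"] >= recovered[0]["triggered_at"]


test_offline_alert_auto_resolves()
```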

Extra Tips

  • Ensure the alert system handles concurrent updates safely, for example when offline and online events for the same gateway arrive close together, to prevent race conditions.
  • Consider implementing a timeout or retry mechanism to handle cases where the gateway takes longer than expected to come back online.
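For the concurrency tip, one common approach is to guard the alert store with a lock so that a raise and a resolve for the same gateway cannot interleave. This is a minimal sketch under assumptions: the `GatewayAlerts` class, its fields, and the gateway id are hypothetical, and a deployment backed by a database would more likely rely on transactions than on an in-process lock.

```python
import threading
from datetime import datetime, timezone


class GatewayAlerts:
    """Hypothetical thread-safe alert store keyed by gateway id."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}   # gateway id -> active alert
        self.history = []   # resolved alerts, oldest first

    def on_status(self, gateway, status):
        now = datetime.now(timezone.utc)
        with self._lock:  # check-and-update happens atomically
            if status == "offline":
                if gateway not in self._active:
                    self._active[gateway] = {"gateway": gateway,
                                             "triggered_at": now}
            elif status == "online":
                alert = self._active.pop(gateway, None)
                if alert is not None:
                    alert["resolved_at"] = now
                    self.history.append(alert)

    def active_count(self):
        with self._lock:
            return len(self._active)


def fire(store, status, workers=8):
    """Deliver the same status event from several threads at once."""
    threads = [threading.Thread(target=store.on_status, args=("gw-1", status))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


store = GatewayAlerts()
fire(store, "offline")   # duplicate offline events raise a single alert
fire(store, "online")    # duplicate online events resolve it exactly once
print(store.active_count(), len(store.history))  # 0 1
```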
