openclaw - ✅(Solved) Fix [Bug]: Gateway heartbeat scheduler starts but never fires cycles after gateway restart (v2026.3.13) [1 pull requests, 4 comments, 3 participants]

GitHub stats
openclaw/openclaw#52819 (fetched 2026-04-08 01:18:50)
Comments: 4 · Participants: 3 · Timeline events: 12 · Reactions: 0
Timeline (top): commented ×4, labeled ×2, mentioned ×2, subscribed ×2

OpenClaw Version: 2026.3.13 (61d171a)
OS: Ubuntu 24.04 (Hetzner VPS)
Node.js: 22.22.1
Gateway Mode: local, loopback bind

Description:

The gateway/heartbeat subsystem initializes correctly on gateway startup (heartbeat: started with intervalMs: 600000) but never fires any heartbeat cycles. No agent sessions are created, no heartbeat check-ins occur. The agent configured with a heartbeat schedule simply never runs.

Root Cause (suspected)

This gateway also has a Telegram channel configured (channels.telegram.enabled: true), and heartbeat.target is set to "last". The scheduler may skip firing because no "last" channel/session exists for the gateway agent after a fresh restart. This is unconfirmed. If it is the cause, a fallback behavior (fire anyway, skip delivery) would fix it.

PR fix notes

PR #52841: fix(heartbeat): add diagnostic logging to heartbeat runner skip paths

Description (problem / solution / changelog)

Summary

Adds debug-level diagnostic logging to the heartbeat runner's silent skip paths, addressing the observability gap reported in #52819.

Problem: After a gateway restart, the heartbeat scheduler logs heartbeat: started with the correct intervalMs but never fires any cycles. The run() function can return "skipped" through multiple early-return paths with zero log output, making it impossible to diagnose whether the issue is:

  • Timer not firing (event loop / unref() edge case)
  • run() being called but returning early (stopped / disabled / no agents)
  • runOnce() returning "skipped" for a transient reason (quiet-hours, empty file, not-due)

Fix: Add log.debug() calls to every silent code path:

  • Early returns in run(): stopped, globally disabled, no agents configured
  • Agent-level skip: not-due-yet (with timing delta for jitter diagnosis)
  • runOnce skip results: agent ID + reason
  • scheduleNext(): timer arming with delay value
  • End-of-cycle: summary when no agents actually ran

All logging is at debug level — zero overhead in production unless debug logging is enabled.
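
The skip paths listed above can be sketched as follows. This is a minimal illustration, not the code from src/infra/heartbeat-runner.ts: the `RunnerState`, `Agent`, and `Logger` shapes and the `runCycle` name are assumptions made for the example. Only the log messages mirror the ones quoted in this PR.

```typescript
// Hypothetical shapes; not the real types from src/infra/heartbeat-runner.ts.
type RunResult = "ran" | "skipped";
type Logger = { debug: (msg: string, meta?: Record<string, unknown>) => void };

interface Agent {
  id: string;
  nextDueMs: number; // absolute timestamp at which the agent is next due
}

interface RunnerState {
  stopped: boolean;
  enabled: boolean;
  agents: Agent[];
}

// One scheduler cycle. Every early return emits a debug line, so a cycle
// that does nothing is still visible in the logs.
function runCycle(state: RunnerState, nowMs: number, log: Logger): RunResult {
  if (state.stopped) {
    log.debug("heartbeat: skipped", { reason: "stopped" });
    return "skipped";
  }
  if (!state.enabled) {
    log.debug("heartbeat: skipped", { reason: "disabled" });
    return "skipped";
  }
  if (state.agents.length === 0) {
    log.debug("heartbeat: skipped", { reason: "no-agents" });
    return "skipped";
  }

  let ran = 0;
  for (const agent of state.agents) {
    const deltaMs = agent.nextDueMs - nowMs;
    if (deltaMs > 0) {
      // The timing delta shows whether the timer fired slightly early.
      log.debug("heartbeat: agent not due yet", { agentId: agent.id, deltaMs });
      continue;
    }
    ran += 1; // a real runner would invoke the agent here
  }

  if (ran === 0) {
    log.debug("heartbeat: cycle completed without running", {
      reason: "not-due",
      agentCount: state.agents.length,
    });
    return "skipped";
  }
  return "ran";
}
```

Passing the logger in as a parameter keeps the function free of side effects when debug logging is off, and makes each skip reason easy to assert on in a test.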

How to verify

  1. Enable debug logging: set log level to debug in config or env
  2. Start the gateway with a heartbeat-configured agent
  3. Restart the gateway (systemctl restart or Docker restart)
  4. Check logs — every heartbeat cycle now traces through the full path:
    heartbeat: started { intervalMs: 600000 }
    heartbeat: scheduling next wake { delayMs: 600000 }
    heartbeat: agent not due yet { agentId: "...", deltaMs: 42 }
    heartbeat: cycle completed without running { reason: "not-due", agentCount: 1 }
  5. On the next cycle when the agent IS due, the skip logs disappear and normal execution proceeds

Test plan

  • All existing heartbeat-runner scheduler tests pass (6/6)
  • All existing heartbeat-wake tests pass (13/13)
  • Lint passes (biome, 0 warnings)
  • Manual test with debug logging enabled on a real gateway

Closes #52819

Changed files

  • .github/workflows/stale.yml (removed, +0/-51)
  • src/infra/heartbeat-runner.ts (modified, +24/-1)


Bug type

Behavior bug (incorrect output/state without crash)

Summary

OpenClaw Version: 2026.3.13 (61d171a)
OS: Ubuntu 24.04 (Hetzner VPS)
Node.js: 22.22.1
Gateway Mode: local, loopback bind

Description:

The gateway/heartbeat subsystem initializes correctly on gateway startup (heartbeat: started with intervalMs: 600000) but never fires any heartbeat cycles. No agent sessions are created, no heartbeat check-ins occur. The agent configured with a heartbeat schedule simply never runs.

Steps to reproduce

  1. Configure a gateway agent with a heartbeat schedule in openclaw.json:
{
  "agents": {
    "list": [
      { "id": "main" },
      {
        "id": "mc-gateway-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
        "name": "My Gateway Agent",
        "workspace": "/root/.openclaw/workspace/workspace-gateway-xxx/",
        "agentDir": "/root/.openclaw/agents/mc-gateway-xxx/agent",
        "heartbeat": {
          "every": "10m",
          "includeReasoning": false,
          "target": "last"
        }
      }
    ]
  }
}
  2. Start the gateway: systemctl --user start openclaw-gateway
  3. Observe logs: journalctl --user -u openclaw-gateway -f
  4. Wait 15+ minutes

Key Evidence:

  • Before restart: Heartbeats WERE firing. The gateway process (PID 1756278, running since Mar 20) showed heartbeat-triggered agent activity every 10 minutes (exec tool calls at :01, :11, :21, :31, etc.). This confirms the heartbeat config IS valid and the agent CAN run.

  • After restart: Gateway process restarted (new PID). The heartbeat subsystem logs heartbeat: started but never fires. Restarted 4 times across 15 minutes — same behavior every time. Only heartbeat: started, never any execution.

  • Cron subsystem works fine in the same process — cron: timer armed fires every minute as expected. This rules out general timer/scheduler issues in the Node.js process.

  • Config is intact — The agents.list entry with heartbeat.every: "10m" is present and valid (verified via python3 -c "import json; ..."). The config didn't change between the working state (before restart) and the broken state (after restart).
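
The config check in the last point can also be done programmatically. The sketch below parses an openclaw.json-style string and lists the agents that carry a heartbeat schedule, converting "10m" to the 600000 ms shown in the startup log. The helper names `parseEveryMs` and `heartbeatAgents` are hypothetical, and the set of accepted duration units is an assumption, not OpenClaw's documented parser.

```typescript
// Hypothetical helper: converts "10m"-style strings to milliseconds.
// The accepted units (ms, s, m, h) are an assumption for this sketch.
function parseEveryMs(every: string): number | undefined {
  const match = /^(\d+)(ms|s|m|h)$/.exec(every.trim());
  if (!match) return undefined;
  const unitMs: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 };
  const amount = match[1] ?? "";
  const factor = unitMs[match[2] ?? ""];
  if (factor === undefined) return undefined;
  return Number(amount) * factor;
}

// Only the fields used in the issue's config are modeled here.
interface AgentEntry {
  id: string;
  heartbeat?: { every?: string; target?: string };
}
interface Config {
  agents?: { list?: AgentEntry[] };
}

// Returns every agent that has a parseable heartbeat.every value.
function heartbeatAgents(configJson: string): Array<{ id: string; intervalMs: number }> {
  const config = JSON.parse(configJson) as Config;
  const result: Array<{ id: string; intervalMs: number }> = [];
  for (const agent of config.agents?.list ?? []) {
    const every = agent.heartbeat?.every;
    const intervalMs = every === undefined ? undefined : parseEveryMs(every);
    if (intervalMs !== undefined) result.push({ id: agent.id, intervalMs });
  }
  return result;
}
```

An empty result from `heartbeatAgents` would point to a config problem. A non-empty result, as reported here, points back at the scheduler.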

Expected behavior

Heartbeat agent session fires every 10 minutes (per `heartbeat.every: "10m"`).

Actual behavior

Only the initialization log appears. No heartbeat sessions are ever created:

2026-03-23T10:21:29+00:00 [gateway/heartbeat] heartbeat: started {"intervalMs": 600000}

No further heartbeat-related log entries appear — ever. The cron subsystem (same gateway process) works correctly during this time:

2026-03-23T10:22:29+00:00 cron: timer armed
2026-03-23T10:23:29+00:00 cron: timer armed
2026-03-23T10:24:29+00:00 cron: timer armed
...

OpenClaw version

2026.3.13 (61d171a)

Operating system

Ubuntu 24.04

Install method

npm global

Model

openrouter/qwen/qwen3.5-122b-a10b

Provider / routing chain

openclaw -> openrouter -> qwen

Additional provider/model setup details

No response

Logs, screenshots, and evidence

**Log excerpt (full gateway log after restart):**


10:21:29 [INFO] [gateway/heartbeat] heartbeat: started {"intervalMs": 600000}
10:21:29 [INFO] [gateway/health-monitor] started (interval: 300s)
10:21:29 [INFO] [gateway] listening on ws://127.0.0.1:18789
10:21:29 [INFO] [telegram] starting provider
10:21:29 [DEBUG] cron: timer armed
10:22:29 [DEBUG] cron: timer armed
10:23:29 [DEBUG] cron: timer armed
10:24:29 [DEBUG] cron: timer armed
10:25:29 [DEBUG] cron: timer armed
... (continues with only cron entries, no heartbeat)

Impact and severity

No response

Additional information

Workaround:

We created a systemd timer that sends heartbeats externally via curl, reading the agent's auth token from TOOLS.md:

# Runs every 5 minutes, auto-discovers all gateway agents
curl -s -X POST "${BASE_URL}/api/v1/agent/heartbeat" \
  -H "X-Agent-Token: ${TOKEN_FROM_TOOLS_MD}"

This works but bypasses the agent's own liveness check (the agent could be hung but still appear online).
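
For readers who prefer to run the workaround from Node rather than curl, the same request can be assembled like this. It is a sketch under assumptions: the endpoint path and the X-Agent-Token header come from the curl command above, while `buildHeartbeatRequest` and its return shape are hypothetical names introduced for the example.

```typescript
// Request description that can be handed to fetch(url, init).
interface HeartbeatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string> };
}

// Hypothetical helper mirroring the curl workaround: POST to the heartbeat
// endpoint with the agent token in the X-Agent-Token header.
function buildHeartbeatRequest(baseUrl: string, agentToken: string): HeartbeatRequest {
  if (agentToken.trim() === "") {
    throw new Error("agent token is empty");
  }
  // Drop trailing slashes so the joined URL never contains "//api".
  const base = baseUrl.replace(/\/+$/, "");
  return {
    url: `${base}/api/v1/agent/heartbeat`,
    init: { method: "POST", headers: { "X-Agent-Token": agentToken } },
  };
}
```

On Node 18 or later the result can be passed straight to the built-in `fetch(url, init)`. The same caveat applies as with the curl version: an external sender proves the endpoint is reachable, not that the agent itself is still responsive.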

Additional Context:

This gateway also has a Telegram channel configured (channels.telegram.enabled: true). The heartbeat.target is set to "last", so it's possible the scheduler skips firing because there's no "last" channel/session for the gateway agent after a fresh restart. If that's the case, a fallback behavior (fire anyway, skip delivery) would fix it.

extent analysis

Fix Plan

The reporter's hypothesis is that the scheduler skips firing when heartbeat.target is "last" and no "last" channel/session exists after a restart. This is unconfirmed: PR #52841 adds diagnostic logging only and does not change scheduling behavior.

If the debug logs confirm the hypothesis, the fix would have two parts:

  • Modify the gateway/heartbeat subsystem to include a fallback behavior when heartbeat.target is set to "last" and there's no last channel/session.
  • Update the scheduler to fire the heartbeat even if there's no "last" channel/session, and skip delivery if necessary.

Example code changes:

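// Sketch only: resolveDelivery, heartbeatTarget, and lastChannelSession are
// hypothetical names, not identifiers confirmed in src/infra/heartbeat-runner.ts.
type Delivery = { deliver: boolean; reason?: string };

function resolveDelivery(heartbeatTarget: string, lastChannelSession?: string): Delivery {
  if (heartbeatTarget === "last" && !lastChannelSession) {
    // Fallback: still run the heartbeat cycle, but skip delivery.
    console.info("heartbeat: no last channel/session, running without delivery");
    return { deliver: false, reason: "no-last-session" };
  }
  // Existing behavior: deliver to the resolved target.
  return { deliver: true };
}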

Verification

To verify the fix, follow these steps:

  • Restart the gateway and observe the logs for the heartbeat: started message.
  • Wait for the scheduled heartbeat interval (10 minutes in this case) and check the logs for the heartbeat firing.
  • Verify that the heartbeat fires even after a restart, and that the fallback behavior works correctly when there's no "last" channel/session.
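
The log check in these steps can be automated with a small scan. The sketch below assumes that every heartbeat entry contains the literal text "heartbeat:", as in the excerpts quoted in this issue. The exact wording of a successful cycle's log line is not shown in the source, so the function only detects any heartbeat entry after the startup line. `hasHeartbeatActivity` is a hypothetical name.

```typescript
// Returns true when the log shows heartbeat activity beyond the startup line.
// Assumption: heartbeat entries contain the literal text "heartbeat:".
function hasHeartbeatActivity(logLines: string[]): boolean {
  let started = false;
  for (const line of logLines) {
    if (line.includes("heartbeat: started")) {
      started = true;
      continue;
    }
    if (started && line.includes("heartbeat:")) {
      return true;
    }
  }
  return false;
}
```

Run against the excerpt in this report, the function returns false, which matches the observed symptom. With debug logging from PR #52841 enabled, a line such as "heartbeat: scheduling next wake" would make it return true.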

Extra Tips

  • Consider adding additional logging to help diagnose issues with the heartbeat scheduler.
  • Review the systemd timer workaround and consider removing it once the fix is verified to work correctly.
  • Test the fix with different heartbeat.target values to ensure the fallback behavior works as expected.
