openclaw - 💡(How to fix) Fix [Bug]: Heartbeat death loop — pendingFinalDelivery stuck on agent main session, blocks all future heartbeats for days

openclaw · 2026-05-08 05:18:20

A heartbeat run that returns any non-token text to a session whose origin.to is the auto-reply pseudo-target "heartbeat" puts the agent's main session into a permanent pendingFinalDelivery: true state. Every subsequent heartbeat tick first retries the dead delivery. That retry fails silently (pendingFinalDeliveryLastError stays null while pendingFinalDeliveryAttemptCount climbs) and bumps updatedAt = now. The bump keeps the 30-second skip-window check in runHeartbeatOnce (lines 866-870 of heartbeat-runner-DpQCcYf2.js) perpetually true. Result: the heartbeat scheduler cleanly logs heartbeat: started intervalMs: 3600000 at every gateway boot, but no actual heartbeat run ever happens again. We observed 64 consecutive hours of silence (2026-05-05 17:54 → 2026-05-08 13:53 WIB) before investigating. cron list, doctor, and system heartbeat last all surface nothing about the stuck state.

Error Message

Behavior bug (silent state corruption, no crash, no error logs)

Root Cause

Clearing only the pendingFinalDelivery* fields is insufficient — we verified within 8 minutes that the same heartbeat output re-creates the stuck state, because origin.to: "heartbeat" is still on the session and keeps re-dispatching.

Fix Action

Fix / Workaround

After the session was recreated fresh, only one thing kept the death loop from re-forming: the new session has origin: null and lastTo: null (rather than origin.to: "heartbeat"), so dispatch has nothing to retry against. pendingFinalDelivery stays cosmetically true, but updatedAt no longer gets bumped, and the next 60m tick fires normally.

Bug type

Behavior bug (silent state corruption, no crash, no error logs)

Beta release blocker

No (workaround exists)

Summary

Reproduction (deterministic on v2026.5.7)

1. Configure an agent with heartbeat: { every: "60m" } and target unset (defaults to "none"). The agent has no real Telegram/Discord delivery target for its heartbeat.
2. Let the agent's heartbeat fire normally a few times. In ~/.openclaw/agents/<agent>/sessions/sessions.json, the entry for agent:<id>:main accumulates origin: { label: "heartbeat", from: "heartbeat", to: "heartbeat" }.
3. Force the heartbeat to return any text other than the bare token. In our case the agent prepended a preamble, returning "All clear.\n\nHEARTBEAT_OK" instead of bare HEARTBEAT_OK. (See "Even bare HEARTBEAT_OK still triggers pending" below. A non-token preamble accelerates the bug but isn't required.)
4. The session enters pendingFinalDelivery: true with the preamble text.
5. Wait one heartbeat interval, then inspect:

python3 -c "import json; e=json.load(open('/home/openclaw/.openclaw/agents/<agent>/sessions/sessions.json'))['agent:<agent>:main']; print({k:e.get(k) for k in ['pendingFinalDelivery','pendingFinalDeliveryAttemptCount','pendingFinalDeliveryLastError','updatedAt']})"

pendingFinalDeliveryAttemptCount climbs by 1 every hour, but pendingFinalDeliveryLastError stays null. updatedAt matches each retry timestamp.

In our reproduction we hit attemptCount: 64 before noticing.

Even bare HEARTBEAT_OK still triggers pending

We tightened the agent's HEARTBEAT.md to forbid a preamble and forced a fresh session run by manually running openclaw system event --mode now --text "test" --url ws://127.0.0.1:18789 --token $OPENCLAW_GATEWAY_TOKEN against a freshly created session entry. The agent returned the literal token HEARTBEAT_OK and still ended up with pendingFinalDelivery: true, pendingFinalDeliveryText: "HEARTBEAT_OK". So one of two things is happening. Either the stripHeartbeatToken call in heartbeat-Dynyl6hI.js:52-87 runs after the pending-queue write, or its empty-after-strip result doesn't gate the queueing. The runner should treat output that strips to empty as effectively empty and skip the final-delivery queue entirely.

Two distinct issues compounding

Bug A — pending-delivery flag set even when output is the bare token

agent-runner.runtime-DQsCsHUA.js:4093-4095 sets pendingFinalDelivery: true and pendingFinalDeliveryText: pendingText whenever pendingText is non-empty by the runner's metric.
For heartbeat sessions, output that strips to the empty string should count as effectively empty. Currently pendingText = "HEARTBEAT_OK" reaches that block.
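The following is a minimal Python model of the gate proposed for Bug A. It is for illustration only, because the real change would live in the bundled JS. `strip_heartbeat_token`, `is_effectively_empty`, and `should_queue_final_delivery` are hypothetical stand-ins. The real `stripHeartbeatToken` returns an object with a `.text` field, and the issue does not describe its exact matching rules, so the token handling below is an assumption.

```python
HEARTBEAT_TOKEN = "HEARTBEAT_OK"  # the ack token named in this issue


def strip_heartbeat_token(text: str) -> str:
    """Hypothetical stand-in for stripHeartbeatToken(text).text:
    drop every occurrence of the token and trim whitespace."""
    return text.replace(HEARTBEAT_TOKEN, "").strip()


def is_effectively_empty(text: str) -> bool:
    """Hypothetical stand-in for isHeartbeatContentEffectivelyEmpty."""
    return text.strip() == ""


def should_queue_final_delivery(pending_text: str, is_heartbeat: bool) -> bool:
    """Return True only if this run should set pendingFinalDelivery."""
    if not pending_text.strip():
        return False  # nothing to deliver at all
    if is_heartbeat and is_effectively_empty(strip_heartbeat_token(pending_text)):
        return False  # bare token: acknowledge, don't queue
    return True


print(should_queue_final_delivery("HEARTBEAT_OK", is_heartbeat=True))  # False
print(should_queue_final_delivery("All clear.\n\nHEARTBEAT_OK", is_heartbeat=True))  # True
```

With this gate, the bare-token case from the reproduction never reaches the pending queue. A run with a real preamble still queues, and that is where the Bug B handling matters.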

Bug B — silent retry against pseudo-target with no error captured

dispatch-8E8vi2HV.js:227-246 (clearPendingFinalDeliveryAfterSuccess) only clears the flag on success. There is no corresponding recordPendingFinalDeliveryFailure that captures the error string into pendingFinalDeliveryLastError — so failures look identical to "still trying" and never surface in logs.
When delivery.to === "heartbeat" (the auto-reply pseudo-channel set on the session origin) and no real channel adapter resolves, the dispatch path returns silently. Compare #78532 (closed 2026-05-07) which addressed a similar deliverySucceeded=true masquerade — this is the same family of telemetry-vs-state mismatch on the failure side.
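The B-1 and B-2 fixes suggested below can be modeled as one settle step that runs after every delivery attempt. This Python sketch rests on assumptions and is not OpenClaw's dispatch code. `settle_final_delivery` and its `adapter_found`/`error` parameters are hypothetical, and `entry` is a plain dict that uses the sessions.json field names quoted in this issue.

```python
from typing import Optional

PSEUDO_TARGET = "heartbeat"  # auto-reply pseudo-channel on the session origin


def settle_final_delivery(entry: dict, delivery_to: str, adapter_found: bool,
                          error: Optional[str] = None) -> dict:
    """Record the outcome of one final-delivery attempt on a session entry.

    adapter_found: whether a channel plugin resolved for delivery_to.
    error: error text from the adapter, or None if it reported success.
    updatedAt is deliberately not touched: per the issue, bumping it on
    every failed retry is what keeps the 30s skip window closed.
    """
    count = entry.get("pendingFinalDeliveryAttemptCount", 0)
    entry["pendingFinalDeliveryAttemptCount"] = count + 1

    if delivery_to == PSEUDO_TARGET and not adapter_found:
        failure = None  # B-2: reaching the pseudo-target is the acknowledgement
    elif not adapter_found:
        failure = f"no channel adapter for {delivery_to!r}"
    else:
        failure = error

    if failure is None:
        # Same effect as clearPendingFinalDeliveryAfterSuccess.
        entry["pendingFinalDelivery"] = False
        entry["pendingFinalDeliveryLastError"] = None
    else:
        # B-1: keep the flag for retry, but make the failure visible.
        entry["pendingFinalDeliveryLastError"] = failure
    return entry


stuck = {"pendingFinalDelivery": True, "pendingFinalDeliveryAttemptCount": 63,
         "pendingFinalDeliveryLastError": None}
print(settle_final_delivery(stuck, "heartbeat", adapter_found=False)["pendingFinalDelivery"])  # False
```

Real targets that fail still keep the flag, but they now carry a non-null pendingFinalDeliveryLastError, so they no longer look identical to "still trying".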

The 30s skip window in runHeartbeatOnce:

if (recentSessionEntry?.pendingFinalDelivery === true
    && recentSessionEntry?.updatedAt
    && startedAt - recentSessionEntry.updatedAt < 3e4) return SKIP_REQUESTS_IN_FLIGHT;

is correct in principle, but combined with retry logic that bumps updatedAt = now on each silent failure, it becomes a perpetual block.
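To see why the window never reopens, here is a small Python simulation of the tick order this issue describes: the silent retry runs first and bumps updatedAt to now, and then the skip check runs. It is a model, not runner code. `run_tick`, `make_stuck_entry`, and the 500 ms retry latency are assumptions. The 30 s window and 60 m interval come from the issue.

```python
SKIP_WINDOW_MS = 30_000      # 3e4 in runHeartbeatOnce
INTERVAL_MS = 3_600_000      # heartbeat every 60m
RETRY_LATENCY_MS = 500       # assumed time the silent retry takes


def make_stuck_entry() -> dict:
    """A session entry already in the pendingFinalDelivery: true state."""
    return {"pendingFinalDelivery": True, "pendingFinalDeliveryAttemptCount": 0,
            "pendingFinalDeliveryLastError": None, "updatedAt": 1}


def run_tick(entry: dict, tick_at: int, bump_updated_at: bool = True) -> str:
    """One scheduler tick: retry the pending delivery, then run the skip check."""
    if entry["pendingFinalDelivery"]:
        # Silent failure: the count climbs, the error stays None.
        entry["pendingFinalDeliveryAttemptCount"] += 1
        if bump_updated_at:
            entry["updatedAt"] = tick_at  # "updatedAt = now"
    started_at = tick_at + RETRY_LATENCY_MS
    if (entry["pendingFinalDelivery"] is True
            and entry["updatedAt"]
            and started_at - entry["updatedAt"] < SKIP_WINDOW_MS):
        return "SKIP_REQUESTS_IN_FLIGHT"
    return "RAN"


stuck = make_stuck_entry()
print({run_tick(stuck, k * INTERVAL_MS) for k in range(1, 65)})
# {'SKIP_REQUESTS_IN_FLIGHT'}: all 64 hourly ticks skipped

quiet = make_stuck_entry()
print({run_tick(quiet, k * INTERVAL_MS, bump_updated_at=False) for k in range(1, 65)})
# {'RAN'}: same stale flag, but without the bump every tick runs
```

The second run roughly approximates the fresh-session case from the workaround. There the flag may still read true, but updatedAt is no longer bumped, so the 30 s check stops matching.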

Workaround (proven on v2026.5.7)

Stop gateway → drop the entire agent:<agent>:main entry from sessions.json and remove its associated *.jsonl/*.trajectory.jsonl files → restart gateway. The runner re-creates the session on the next tick with origin: null, breaking the dispatch retry loop.
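The sessions.json half of this workaround can be scripted. The Python sketch below makes two assumptions: sessions.json is a single JSON object keyed by session key (as the inspection one-liner above implies), and the gateway is already stopped. `drop_main_session` is a hypothetical helper. It deletes the whole entry rather than only the pendingFinalDelivery* fields, because clearing just those fields lets origin.to: "heartbeat" re-create the stuck state. It does not touch the *.jsonl/*.trajectory.jsonl files, since the issue does not say how they map to the entry, so remove those by hand.

```python
import json
import shutil
import time
from pathlib import Path


def drop_main_session(sessions_path: Path, agent: str) -> bool:
    """Delete agent:<agent>:main from sessions.json, keeping a timestamped backup.

    Run only while the gateway is stopped. Returns True if an entry was removed.
    """
    sessions_path = Path(sessions_path).expanduser()
    key = f"agent:{agent}:main"
    data = json.loads(sessions_path.read_text())
    if key not in data:
        return False
    backup = sessions_path.with_name(f"{sessions_path.name}.bak-{int(time.time())}")
    shutil.copy2(sessions_path, backup)  # keep the original for rollback
    del data[key]
    tmp = sessions_path.with_name(sessions_path.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2))
    tmp.replace(sessions_path)  # rename over the original in one step
    return True
```

Assuming the affected agent's id is fast, the call would be drop_main_session(Path("~/.openclaw/agents/fast/sessions/sessions.json"), "fast"), run between stopping and restarting the gateway. Re-serializing with indent=2 may change whitespace in the file. That is assumed to be harmless.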

Suggested fixes

1. (Bug A) In the agent-runner pending-delivery write, gate on isHeartbeatContentEffectivelyEmpty(stripHeartbeatToken(text).text). If the stripped output is empty, skip the pending-queue write entirely.
2. (Bug B-1) Capture dispatch failures into pendingFinalDeliveryLastError so silent failures become visible.
3. (Bug B-2) When delivery.to === "heartbeat" and no channel plugin resolves, treat it as clearPendingFinalDeliveryAfterSuccess. Reaching the pseudo-target is its acknowledgement, so the persistent retry is the bug.
4. (Hardening) openclaw doctor should warn when any session has pendingFinalDelivery: true AND now - pendingFinalDeliveryCreatedAt > 1h AND pendingFinalDeliveryLastError === null. That combination is the diagnostic signature of this otherwise silent bug.
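Here is a sketch of the (Hardening) check above, written as a standalone Python scan rather than as an openclaw doctor change. `stuck_sessions` and `check_file` are hypothetical names. The sketch assumes pendingFinalDeliveryCreatedAt is an epoch-millisecond timestamp like updatedAt, which the runner compares against a 3e4 ms window. The issue does not state its unit.

```python
import json
import time
from pathlib import Path
from typing import Iterator, List, Optional

ONE_HOUR_MS = 3_600_000


def stuck_sessions(sessions: dict, now_ms: Optional[int] = None) -> Iterator[str]:
    """Yield keys whose entry matches the diagnostic triple from this issue."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    for key, entry in sessions.items():
        if not isinstance(entry, dict):
            continue
        created = entry.get("pendingFinalDeliveryCreatedAt")
        if (entry.get("pendingFinalDelivery") is True
                and isinstance(created, (int, float))
                and now_ms - created > ONE_HOUR_MS
                and entry.get("pendingFinalDeliveryLastError") is None):
            yield key


def check_file(path: str) -> List[str]:
    """Return one warning line per stuck session in a sessions.json file."""
    sessions = json.loads(Path(path).expanduser().read_text())
    return [f"WARN {key}: pendingFinalDelivery stuck >1h with no recorded error"
            for key in stuck_sessions(sessions)]
```

Entries that carry a recorded error are deliberately not flagged. The triple targets the case where the flag looks like "still trying" indefinitely.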

Environment

OpenClaw: 2026.5.7 (eeef486)
OS: Ubuntu 24.04 / Linux 6.8.0-110-generic x86_64
Node: v22.22.2
Install: npm global as system-level systemd service
Topology: 5-agent (orchestrator + reasoner + coder + fast + multimodal)
Affected agent: fast — 60m heartbeat, no target set, runs deepseek-v4-flash via opencode-go provider
Channel: Telegram-bound orchestrator; the affected agent has no direct user-facing channel.

Related issues (not duplicates)

#59710 — Heartbeat silently stops after ~20h (this issue's diagnostic mechanism may be the underlying cause)
#78187 — Heartbeat polling silently stops after SIGUSR1 gateway restart
#74257 — HEARTBEAT_OK/internal text leak (inverse symptom of same path)
#78532 — deliverySucceeded=true when no adapter invoked (CLOSED, sibling)
#55882 — Agent can drop promised outputs after task switching (broader pending-deliverables queue durability)
#65498 — Main-session user task can lose final reply after heartbeat or exec-completion interrupt (CLOSED — related fix area)
