openclaw: Heartbeat-spawned claude live session captures channel user inbounds, causing context-amnesiac fork replies [1 comment, 2 participants]

GitHub issue: openclaw/openclaw#84332 (fetched 2026-05-20 03:41:24)
Comments: 1 · Participants: 2 · Timeline events: 14 · Reactions: 1
Timeline (top): labeled ×8, cross-referenced ×2, closed ×1, commented ×1


Bug report — Heartbeat trigger spawns a parallel claude live session that captures user inbounds, causing duplicated / conflicting replies on a busy channel session

Filed by: Daniel Crick (danielcrick), 2026-05-19; drafted 19:35 BST, updated 22:50 BST with root-cause analysis
Affected version: OpenClaw 2026.5.3-1 (2eae30e), post-downgrade per #83491
Severity: High. User-visible conversation forks, lost context, and apparent regressions in production WhatsApp threads. Reproduced four+ times in a single evening, including after a clean Mac restart with agent:main:main.cliSessionBindings cleared.
Status: Diagnosed. No local fix available. The workaround is to accept transient fork replies; the root fix needs to happen in OpenClaw's heartbeat / live-session-router code.


Summary

OpenClaw's heartbeat mechanism (trigger=heartbeat, fires every ~15 minutes on the gateway host) spawns a second long-lived claude live session alongside an active channel session, taking the gateway's activeSessions count from 1 to 2. When this second session is alive (which is most of the time after the first heartbeat post-startup), user inbounds on the active WhatsApp / iMessage / Signal channel can route to either of the two live sessions on a request-by-request basis.

The heartbeat-spawned session boots with a clean, freshly-loaded tool set and no conversation history from the active channel thread. Inbounds that land on it therefore reply as if the entire prior conversation never happened — typically with patterns like "I don't have file access this turn", "only messaging tools available", "I'm not sure what you mean", etc.

User-facing: "oh my fucking god you have lost context again there's been another fork in the conversation" (Dan, 2026-05-19 19:01 BST, and a similar message five+ times subsequently across a single evening).


What we initially thought (incorrect)

Earlier hypothesis: cron-triggered agent:main:main runs primed a stale CLI binding (cliSessionBindings.claude-cli.sessionId) which the gateway then used as a routing fallback when the WhatsApp DM session was mid-turn. We:

  1. Cleared agent:main:main.cliSessionBindings, cliSessionIds, claudeCliSessionId, sessionFile
  2. Restarted the Mac (launchd-respawned gateway started clean)
  3. Confirmed agent:main:main.cliSessionBindings = {} survived the reboot

The fork bug recurred within minutes of the restart, on a clean sessions.json. The stale-binding hypothesis was wrong.

We then cleared CLI bindings on 17 additional sessions (cron, subagent, test-tools — all sessions in sessions.json other than the active WhatsApp DM). Forks continued. That ruled out stale-binding fallback as the cause.
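
The cleanup described above could be scripted roughly as follows. This is a minimal sketch: it assumes sessions.json parses to an object mapping session keys to per-session objects, which the report does not confirm. The field names come from the list above; the function name and the `keep` parameter are hypothetical.

```python
import copy

# Field names taken from the cleanup steps in this report.
BINDING_FIELDS = ("cliSessionBindings", "cliSessionIds", "claudeCliSessionId", "sessionFile")


def clear_cli_bindings(sessions, keep=()):
    """Return a copy of `sessions` with CLI binding fields cleared.

    ASSUMPTION: `sessions` maps session keys to dicts, for example
    {"agent:main:main": {"cliSessionBindings": {...}, ...}}.
    Keys listed in `keep` (such as the active channel session) are untouched.
    """
    cleaned = copy.deepcopy(sessions)
    for key, entry in cleaned.items():
        if key in keep or not isinstance(entry, dict):
            continue
        # The report confirms the cleared state as cliSessionBindings = {}.
        if "cliSessionBindings" in entry:
            entry["cliSessionBindings"] = {}
        # The remaining binding fields are simply dropped.
        for field in BINDING_FIELDS[1:]:
            entry.pop(field, None)
    return cleaned
```

Working on a deep copy keeps the original structure available for comparison, which matters here because the cleanup turned out not to change the behaviour.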


Actual root cause (confirmed via gateway log analysis)

Gateway log ~/.openclaw/logs/gateway.log shows the pattern unambiguously:

2026-05-19T21:00:01.456+01:00 [agent/cli-backend] claude live session start: provider=claude-cli model=claude-opus-4-7 activeSessions=1
2026-05-19T21:08:54.384+01:00 [agent/cli-backend] claude live session start: provider=claude-cli model=claude-opus-4-7 activeSessions=1
2026-05-19T21:11:56.755+01:00 [agent/cli-backend] claude live session reuse:  provider=claude-cli model=claude-opus-4-7
...
2026-05-19T21:30:01.471+01:00 [agent/cli-backend] claude live session start: provider=claude-cli model=claude-opus-4-7 activeSessions=2  ← HEARTBEAT
2026-05-19T21:37:45.204+01:00 [agent/cli-backend] claude live session reuse:  provider=claude-cli model=claude-opus-4-7
2026-05-19T21:45:25.378+01:00 [agent/cli-backend] claude live session start: provider=claude-cli model=claude-opus-4-7 activeSessions=2  ← HEARTBEAT
...
2026-05-19T22:15:25.396+01:00 [agent/cli-backend] claude live session start: provider=claude-cli model=claude-opus-4-7 activeSessions=2  ← HEARTBEAT

The claude live session start events with activeSessions=2 correlate exactly with trigger=heartbeat events that fire every ~15 minutes (HH:00:01, HH:15:25, HH:30:01, HH:45:25). Each one creates a parallel live claude-cli session that the gateway treats as eligible for inbound routing.
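
To check another gateway log for the same pattern, a small scanner along these lines may help. It is a sketch that relies only on the line format visible in the excerpt above; the function name and the `threshold` parameter are illustrative and not part of OpenClaw.

```python
import re
from datetime import datetime

# Matches only "claude live session start" lines in the format shown above.
START_RE = re.compile(
    r"^(?P<ts>\S+) \[agent/cli-backend\] claude live session start: "
    r".*?activeSessions=(?P<n>\d+)"
)


def parallel_session_starts(lines, threshold=2):
    """Return (timestamp, active_sessions) for each start at or above `threshold`."""
    hits = []
    for line in lines:
        match = START_RE.match(line)
        if match and int(match.group("n")) >= threshold:
            timestamp = datetime.fromisoformat(match.group("ts"))
            hits.append((timestamp, int(match.group("n"))))
    return hits
```

Feeding it the lines of ~/.openclaw/logs/gateway.log and comparing the returned timestamps with the heartbeat schedule is one way to confirm the correlation described here.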

Subsequent user inbounds (trigger=user) log as reuse=reusable, but the gateway sometimes "reuses" the heartbeat-spawned session rather than the channel-bound one. Outcome:

  • The user message is handed to a fresh heartbeat session that has none of the channel-thread conversation history
  • That session generates a reply based only on the system prompt + MEMORY.md (whatever it had at boot) + the single user message
  • The reply lands on the user's WhatsApp / channel thread, looking like a forked / amnesiac Paige

The earliest [heartbeat] started log entry is 2026-05-04T14:33:11, so the heartbeat itself has been running for two weeks. Increased user-visibility of the fork pattern in the last 24 hours appears to correlate with heavier active channel use (more inbounds hitting the gateway during heartbeat-active windows).


Reproducer

Difficult to reproduce intentionally without active channel traffic. With active traffic, it reproduces readily:

  1. Start a long-running active conversation on a channel (WhatsApp DM is the easiest; any channel that holds a live claude-cli session will work)
  2. Send tool-heavy messages that take the channel session through tool calls — keeps activeSessions=1 busy on the channel binding
  3. Wait until a 15-minute heartbeat boundary fires (HH:00, HH:15, HH:30, HH:45)
  4. Within 30-60 seconds after the heartbeat, send a follow-up user inbound on the channel
  5. Some percentage of these will route to the heartbeat-spawned session and reply with stale/empty context
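
Timing steps 3 and 4 by hand is error-prone, so the send window can be computed. The sketch below assumes heartbeats align to wall-clock quarter hours, as the log timestamps suggest; the log shows the actual start 1 to 25 seconds after the boundary, so the window is approximate. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def next_heartbeat_boundary(now, period_minutes=15):
    """Return the first wall-clock boundary strictly after `now`.

    ASSUMPTION: heartbeats align to HH:00, HH:15, HH:30, HH:45, and
    `period_minutes` divides 60 evenly.
    """
    floored = now.replace(
        minute=now.minute - now.minute % period_minutes, second=0, microsecond=0
    )
    return floored + timedelta(minutes=period_minutes)


def follow_up_window(now, earliest_s=30, latest_s=60):
    """Return (start, end) of the window in which to send the follow-up inbound."""
    boundary = next_heartbeat_boundary(now)
    return (
        boundary + timedelta(seconds=earliest_s),
        boundary + timedelta(seconds=latest_s),
    )
```

For example, calling follow_up_window(datetime.now()) gives the next 30 to 60 second slot after a quarter-hour boundary in local time.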

Today's session at +447736454506 caught this four+ times between 19:01 BST and 22:40 BST — empirically the routing-to-heartbeat-session decision appears to happen every other heartbeat cycle or so.


Live process snapshot evidence

PID 59742  PPID 18019  age 31m50s  claude --resume e0a54390-…  ← WhatsApp DM session
PID 60488  PPID 18019  age  1m39s  claude --resume c3f55782-…  ← heartbeat-spawned fork

PID 60488 was generating fork replies on Dan's WhatsApp DM, on a session ID that doesn't appear in any expected channel binding. Manually running kill -TERM 60488 stopped the fork replies temporarily; later heartbeats re-spawned parallel sessions.
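
A snapshot like the one above can be checked automatically for sibling claude --resume processes under one parent. The sketch assumes its input is the text output of `ps -A -o pid,ppid,etime,command`, which is not the exact format of the annotated snapshot in this report; the function name is hypothetical.

```python
import re
from collections import defaultdict

# pid, ppid, elapsed time, then a command line containing "claude --resume <id>".
RESUME_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+)\s+.*\bclaude --resume (\S+)")


def resume_children_by_parent(ps_output):
    """Group `claude --resume` processes by parent PID.

    Returns only parents with more than one such child, which is the
    situation shown in the snapshot above.
    """
    groups = defaultdict(list)
    for line in ps_output.splitlines():
        match = RESUME_RE.match(line)
        if match:
            pid, ppid, etime, session = match.groups()
            groups[int(ppid)].append(
                {"pid": int(pid), "etime": etime, "session": session}
            )
    return {ppid: procs for ppid, procs in groups.items() if len(procs) > 1}
```

A non-empty result does not prove mis-routing on its own, but it flags the window in which fork replies were observed.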


Suggested fixes

In priority order:

  1. Bind heartbeat-spawned sessions to a dedicated session-key that is NEVER eligible for channel inbound routing. An agent:main:heartbeat key, or similar, that the gateway explicitly excludes from reuse=reusable matching when dispatching user inbounds from channel sessions. The current "any live session can be reused" routing is too permissive.

  2. Strict channel-session affinity for user-inbounds. When trigger=user and the inbound source is a channel binding (whatsapp:direct, imessage:direct, etc.), the gateway should ONLY consider live sessions whose sessionKey matches the channel binding. If the channel session is mid-turn, queue the inbound on that session's input stream; do not fall back to ANY other live session, heartbeat-spawned or otherwise.

  3. Diagnostic logging at the routing decision point. Emit a structured log line at the moment the gateway decides which live session to dispatch a user inbound to: source channel key, target sessionKey, reason, alternative options considered. Even without the behavioural fix, this would let operators detect mis-routings immediately rather than catching them via user complaint.

  4. OPENCLAW_DISABLE_HEARTBEAT=1 env flag: an explicit opt-out for users who would rather lose heartbeat functionality than risk fork replies in production channel threads. Useful as a temporary mitigation while the proper fix is rolled out.

  5. Hard-stop on duplicate concurrent processes for the same effective outbound recipient — if two live sessions exist whose deliveries would both terminate at the same WhatsApp / iMessage JID, the gateway should kill the newer one and log.
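
Suggestions 1 to 3 can be illustrated together. The following is a sketch of the proposed behaviour, not OpenClaw's actual routing code, which this report has not seen. The LiveSession type, the function name, and the log field names are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class LiveSession:
    session_key: str    # e.g. "whatsapp:direct" or "agent:main:heartbeat"
    busy: bool = False  # True while the session is mid-turn


def route_user_inbound(channel_key, live_sessions):
    """Pick a target for a trigger=user inbound under strict channel affinity.

    Returns (decision, target_key, log_line), where decision is one of
    "dispatch", "queue" or "spawn". Sessions whose key differs from the
    channel binding (heartbeat-spawned or otherwise) are never chosen.
    """
    candidates = [s for s in live_sessions if s.session_key == channel_key]
    rejected = [s.session_key for s in live_sessions if s.session_key != channel_key]
    if not candidates:
        decision, reason = "spawn", "no live session bound to channel"
    elif candidates[0].busy:
        decision, reason = "queue", "channel session mid-turn"
    else:
        decision, reason = "dispatch", "channel session idle"
    # Structured log line emitted at the routing decision point (suggestion 3).
    log_line = json.dumps(
        {
            "event": "inbound_route",
            "source": channel_key,
            "target": channel_key,
            "decision": decision,
            "reason": reason,
            "rejected": rejected,
        },
        sort_keys=True,
    )
    return decision, channel_key, log_line
```

The key design choice is that a busy channel session produces "queue" rather than a fallback, so the inbound waits for the session that holds the conversation history.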


Workaround in place locally

  • Accept that fork replies will appear at ~15-minute boundaries and ignore them as transient
  • The legitimate channel session typically replies a few seconds later with full context
  • Critical edits / publishes should be confirmed by checking the most-recently-updated reply
  • sessions.json cleanups did NOT resolve the issue (confirmed empirically); this workaround is the only remaining mitigation until upstream fix

Open questions for OpenClaw maintainers

  1. Is heartbeat-spawned claude live session eligibility for channel inbound routing intentional or a regression?
  2. What is the routing decision logic in agent/cli-backend for trigger=user inbounds when activeSessions > 1?
  3. Would an OPENCLAW_DISABLE_HEARTBEAT env flag be acceptable as a short-term mitigation while the routing fix is being prepared?
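
For question 3, the check itself would be small. The sketch below shows one possible reading of the proposed flag; OPENCLAW_DISABLE_HEARTBEAT does not exist in OpenClaw today as far as this report knows, and the accepted truthy values are an assumption.

```python
import os


def heartbeat_disabled(env=None):
    """Return True when the PROPOSED OPENCLAW_DISABLE_HEARTBEAT flag is truthy.

    ASSUMPTION: "1", "true", "yes" and "on" (case-insensitive) count as set.
    """
    env = os.environ if env is None else env
    value = env.get("OPENCLAW_DISABLE_HEARTBEAT", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}
```

A heartbeat scheduler could consult this once at startup and skip registering the trigger when it returns True.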

Reference

  • Previous bug filed same week: #83491 (WhatsApp runtime regression on 2026.5.12) — fixed in cce0049 (PR #83647), awaiting tagged release
  • Memory rule pinning the version: feedback_openclaw_version_lock.md (stay on 2026.5.3-1)
  • Affected workflow: live Paige WhatsApp DM, gateway ~/.openclaw/logs/gateway.log (2026-05-19 evidence collected end-to-end)
