openclaw - ✅(Solved) Fix Heartbeat isolatedSession=true replays prior heartbeat context, causing deterministic overflow/restart loop [1 pull requests, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#84218 (fetched 2026-05-20 03:42:30)
Comments: 2 · Participants: 3 · Timeline events: 10 · Reactions: 1
Timeline (top): labeled ×5, cross-referenced ×3, commented ×2

Heartbeat runs configured with isolatedSession=true and lightContext=true can still receive a large replay of prior heartbeat context. The docs describe isolatedSession: true as a "fresh session each run (no conversation history)", but the compiled prompt can include context-engine summaries and prior assistant/tool heartbeat outputs associated with the stable heartbeat session key.

On our production VPS this became a deterministic loop:

  • heartbeat session key: agent:trent:main:heartbeat
  • configured model before mitigation: ollama/nemotron-3-nano:30b
  • estimated prompt: ~124,349 tokens
  • prompt budget before reserve: ~111,616 tokens
  • overflow: ~12,733 tokens
  • messages reported by overflow diagnostic: 70
  • auto-compaction attempts succeeded/retried, then the same precheck failed again
  • after attempt 3/3, OpenClaw restarted the heartbeat session id and repeated on the next tick

In one 6-hour window on this deployment alone, we observed approximately 280 overflow prechecks and 70 restart cycles. The first visible crossing of the nemotron context limit was no later than 2026-05-17T22:15:27Z, and the pattern continued afterward until we mitigated locally by moving the heartbeat lane to a larger-context model and reducing cadence.
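
As a quick sanity check on the figures above, the arithmetic can be sketched as follows. The token counts come straight from the report; the 5-minute cadence is an assumption taken from the repro config and may not match the cadence actually in effect during the 6-hour window.

```typescript
// Figures reported in the issue (all approximate).
const estimatedPromptTokens = 124_349;
const promptBudgetTokens = 111_616; // prompt budget before reserve
const overflowTokens = estimatedPromptTokens - promptBudgetTokens;

// Cadence check. Assumption: the heartbeat fired every 5 minutes,
// as in the repro config, for the whole observation window.
const windowHours = 6;
const heartbeatEveryMinutes = 5;
const ticksInWindow = (windowHours * 60) / heartbeatEveryMinutes;
const observedRestartCycles = 70;

console.log({ overflowTokens, ticksInWindow, observedRestartCycles });
```

Under that assumption the window holds 72 ticks, so roughly 70 restart cycles would mean nearly every tick ended in a restart.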

PR fix notes

PR #84248: [codex] isolate heartbeat context-engine session keys

Description (problem / solution / changelog)

Summary

  • Problem: isolated heartbeat runs kept a stable :heartbeat routing key everywhere, so context-engine lifecycle hooks could still associate later ticks with prior heartbeat state.
  • Solution: preserve the stable routing SessionKey, but forward a fresh per-tick contextEngineSessionKey only through context-engine bootstrap, assemble, ingest, and compaction paths.
  • What changed: heartbeat isolated runs now mint :heartbeat-run:<sessionId> keys for context-engine state, and the embedded runner/auto-reply flow threads that override through lifecycle and recovery paths with regression coverage.
  • What did NOT change (scope boundary): normal session routing, wake re-entry convergence on the stable :heartbeat key, and non-heartbeat runs still default to the existing sessionKey behavior.
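
The key split described in the summary can be sketched roughly as below. This is a minimal illustration, not the PR's code: `splitHeartbeatKeys` and `HeartbeatKeys` are hypothetical names, and the key shapes are inferred from the console output shown later in this PR description.

```typescript
// Hypothetical helper; the actual implementation in PR #84248 may differ.
interface HeartbeatKeys {
  sessionKey: string; // stable routing key, reused on every tick
  contextEngineSessionKey: string; // fresh per-tick key for context-engine state
}

function splitHeartbeatKeys(baseKey: string, sessionId: string): HeartbeatKeys {
  return {
    sessionKey: `${baseKey}:heartbeat`,
    contextEngineSessionKey: `${baseKey}:heartbeat-run:${sessionId}`,
  };
}

// Session ids copied from the "Evidence after fix" output in this PR.
const tick1 = splitHeartbeatKeys("agent:main:main", "a5bcc111-0825-4313-bc6e-f2c0ab3809bd");
const tick2 = splitHeartbeatKeys("agent:main:main", "ad2bf62e-2a5e-4437-836b-19e54b72f069");

console.log(tick1.sessionKey === tick2.sessionKey); // routing key is shared
console.log(tick1.contextEngineSessionKey === tick2.contextEngineSessionKey); // engine key is not
```

Keeping the routing key stable is what lets wake re-entry and delivery continue to converge on :heartbeat, while the per-run key gives the context engine nothing earlier to look up.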

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #84218
  • Related #84218
  • This PR fixes a bug or regression

Motivation

  • Isolated heartbeat is already documented as starting from fresh context, so replaying accumulated heartbeat context into later ticks can grow prompts until the heartbeat loop compacts and restarts instead of doing scheduled work.

Real behavior proof (required for external PRs)

  • Behavior addressed: isolatedSession=true heartbeat runs should keep the stable delivery SessionKey while using a fresh context-engine-only session identity on each isolated tick.
  • Real environment tested: local OpenClaw source checkout in a Codex worktree, using a temporary workspace and session store, with the production runHeartbeatOnce() path invoked twice under isolatedSession=true and heartbeat.target=none to avoid outbound delivery.
  • Exact steps or command run after this patch:

node --import tsx --input-type=module <<'EOF'
import fs from 'node:fs/promises';
import os from 'node:os';
import path from 'node:path';
import { runHeartbeatOnce } from './src/infra/heartbeat-runner.ts';
import { resolveMainSessionKey } from './src/config/sessions.ts';

const tmpDir = await fs.mkdtemp(path.join(os.tmpdir(), 'openclaw-hb-proof-'));
const storePath = path.join(tmpDir, 'sessions.json');
await fs.writeFile(path.join(tmpDir, 'HEARTBEAT.md'), '- Check status\n', 'utf8');
const cfg = {
  agents: { defaults: { workspace: tmpDir, heartbeat: { every: '5m', target: 'none', isolatedSession: true } } },
  session: { store: storePath },
};
const sessionKey = resolveMainSessionKey(cfg);
await fs.writeFile(storePath, JSON.stringify({
  [sessionKey]: { sessionId: 'seed-session', updatedAt: Date.now(), lastChannel: 'none', lastProvider: 'heartbeat', lastTo: 'self' }
}), 'utf8');
const observed = [];
const getReplyFromConfig = async (ctx, opts) => {
  observed.push({ SessionKey: ctx.SessionKey, contextEngineSessionKey: opts.contextEngineSessionKey });
  return { text: 'HEARTBEAT_OK' };
};
for (const nowMs of [1, 2]) {
  await runHeartbeatOnce({ cfg, sessionKey, deps: { nowMs: () => nowMs, getQueueSize: () => 0, getReplyFromConfig } });
}
console.log(JSON.stringify(observed, null, 2));
await fs.rm(tmpDir, { recursive: true, force: true });
EOF

Evidence after fix (console output):

[sessions/store] pruned stale session entries
[sessions/store] pruned stale session entries
[
  {
    "SessionKey": "agent:main:main:heartbeat",
    "contextEngineSessionKey": "agent:main:main:heartbeat-run:a5bcc111-0825-4313-bc6e-f2c0ab3809bd"
  },
  {
    "SessionKey": "agent:main:main:heartbeat",
    "contextEngineSessionKey": "agent:main:main:heartbeat-run:ad2bf62e-2a5e-4437-836b-19e54b72f069"
  }
]

  • Observed result after fix: both isolated runs used the stable routing key agent:main:main:heartbeat, while the context-engine-only key changed between runs, proving the heartbeat tick kept stable delivery routing without reusing the prior tick's context-engine identity.
  • What was not tested: live gateway/channel delivery, end-to-end overflow looping against a real provider, and non-PI harnesses.
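
The two invariants read off the console output can also be checked mechanically. The sketch below assumes only the shape of the `observed` array printed above; `checkKeySplit` is a hypothetical helper and is not part of the OpenClaw test suite.

```typescript
interface ObservedCall {
  SessionKey: string;
  contextEngineSessionKey: string;
}

// Returns the list of violated invariants; an empty list means the split held.
function checkKeySplit(observed: ObservedCall[]): string[] {
  const problems: string[] = [];
  const routingKeys = new Set(observed.map((o) => o.SessionKey));
  const engineKeys = new Set(observed.map((o) => o.contextEngineSessionKey));
  if (routingKeys.size !== 1) problems.push("routing key is not stable across ticks");
  if (engineKeys.size !== observed.length) problems.push("context-engine key reused across ticks");
  return problems;
}

// Values copied from the console output above.
const observed: ObservedCall[] = [
  {
    SessionKey: "agent:main:main:heartbeat",
    contextEngineSessionKey: "agent:main:main:heartbeat-run:a5bcc111-0825-4313-bc6e-f2c0ab3809bd",
  },
  {
    SessionKey: "agent:main:main:heartbeat",
    contextEngineSessionKey: "agent:main:main:heartbeat-run:ad2bf62e-2a5e-4437-836b-19e54b72f069",
  },
];

const problems = checkKeySplit(observed);
console.log(problems);
```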

Root Cause (if applicable)

  • Root cause: runHeartbeatOnce() created a fresh isolated session id, but still passed the stable <base>:heartbeat key into the auto-reply and embedded-runner context-engine lifecycle, so any context engine keyed by sessionKey could replay earlier heartbeat state.
  • Missing detection / guardrail: isolated-heartbeat tests only covered stable :heartbeat key convergence and did not assert a separate fresh identity for context-engine lifecycle/compaction paths.
  • Contributing context (if known): the same stable key was reused across bootstrap, assemble, afterTurn, ingest, and compaction maintenance paths.
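
To make the root cause concrete, here is a toy model of a context engine that stores state purely by the key it is handed. It is a simplified sketch under that assumption; `ToyContextEngine` and `runTicks` are illustrative names and do not reflect OpenClaw's real context-engine interface beyond the assemble/ingest step names mentioned above.

```typescript
// Toy in-memory engine: state is looked up only by the key it is given.
class ToyContextEngine {
  private readonly store = new Map<string, string[]>();

  // Returns the messages that would be replayed into the prompt for this key.
  assemble(key: string): string[] {
    return [...(this.store.get(key) ?? [])];
  }

  // Records the output of a finished turn under the key.
  ingest(key: string, message: string): void {
    const history = this.store.get(key) ?? [];
    history.push(message);
    this.store.set(key, history);
  }
}

// Runs `ticks` heartbeat ticks and reports how many prior messages each tick saw.
function runTicks(keyForTick: (tick: number) => string, ticks: number): number[] {
  const engine = new ToyContextEngine();
  const replayedPerTick: number[] = [];
  for (let tick = 0; tick < ticks; tick++) {
    const key = keyForTick(tick);
    replayedPerTick.push(engine.assemble(key).length);
    engine.ingest(key, "HEARTBEAT_OK");
  }
  return replayedPerTick;
}

// Before the fix: every tick shares the stable key, so replay grows.
const stableKeyReplay = runTicks(() => "agent:main:main:heartbeat", 4);
// After the fix: each tick gets its own key, so nothing is replayed.
const perRunKeyReplay = runTicks((t) => `agent:main:main:heartbeat-run:${t}`, 4);

console.log({ stableKeyReplay, perRunKeyReplay });
```

A fresh session id changes nothing in this model unless it is also part of the key the engine sees, which is the gap the PR closes.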

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/infra/heartbeat-runner.isolated-key-stability.test.ts
    • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts
    • src/agents/pi-embedded-runner/tool-result-context-guard.test.ts
    • src/agents/pi-embedded-runner/run.overflow-compaction.test.ts
    • src/auto-reply/reply/agent-runner-execution.test.ts
  • Scenario the test should lock in: isolated heartbeat ticks keep the stable routing key for delivery, but all context-engine-owned lifecycle and compaction calls use a fresh per-run identity.
  • Why this is the smallest reliable guardrail: the bug is in the seam between heartbeat session setup and embedded context-engine lifecycle forwarding, so seam tests can prove the identity split without needing a full live heartbeat deployment.
  • Existing test that already covers this (if any): the pre-existing isolated heartbeat stability test covered stable :heartbeat routing convergence only.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

  • Heartbeat runs with isolatedSession=true now honor the documented fresh-session contract for context-engine state while keeping the existing stable heartbeat routing key.

Diagram (if applicable)

N/A.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local source checkout via node scripts/run-vitest.mjs
  • Model/provider: N/A
  • Integration/channel (if any): none
  • Relevant config (redacted): heartbeat isolated-session config covered by targeted tests

Steps

  1. Run isolated heartbeat regression coverage.
  2. Run embedded context-engine forwarding and overflow-compaction coverage.
  3. Confirm the new context-engine-only key stays fresh per isolated tick while the outward routing key remains stable.

Expected

  • Isolated heartbeat delivery still routes through <base>:heartbeat, but context-engine lifecycle/compaction paths do not reuse prior heartbeat state.

Actual

  • The targeted tests pass with distinct contextEngineSessionKey values across isolated ticks and stable outbound routing keys.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: inspected the heartbeat runner, embedded runner lifecycle forwarding, compaction paths, loop-hook forwarding, and targeted regression tests for the new session-key split.
  • Edge cases checked: isolated heartbeat wake re-entry still converges on the stable :heartbeat routing key; non-heartbeat paths still default to the existing sessionKey contract.
  • What you did not verify: a live gateway/channel heartbeat repro on current main.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Changed files

  • extensions/codex/src/app-server/compact.test.ts (modified, +58/-2)
  • extensions/codex/src/app-server/compact.ts (modified, +10/-3)
  • extensions/codex/src/app-server/run-attempt.context-engine.test.ts (modified, +59/-1)
  • extensions/codex/src/app-server/run-attempt.ts (modified, +6/-2)
  • src/agents/harness/context-engine-lifecycle.ts (modified, +10/-7)
  • src/agents/pi-embedded-runner/compact.queued.ts (modified, +3/-3)
  • src/agents/pi-embedded-runner/compact.types.ts (modified, +2/-0)
  • src/agents/pi-embedded-runner/run.overflow-compaction.test.ts (modified, +36/-0)
  • src/agents/pi-embedded-runner/run.ts (modified, +7/-5)
  • src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts (modified, +68/-0)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +11/-0)
  • src/agents/pi-embedded-runner/run/params.ts (modified, +2/-0)
  • src/agents/pi-embedded-runner/run/types.ts (modified, +2/-0)
  • src/agents/pi-embedded-runner/tool-result-context-guard.test.ts (modified, +35/-0)
  • src/agents/pi-embedded-runner/tool-result-context-guard.ts (modified, +6/-4)
  • src/auto-reply/get-reply-options.types.ts (modified, +6/-0)
  • src/auto-reply/reply/agent-runner-execution.test.ts (modified, +21/-0)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +1/-0)
  • src/infra/heartbeat-runner.isolated-key-stability.test.ts (modified, +64/-0)
  • src/infra/heartbeat-runner.ts (modified, +13/-0)

Repro config shape

A single agent heartbeat is enough when the model context window is smaller than the accumulated replay:

{
  "agents": {
    "list": [
      {
        "id": "trent",
        "heartbeat": {
          "every": "5m",
          "model": "ollama/nemotron-3-nano:30b",
          "isolatedSession": true,
          "lightContext": true,
          "target": "none"
        }
      }
    ]
  }
}

The same class of failure should reproduce with any model whose usable prompt window is around ~112K tokens or smaller, once enough heartbeat output has accumulated.

Documentation vs observed behavior

Docs say:

  • isolatedSession: true = "fresh session each run (no conversation history)"
  • lightContext: true = "only inject HEARTBEAT.md from bootstrap files"

Observed behavior:

  • isolatedSession=true creates a new session id, but not a fresh model context.
  • lightContext=true trims bootstrap files only; it does not stop context-engine/session replay of prior heartbeat summaries, assistant outputs, or tool results.
  • Prior heartbeat no-change outputs can be promoted into future heartbeat context, increasing each future prompt.

Source-read root cause

From reading the installed 2026.5.18 dist source, the substrate appears to:

  • derive a stable isolated heartbeat session key like <base>:heartbeat
  • call resolveCronSession(... forceNew: true ...) to create a new session id
  • then pass SessionKey: runSessionKey, where runSessionKey is the stable isolated heartbeat session key
  • pass bootstrapContextMode="lightweight" for lightContext=true

So the session id is fresh, but context is still rebuilt against a stable heartbeat session key that can hydrate old heartbeat activity.

Evidence from compiled context

A failing heartbeat trajectory context.compiled event showed:

  • system prompt original chars: 55,152
  • messages retained in trajectory: 65
  • original message array length: 70
  • visible retained messages included:
    • 6 user/context summaries
    • 54 assistant heartbeat/no-change outputs
    • 4 heartbeat tool results
    • a truncation marker
  • the replayed messages were prior heartbeat summaries and noisy no-change heartbeat replies, not current tick data
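
The counts above are internally consistent, as the tally below shows. It assumes the truncation marker counts as a single retained message, which the report does not state explicitly.

```typescript
// Retained message counts reported for the failing context.compiled event.
const userContextSummaries = 6;
const assistantNoChangeOutputs = 54;
const heartbeatToolResults = 4;
const truncationMarkers = 1; // assumption: "a truncation marker" is one message

const retainedTotal =
  userContextSummaries + assistantNoChangeOutputs + heartbeatToolResults + truncationMarkers;

const originalLength = 70;
const notRetained = originalLength - retainedTotal;
const assistantShare = assistantNoChangeOutputs / retainedTotal;

console.log({ retainedTotal, notRetained, assistantShare: assistantShare.toFixed(2) });
```

The total matches the 65 retained messages, and about 83% of them are assistant no-change outputs, which supports the point that the replay is mostly noise rather than current tick data.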

The actual heartbeat transcript .jsonl was absent after precheck failure; the evidence was in the trajectory file.

Expected behavior

When isolatedSession=true, a heartbeat tick should be truly bounded/fresh by default:

  • no prior heartbeat assistant replies
  • no prior heartbeat tool results
  • no context-engine replay of previous heartbeat ticks
  • include only current HEARTBEAT.md, current time, pending system events/commitments, and explicitly configured bounded context

If preserving some heartbeat history is desired, it should be opt-in and bounded.

Actual behavior

Prior heartbeat activity is replayed into the next heartbeat prompt despite a fresh session id. Once the replay exceeds the model context window, reactive compaction/restart does not solve it because the same oversized context is regenerated on retry.
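
Why compaction cannot break the loop can be sketched as a small simulation. It assumes one initial precheck plus one precheck after each of the three compaction attempts, and that rebuilding against the stable key yields the same estimate every time; both are modelling assumptions rather than confirmed OpenClaw behavior, though they happen to match the reported ratio of about 280 prechecks to 70 restarts.

```typescript
const PROMPT_BUDGET_TOKENS = 111_616;
const MAX_COMPACTION_ATTEMPTS = 3; // from "attempt 3/3" in the report

// Assumption: the context is rebuilt from the same stable key on every retry,
// so the estimate never shrinks.
function rebuildPromptEstimate(): number {
  return 124_349;
}

interface TickResult {
  prechecks: number;
  restarted: boolean;
}

function simulateTick(): TickResult {
  let prechecks = 1; // initial precheck
  if (rebuildPromptEstimate() <= PROMPT_BUDGET_TOKENS) {
    return { prechecks, restarted: false };
  }
  for (let attempt = 1; attempt <= MAX_COMPACTION_ATTEMPTS; attempt++) {
    // Compaction "succeeds", then the retry regenerates the same context.
    prechecks += 1;
    if (rebuildPromptEstimate() <= PROMPT_BUDGET_TOKENS) {
      return { prechecks, restarted: false };
    }
  }
  return { prechecks, restarted: true }; // session id restarted, next tick repeats
}

const perTick = simulateTick();
const prechecksOver70Cycles = perTick.prechecks * 70;

console.log({ perTick, prechecksOver70Cycles });
```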

Impact

A quiet maintenance feature can become a load loop:

  • repeated context-overflow prechecks
  • repeated auto-compaction attempts
  • repeated heartbeat session restarts
  • degraded /readyz from event-loop delay/utilization
  • no useful heartbeat maintenance work completed

Suggested fix

Please add one of:

  1. A real ephemeral heartbeat mode that prevents any prior heartbeat output/context-engine replay from entering the next tick.
  2. Make isolatedSession=true enforce no prior heartbeat history by default.
  3. Add an explicit bound knob such as heartbeat.maxHistoryMessages, heartbeat.maxContextMessages, or heartbeat.replayHistory=false.

Also consider preventing notify=false / no-change heartbeat outputs from being promoted into future heartbeat context.
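
For illustration, option 3 combined with the no-change filter might behave like the sketch below. Every name here is hypothetical: replayHistory and maxHistoryMessages are the knobs proposed above, not existing OpenClaw config, and the noChange flag is an assumed marker for notify=false / no-change replies.

```typescript
interface HeartbeatMessage {
  role: "user" | "assistant" | "tool";
  text: string;
  noChange?: boolean; // hypothetical marker for notify=false / no-change replies
}

interface HistoryBound {
  replayHistory: boolean; // proposed knob, default would be false
  maxHistoryMessages: number; // proposed knob
}

// Returns the prior messages allowed into the next heartbeat prompt.
function boundHistory(history: HeartbeatMessage[], bound: HistoryBound): HeartbeatMessage[] {
  if (!bound.replayHistory || bound.maxHistoryMessages <= 0) return [];
  const meaningful = history.filter((m) => !m.noChange);
  return meaningful.slice(-bound.maxHistoryMessages);
}

const history: HeartbeatMessage[] = [
  { role: "assistant", text: "HEARTBEAT_OK", noChange: true },
  { role: "assistant", text: "Disk usage crossed 80%" },
  { role: "tool", text: "df -h output" },
  { role: "assistant", text: "HEARTBEAT_OK", noChange: true },
  { role: "assistant", text: "Disk usage back to 60%" },
];

const ephemeral = boundHistory(history, { replayHistory: false, maxHistoryMessages: 10 });
const bounded = boundHistory(history, { replayHistory: true, maxHistoryMessages: 2 });

console.log(ephemeral.length, bounded.map((m) => m.text));
```

With history off the next tick sees nothing prior; with it on, replay is capped and no-change replies never count toward the cap.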
