openclaw - 💡(How to fix) Fix Session-id poisoning: scalar cliSessionIds reused via claude --resume bypasses cliSessionBindings hash validation (2026.4.22) [1 comments, 2 participants]

Summary

On openclaw 2026.4.22, a single failed first turn after a gateway boot can poison sessions.json so that every subsequent turn fails. The failure persists across gateway restarts. Manually clearing the session-id fields in sessions.json is the only known recovery.

The root cause is a scalar-fallback path in cli-session-hmH-lCCb.js that reads cliSessionIds[provider] / claudeCliSessionId directly and passes the value to claude --resume, bypassing the hash-validation that cliSessionBindings provides.

Symptoms

Telegram (or any direct-conversation surface) returns "Something went wrong" on every message. The gateway log (openclaw-gateway.stderr.log) shows two failure flavors that share the same root cause:

Slow flavor: CLI produced no output for 180s — claude CLI hangs trying to handshake with a dead/stale stdio MCP loopback.
Fast flavor: Claude CLI failed reason=unknown in 1-4s — claude CLI exits because the resumed session's .jsonl no longer exists or is in an inconsistent state.

Gateway restarts do not clear the failure. The poisoned ids persist.
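
For triage, the two flavors can be told apart by matching the substrings quoted above. This is a minimal sketch: classifyFailure is a hypothetical helper, and the exact log line format beyond those quoted substrings is an assumption.

```javascript
// Hypothetical triage helper: classifies a gateway log line by the
// substrings quoted in this report. The full log format is not known here,
// so only substring matching is used.
function classifyFailure(logLine) {
  if (logLine.includes("CLI produced no output for")) return "slow";
  if (logLine.includes("Claude CLI failed reason=unknown")) return "fast";
  return null; // not one of the two flavors described in this report
}

const sampleLines = [
  "CLI produced no output for 180s",
  "Claude CLI failed reason=unknown",
  "some unrelated line",
];
const flavors = sampleLines.map(classifyFailure);
console.log(flavors);
```

Both flavors point at the same poisoned ids, so either match is a reason to inspect sessions.json for the affected session key.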

Repro (observed, not synthetic)

1. Gateway boots, mints a session id, and writes it to sessionId, claudeCliSessionId, and cliSessionIds["claude-cli"].
2. First turn fails for any reason (MCP loopback race, autopilot lock, timeout, etc.) before cliSessionBindings["claude-cli"] gets populated with hashes.
3. Subsequent turns enter resolveCliSessionReuse → bindings entry is empty → falls back to the scalar via getCliSessionBinding.
4. The scalar entry has no mcpConfigHash / mcpResumeHash / authEpoch / extraSystemPromptHash, so every hash comparison evaluates undefined !== currentHash, which is true and marks the session as invalid for reuse.
5. But getCliSessionId reads the scalar cliSessionIds[provider] directly and passes it to --resume, bypassing the invalidation result. Claude CLI then resumes a session whose MCP URLs / transcript no longer exist → it either hangs (180s) or fast-fails.
6. The failure path does not call clearCliSession, so the poison persists indefinitely.
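
The sequence above can be modeled in a few lines. This is a simplified sketch, not the openclaw source: the function names and field names come from this report, but every function body, the isReusable helper, and the binding's sessionId field are assumptions made for illustration.

```javascript
// Simplified model of the bypass. NOT the actual openclaw code.

// Assumed shape of the scalar fallback in getCliSessionBinding.
function getCliSessionBinding(entry, provider) {
  const binding = entry.cliSessionBindings?.[provider];
  if (binding && binding.sessionId) return binding;
  const scalar = entry.cliSessionIds?.[provider] ?? entry.claudeCliSessionId;
  // The fallback binding carries an id but none of the hashes.
  return scalar ? { sessionId: scalar } : undefined;
}

// Hypothetical stand-in for the hash comparison in resolveCliSessionReuse.
function isReusable(binding, current) {
  if (!binding) return false;
  const keys = ["mcpConfigHash", "mcpResumeHash", "authEpoch", "extraSystemPromptHash"];
  return keys.every((key) => binding[key] === current[key]);
}

// Assumed shape of getCliSessionId: reads the scalar with no validation.
function getCliSessionId(entry, provider) {
  return entry.cliSessionIds?.[provider] ?? entry.claudeCliSessionId;
}

// Entry state after a failed first turn: ids written, bindings still empty.
const poisoned = {
  sessionId: "abc",
  claudeCliSessionId: "abc",
  cliSessionIds: { "claude-cli": "abc" },
  cliSessionBindings: {},
};
const current = {
  mcpConfigHash: "h1",
  mcpResumeHash: "h2",
  authEpoch: 1,
  extraSystemPromptHash: "h3",
};

const reusable = isReusable(getCliSessionBinding(poisoned, "claude-cli"), current);
const resumeId = getCliSessionId(poisoned, "claude-cli");
console.log({ reusable, resumeId });
```

In this model the reuse check reports the session as not reusable (reusable is false), yet the id handed to --resume is still the stale "abc". That disagreement is the bug described in the steps above.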

File:line refs (2026.4.22 dist)

cli-session-hmH-lCCb.js:14-33 — getCliSessionBinding (scalar fallback)
cli-session-hmH-lCCb.js:34-36 — getCliSessionId (direct scalar read; bypasses reuse logic)
cli-session-hmH-lCCb.js:81-101 — resolveCliSessionReuse
attempt-execution.runtime-ZS3YlO9g.js:292 — transcript-presence check exists, but doesn't catch the scalar-fallback path because the scalar id may still point to an existing-but-stale transcript whose MCP URLs are dead.
errors-DKni0i_3.js:639 — detector patterns include no conversation found, but claude CLI exits with a different error from the MCP-handshake hang.

Workaround (manual)

Edit ~/.openclaw/agents/main/sessions/sessions.json for the affected session key (e.g., agent:main:telegram:direct:<chat_id>):

{
  "sessionId": null,
  "sessionFile": null,
  "systemSent": false,
  "compactionCount": 0,
  "claudeCliSessionId": null,
  "cliSessionIds": {},
  "cliSessionBindings": {}
}

Then restart the gateway. The next message mints a fresh session, the successful turn writes proper cliSessionBindings["claude-cli"] with hashes, and steady-state operation resumes.

Critical: clearing only sessionId is not enough — the scalar fallback path reads claudeCliSessionId and cliSessionIds["claude-cli"] directly. All three id fields must be cleared, and cliSessionBindings must be reset to {} so the next turn writes fresh hashes.

Verified working 2026-04-24: 26-second clean turn after several hours of failures.
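
The manual edit could also be scripted. The sketch below has not been run against a real install; it assumes sessions.json parses to an object keyed by session key, which this report implies but does not confirm. resetSessionEntry is a hypothetical helper name.

```javascript
// Sketch: reset the session-id fields for one session key, leaving any
// other fields on the entry untouched.
// Assumption: the store is a plain object keyed by session key.
function resetSessionEntry(store, sessionKey) {
  const entry = store[sessionKey];
  if (!entry) return false; // unknown key: nothing to reset
  Object.assign(entry, {
    sessionId: null,
    sessionFile: null,
    systemSent: false,
    compactionCount: 0,
    claudeCliSessionId: null,
    cliSessionIds: {},
    cliSessionBindings: {},
  });
  return true;
}

// In-memory demonstration with a made-up chat id.
const store = {
  "agent:main:telegram:direct:123": {
    sessionId: "abc",
    claudeCliSessionId: "abc",
    cliSessionIds: { "claude-cli": "abc" },
    cliSessionBindings: {},
    label: "kept as-is",
  },
};
const changed = resetSessionEntry(store, "agent:main:telegram:direct:123");
console.log(changed, JSON.stringify(store, null, 2));
```

To apply this to the real file, stop the gateway, back up sessions.json, load it with JSON.parse(fs.readFileSync(path, "utf8")), call the function, write it back with fs.writeFileSync(path, JSON.stringify(store, null, 2)), and then restart the gateway.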

Proposed upstream fixes

Either of these would close the gap. The first is conservative; the second is the structural fix.

Option A: Clear poison on repeated failure. After N consecutive failed turns on the same session key, automatically call clearCliSession (or equivalent) and force a fresh session-id mint on the next turn. This contains the blast radius of any failure that fails to populate cliSessionBindings.
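
One way Option A might look, as a sketch only. The threshold value, the createFailureTracker helper, and the assumption that clearCliSession can be called with just a session key are all hypothetical; the real signature is not shown in this report.

```javascript
// Sketch of Option A: clear the session after N consecutive failures.
const MAX_CONSECUTIVE_FAILURES = 3; // hypothetical value for "N"

function createFailureTracker(clearCliSession, limit = MAX_CONSECUTIVE_FAILURES) {
  const failures = new Map(); // sessionKey -> consecutive failure count
  return {
    // A successful turn resets the streak.
    recordSuccess(sessionKey) {
      failures.delete(sessionKey);
    },
    // Returns true when the limit was hit and the session was cleared.
    recordFailure(sessionKey) {
      const count = (failures.get(sessionKey) ?? 0) + 1;
      if (count >= limit) {
        failures.delete(sessionKey);
        clearCliSession(sessionKey);
        return true;
      }
      failures.set(sessionKey, count);
      return false;
    },
  };
}

// Demonstration with a stub in place of clearCliSession and a limit of 2.
const cleared = [];
const tracker = createFailureTracker((key) => cleared.push(key), 2);
const first = tracker.recordFailure("agent:main:telegram:direct:123");
const second = tracker.recordFailure("agent:main:telegram:direct:123");
console.log({ first, second, cleared });
```

The counter here lives in memory. Because the poison survives gateway restarts, a real implementation would probably need to persist the count alongside the session entry, or clear on the first failure that leaves cliSessionBindings empty.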

Option B: Never fall back to scalar without hashes. In getCliSessionBinding, if cliSessionBindings[provider] is missing or empty, return null (do not fall back to the scalar). In getCliSessionId, gate the scalar read on a successful binding lookup. If no validated binding exists, mint a fresh --session-id instead of using --resume. This removes the entire bypass class.

Option B also addresses the underlying invariant violation: the structured cliSessionBindings is the source of truth for "is this session reusable", and any path that reads the scalar bypassing that check is a bug.
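
A minimal sketch of Option B under stated assumptions: getValidatedBinding and buildSessionArgs are hypothetical names, the binding is assumed to store its id in a sessionId field, and id minting is left to the caller. Only the --resume and --session-id flags and the hash field names are taken from this report.

```javascript
// Sketch of Option B: the binding is the only source of a resumable id.
const HASH_KEYS = ["mcpConfigHash", "mcpResumeHash", "authEpoch", "extraSystemPromptHash"];

// Returns the binding only if it exists and every hash matches; never
// falls back to cliSessionIds[provider] or claudeCliSessionId.
function getValidatedBinding(entry, provider, current) {
  const binding = entry.cliSessionBindings?.[provider];
  if (!binding || !binding.sessionId) return null;
  const matches = HASH_KEYS.every(
    (key) => binding[key] !== undefined && binding[key] === current[key]
  );
  return matches ? binding : null;
}

// Chooses between resuming and minting. mintId is supplied by the caller
// (for example, a UUID generator).
function buildSessionArgs(entry, provider, current, mintId) {
  const binding = getValidatedBinding(entry, provider, current);
  return binding ? ["--resume", binding.sessionId] : ["--session-id", mintId()];
}

const current = { mcpConfigHash: "h1", mcpResumeHash: "h2", authEpoch: 1, extraSystemPromptHash: "h3" };
const freshId = "00000000-0000-4000-8000-000000000000"; // placeholder id

// Poisoned entry: scalar ids present, no binding.
const poisoned = { cliSessionIds: { "claude-cli": "abc" }, claudeCliSessionId: "abc", cliSessionBindings: {} };
// Healthy entry: binding present with matching hashes.
const healthy = { cliSessionBindings: { "claude-cli": { sessionId: "abc", ...current } } };

const poisonedArgs = buildSessionArgs(poisoned, "claude-cli", current, () => freshId);
const healthyArgs = buildSessionArgs(healthy, "claude-cli", current, () => freshId);
console.log(poisonedArgs, healthyArgs);
```

With this shape, the poisoned entry gets a fresh --session-id and only the entry with matching hashes is resumed, so no code path can hand an unvalidated scalar to --resume.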

Environment

openclaw 2026.4.22 (current stable at time of report; observed on macOS 14)
Surface: Telegram direct conversation
Backend: claude CLI 2.1.112 with --strict-mcp-config and ephemeral loopback MCP port (mkdtemp + listen(0) per gateway boot)
Triggering condition: any first-turn failure on a fresh boot before cliSessionBindings is populated. We've reproduced via MCP loopback port races and via PGLite autopilot lock contention with a co-resident MCP server.

Related

Different root cause but adjacent symptom, referenced for triage:

#66849 — 2026.4.14 active-memory timeouts producing "Something went wrong" (different code path)
#71178 — openclaw update mid-turn corrupting Telegram state (different code path, similar persistence pattern)

Anthropic claude-code issues that confirm the upstream behavior we're exploiting:

anthropics/claude-code#25032 — sessions-index.json stale, --resume shows missing sessions
anthropics/claude-code#18880 — claude --resume crashes on killed sessions

Happy to provide more log excerpts, a sanitized sessions.json snapshot, or test against a candidate fix.

extent analysis

TL;DR

The most likely fix is to modify the getCliSessionBinding and getCliSessionId functions to prevent falling back to scalar values without hashes, ensuring that the structured cliSessionBindings is the source of truth for session reusability.

Guidance

Review the cli-session-hmH-lCCb.js file, specifically lines 14-33 and 34-36, to understand the current implementation of getCliSessionBinding and getCliSessionId.
Consider implementing Option B from the proposed upstream fixes, which involves returning null from getCliSessionBinding if cliSessionBindings[provider] is missing or empty, and gating the scalar read in getCliSessionId on a successful binding lookup.
Verify that the fix works by testing the gateway with a fresh session and simulating a first-turn failure to ensure that the session is properly cleared and a new session ID is minted.
Be cautious when implementing the fix, as it may require additional error handling and logging to ensure that the gateway behaves correctly in all scenarios.

Example

No code snippet from the openclaw source is included here; any change requires a thorough understanding of the existing implementation and the proposed fixes.

Notes

The fix may not be applicable to all versions of openclaw, and it is essential to test the changes thoroughly to ensure that they do not introduce new issues. Additionally, the proposed fixes may require modifications to other parts of the codebase, and a comprehensive review of the changes is necessary to ensure that they are correct and effective.

Recommendation

To recover an affected session now, apply the manual workaround above (reset the id fields in sessions.json and restart the gateway). For a durable fix, modify getCliSessionBinding and getCliSessionId so they never fall back to scalar values without hashes, as described in Option B of the proposed upstream fixes. That approach addresses the underlying invariant violation and removes the bypass class.
