openclaw - ✅ (Solved) Fix: Gateway-restart wake-ups for Claude CLI Discord sessions arrive cold; neither --resume nor the openClawHistoryPrompt rebuild fires [1 pull request, 1 comment, 2 participants]

GitHub stats for openclaw/openclaw#74125 (fetched 2026-04-30 06:28:11): 1 comment, 2 participants, 10 timeline events, 0 reactions.
Timeline (top): mentioned ×3, subscribed ×3, cross-referenced ×2, closed ×1.

After a gateway restart, queued wake-up / system-event turns delivered to a Claude-CLI-bound Discord session arrive without prior conversational context. Normal user-message turns on the same session post-restart DO retain context, so this is specific to the wake-up / system-event path.

The result is the model behaving as if the conversation just started, even though the session has a long prior history.

Error Message

The gateway logs only provider, model, and promptChars for cli exec; there is no argv, no --resume flag, and no cliSessionId. The problem therefore cannot be confirmed from current logs alone, because the post-restart cli exec lines in gateway.log are too sparse.

Root Cause

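Post-restart wake/heartbeat/cron turns arrive with a synthetic provider (heartbeat, cron-event, or exec-event) and no live ChatType. The static CLI prompt identity is then built from that synthetic context instead of the persisted Discord session entry, so the static prompt hash drifts, --resume is skipped, and the openClawHistoryPrompt rebuild does not fire either.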

Fix Action

Fixed by PR #74171.

PR fix notes

PR #74171: fix(agents): preserve CLI wake-up session metadata

Description (problem / solution / changelog)

Summary

  • Problem: post-restart wake/cron/heartbeat turns for Discord Claude CLI sessions could rebuild CLI prompt identity from synthetic system-event context instead of persisted Discord chat metadata.
  • Why it matters: the static prompt hash could drift, causing --resume to be skipped even when the prior CLI session binding was still valid.
  • What changed: system-event prompt construction now restores missing prompt-only chat/route metadata from the persisted session entry before building static CLI prompt identity, and CLI exec logs include redacted resume diagnostics.
  • What did NOT change: auth, auth-epoch, MCP, and real system-prompt mismatch invalidation remain strict. This does not address the separate group-intro prompt drift tracked in #69118.
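The metadata restoration described in the summary can be sketched as a merge that only fills gaps. This is a minimal sketch under stated assumptions: PromptMetadata, restorePromptMetadata, and the field names are hypothetical, not OpenClaw's actual types; only the synthetic provider labels (heartbeat, cron-event, exec-event) come from the PR notes.

```typescript
// Hypothetical shape: field names are illustrative, not OpenClaw's real session types.
interface PromptMetadata {
  provider?: string;
  chatType?: string;
  channelId?: string;
  routeId?: string;
}

// Synthetic provider labels named in the PR's root-cause notes.
const SYSTEM_EVENT_PROVIDERS = new Set<string>(["heartbeat", "cron-event", "exec-event"]);

function isSystemEventTurn(live: PromptMetadata): boolean {
  return live.provider !== undefined && SYSTEM_EVENT_PROVIDERS.has(live.provider);
}

// Fill only the prompt-only fields that a system-event turn is missing.
// Normal user turns are returned untouched, and explicit live values are never overwritten.
function restorePromptMetadata(live: PromptMetadata, persisted: PromptMetadata): PromptMetadata {
  if (!isSystemEventTurn(live)) {
    return live;
  }
  return {
    ...live,
    chatType: live.chatType ?? persisted.chatType,
    channelId: live.channelId ?? persisted.channelId,
    routeId: live.routeId ?? persisted.routeId,
  };
}

const persistedEntry: PromptMetadata = { provider: "discord", chatType: "channel", channelId: "123", routeId: "r1" };
const restored = restorePromptMetadata({ provider: "cron-event" }, persistedEntry);
console.log(restored.chatType, restored.channelId); // channel 123
```

Keeping provider as cron-event lets the turn still be recognized as a system event, while the chat fields that would feed the static prompt identity come from the persisted entry.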

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #74125
  • Related #69118
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: wake/heartbeat/cron turns can arrive with Provider=heartbeat|cron-event|exec-event and no live ChatType, while CLI reuse hashes were built from that live context instead of the persisted Discord session entry.
  • Missing detection / guardrail: existing tests covered no-reset behavior and static prompt forwarding, but not a post-restart system-event turn rebuilding the same static CLI prompt identity as the prior Discord channel session.
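To illustrate why the hash drifts, the sketch below builds a static prompt from the persisted context and from a synthetic wake context, then hashes each. Assumptions: buildStaticPrompt, the channel-intro sentence, and the 32-bit FNV-1a hash are hypothetical stand-ins; OpenClaw's real prompt builder and hash function are not shown in this report.

```typescript
interface PromptContext {
  chatType?: string;
  channelId?: string;
}

// 32-bit FNV-1a over UTF-16 code units: a dependency-free stand-in for the real prompt hash.
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical: a channel intro line is part of the static prompt only when the chat type is known.
function buildStaticPrompt(base: string, ctx: PromptContext): string {
  const lines = [base];
  if (ctx.chatType === "channel" && ctx.channelId !== undefined) {
    lines.push(`You are replying in Discord channel ${ctx.channelId}.`);
  }
  return lines.join("\n");
}

const basePrompt = "Base system prompt.";
// Identity recorded when the Discord session was bound to the CLI.
const boundHash = fnv1a(buildStaticPrompt(basePrompt, { chatType: "channel", channelId: "123" }));
// Post-restart wake turn: synthetic cron-event context with no ChatType.
const wakeHash = fnv1a(buildStaticPrompt(basePrompt, {}));
// The hashes are expected to differ, so a reuse check comparing them would skip --resume.
console.log(boundHash === wakeHash);
```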

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/auto-reply/reply/get-reply-run.exec-hint.test.ts
    • src/auto-reply/reply/get-reply-run.media-only.test.ts
    • src/agents/cli-runner.spawn.test.ts
  • Scenario the test should lock in: a cron/system-event turn with no live ChatType uses persisted Discord channel metadata for static CLI prompt identity and logs resume diagnostics without exposing raw CLI session IDs.
  • Why this is the smallest reliable guardrail: the bug is at prompt identity construction before CLI reuse, so the prepared-reply seam proves the hash input is stable without needing a live Discord or Claude CLI run.
  • Existing test that already covers this (if any): none.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Wake/heartbeat/cron system-event turns targeting a persisted Discord Claude CLI session now keep CLI continuity when auth and MCP identity still match. CLI exec logs also include redacted resume/reuse diagnostics.

Diagram (if applicable)

Before:
post-restart wake -> synthetic cron/heartbeat context -> static prompt hash drifts -> CLI resume skipped -> cold turn

After:
post-restart wake -> persisted Discord prompt metadata restored -> static prompt hash remains stable -> CLI resume can continue

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A. Resume diagnostics intentionally log only presence/state and a short one-way session fingerprint, not raw CLI session IDs or argv.
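A minimal sketch of the redaction idea, assuming a diagnostics record that carries only presence/state plus a short fingerprint. ResumeDiagnostics, sessionFingerprint, and resumeDiagnostics are hypothetical names. FNV-1a is used only to keep the sketch dependency-free and is not a cryptographic hash; a production fingerprint would more plausibly truncate a SHA-256 digest (for example from Node's crypto.createHash).

```typescript
interface ResumeDiagnostics {
  cliSession: "present" | "absent";
  useResume: boolean;
  sessionFingerprint?: string;
}

// Non-cryptographic FNV-1a, standing in for a real one-way digest.
function sessionFingerprint(sessionId: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < sessionId.length; i++) {
    hash ^= sessionId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  // Short tag: enough to correlate log lines, never the raw ID.
  return hash.toString(16).padStart(8, "0").slice(0, 6);
}

function resumeDiagnostics(cliSessionId: string | undefined, useResume: boolean): ResumeDiagnostics {
  if (!cliSessionId) {
    return { cliSession: "absent", useResume };
  }
  return { cliSession: "present", useResume, sessionFingerprint: sessionFingerprint(cliSessionId) };
}

const rawId = "9f1c2e7a-raw-cli-session-id";
const diag = resumeDiagnostics(rawId, true);
console.log(JSON.stringify(diag));
```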

Repro + Verification

Environment

  • OS: Linux container
  • Runtime/container: Node v24.11.0 via /tmp/node-v24.11.0-linux-x64/bin
  • Model/provider: Claude CLI path exercised through unit/seam tests
  • Integration/channel (if any): Discord session metadata simulated at the prepared-reply seam
  • Relevant config (redacted): persisted Discord channel session entry with stored CLI binding metadata

Steps

  1. Start from a persisted Discord channel session entry with CLI session metadata.
  2. Simulate a post-restart cron/system-event turn with Provider=cron-event and no live ChatType.
  3. Build the prepared reply and inspect the static CLI prompt identity passed toward the runner.

Expected

  • The static prompt identity is built from persisted Discord channel metadata.
  • CLI diagnostics show trigger/reuse state without leaking raw CLI session IDs.

Actual

  • Before this fix, the synthetic system-event context could omit group/channel prompt context and drift the static prompt hash.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • PATH=/tmp/node-v24.11.0-linux-x64/bin:$PATH pnpm test src/agents/cli-session.test.ts src/agents/cli-runner/prepare.test.ts src/agents/cli-runner.spawn.test.ts src/auto-reply/reply/agent-runner-execution.test.ts src/infra/heartbeat-runner.ghost-reminder.test.ts src/gateway/server-cron.test.ts
    • PATH=/tmp/node-v24.11.0-linux-x64/bin:$PATH pnpm test src/auto-reply/reply/get-reply-run.exec-hint.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts src/agents/cli-runner.spawn.test.ts
    • PATH=/tmp/node-v24.11.0-linux-x64/bin:$PATH pnpm exec oxfmt --check --threads=1 src/auto-reply/reply/session.ts src/auto-reply/reply/get-reply-run.ts src/auto-reply/reply/agent-runner-execution.ts src/agents/cli-runner/prepare.ts src/agents/cli-runner/execute.ts src/agents/cli-session.ts src/infra/heartbeat-runner.ts src/gateway/server-cron.ts CHANGELOG.md src/auto-reply/reply/get-reply-run.exec-hint.test.ts src/auto-reply/reply/get-reply-run.media-only.test.ts src/agents/cli-runner.spawn.test.ts
    • PATH=/tmp/node-v24.11.0-linux-x64/bin:$PATH pnpm check:changed
  • Edge cases checked:
    • missing system-event ChatType
    • synthetic heartbeat/cron-event provider labels
    • persisted Discord channel route metadata
    • normal user turns remain on live metadata
    • explicit system-event metadata is not overwritten
    • raw CLI session IDs are not logged
  • What you did not verify:
    • a live Discord + Claude CLI gateway restart repro

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: restoring persisted prompt metadata too broadly could mask a real auth/MCP/system-prompt mismatch.
  • Mitigation: this only changes prompt context reconstruction; CLI reuse invalidation for auth profile, auth epoch, MCP, and durable prompt hash mismatch remains strict.
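The mitigation can be illustrated with a reuse check in which every invalidation rule stays in place and only the prompt-hash input becomes stable. decideReuse, CliIdentity, and CliBinding are hypothetical simplifications, not OpenClaw's resolveCliSessionReuse; the four compared fields mirror the ones the PR lists (auth profile, auth epoch, MCP, durable prompt hash).

```typescript
interface CliIdentity {
  authProfile: string;
  authEpoch: number;
  mcpFingerprint: string;
  promptHash: string;
}

interface CliBinding extends CliIdentity {
  sessionId: string;
}

type ReuseDecision = { reuse: true; sessionId: string } | { reuse: false; reason: string };

// Every mismatch still invalidates reuse; the fix only keeps promptHash stable for wake turns.
function decideReuse(stored: CliBinding | undefined, current: CliIdentity): ReuseDecision {
  if (!stored) return { reuse: false, reason: "no-binding" };
  if (stored.authProfile !== current.authProfile) return { reuse: false, reason: "auth-profile" };
  if (stored.authEpoch !== current.authEpoch) return { reuse: false, reason: "auth-epoch" };
  if (stored.mcpFingerprint !== current.mcpFingerprint) return { reuse: false, reason: "mcp" };
  if (stored.promptHash !== current.promptHash) return { reuse: false, reason: "prompt-hash" };
  return { reuse: true, sessionId: stored.sessionId };
}

const stored: CliBinding = { sessionId: "cli-1", authProfile: "default", authEpoch: 3, mcpFingerprint: "m1", promptHash: "a1b2" };
const current: CliIdentity = { authProfile: "default", authEpoch: 3, mcpFingerprint: "m1", promptHash: "a1b2" };
console.log(JSON.stringify(decideReuse(stored, current))); // {"reuse":true,"sessionId":"cli-1"}
console.log(JSON.stringify(decideReuse(stored, { ...current, authEpoch: 4 }))); // {"reuse":false,"reason":"auth-epoch"}
```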

Changed files

  • src/agents/cli-runner.spawn.test.ts (modified, +26/-1)
  • src/agents/cli-runner/execute.ts (modified, +51/-1)
  • src/auto-reply/reply/get-reply-run.exec-hint.test.ts (modified, +95/-0)
  • src/auto-reply/reply/get-reply-run.media-only.test.ts (modified, +58/-0)
  • src/auto-reply/reply/get-reply-run.ts (modified, +129/-16)
Original issue


Repro

  1. Bind a Discord channel session to the Claude CLI backend (e.g. agents.defaults.agentRuntime.id = "claude-cli", model anthropic/claude-opus-4-7 or similar).
  2. Have a real conversation in that channel (so a prior cliSessionId / history exists).
  3. Queue a wake-up / scheduled action against that same session.
  4. Restart the gateway (openclaw gateway restart) before the wake-up fires (or such that the wake-up is the first turn after restart).
  5. Observe: the wake-up turn lands cold — no recollection of prior conversation.
  6. Send a normal user message in the same channel afterward — context returns.

Likely root cause (from source dive)

  • --resume {sessionId} is configured for the Claude CLI backend in extensions/anthropic/cli-backend.js.
  • The runtime decision point is useResume = Boolean(cliSessionIdToUse && resolvedSessionId && backend.resumeArgs && ...) in execute.runtime-CSxdSSj_.js:243, so --resume fires only when a prior cliSessionId is threaded in.
  • When resume isn't possible, OpenClaw is supposed to rebuild prior context into the prompt via openClawHistoryPrompt (built in prepare.runtime-O4dLDhbV.js:780) — but only when reusableCliSession.sessionId is empty. Otherwise it relies entirely on the resumed CLI thread carrying its own history.
  • After a gateway restart, resolveCliSessionReuse can invalidate the prior binding (auth-epoch / fingerprint mismatch). If the wake-up code path either (a) does not thread cliSessionBinding / cliSessionId correctly, or (b) ends up with a sessionId set but a CLI thread that was never resumed or replayed, both the --resume path and the history-rebuild path are skipped, which matches the observed behavior exactly.
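The gap between the two paths can be modeled with two booleans. This is a simplified sketch: TurnState and planContext are hypothetical, and the real useResume condition has further terms (elided as "..." in the issue) that are not reproduced here.

```typescript
interface TurnState {
  cliSessionIdToUse?: string;
  resolvedSessionId?: string;
  backendHasResumeArgs: boolean;
  reusableSessionId?: string; // stands in for reusableCliSession.sessionId
}

interface ContextPlan {
  useResume: boolean;
  rebuildHistory: boolean;
  cold: boolean;
}

function planContext(s: TurnState): ContextPlan {
  // Mirrors the quoted condition, minus the elided trailing terms.
  const useResume = Boolean(s.cliSessionIdToUse && s.resolvedSessionId && s.backendHasResumeArgs);
  // openClawHistoryPrompt is only built when no reusable CLI session ID is set.
  const rebuildHistory = !s.reusableSessionId;
  return { useResume, rebuildHistory, cold: !useResume && !rebuildHistory };
}

// A wake-up turn where a reusable session ID survives but no cliSessionId is threaded through.
const wakeTurn = planContext({ resolvedSessionId: "s1", backendHasResumeArgs: true, reusableSessionId: "cli-1" });
console.log(JSON.stringify(wakeTurn)); // {"useResume":false,"rebuildHistory":false,"cold":true}
```

Any state where cold is true reproduces the reported symptom; a test asserting that cold can never be true would guard both paths at once.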


Observability ask

Log cli exec argv (or at minimum useResume, the resumed cliSessionId, and the trigger source — user-message vs wake-up/system-event) at info level. The current cli exec log line is too sparse to debug paths like this without a source dive.
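A sketch of the requested log line, assuming the trigger source is derived from the turn's provider label. classifyTrigger, CliExecInfo, and formatCliExecLog are hypothetical; the synthetic labels heartbeat, cron-event, and exec-event come from the PR notes, and the resumed session is reported as presence only rather than the raw cliSessionId.

```typescript
type TriggerSource = "user-message" | "system-event";

const SYNTHETIC_PROVIDERS = new Set<string>(["heartbeat", "cron-event", "exec-event"]);

function classifyTrigger(turnProvider: string): TriggerSource {
  return SYNTHETIC_PROVIDERS.has(turnProvider) ? "system-event" : "user-message";
}

interface CliExecInfo {
  provider: string; // CLI backend, e.g. "claude-cli"
  model: string;
  promptChars: number;
  turnProvider: string; // provider label of the incoming turn
  useResume: boolean;
  hasCliSession: boolean;
}

// One info-level line with the fields the issue asks for, without raw session IDs or argv.
function formatCliExecLog(info: CliExecInfo): string {
  return [
    "cli exec",
    `provider=${info.provider}`,
    `model=${info.model}`,
    `promptChars=${info.promptChars}`,
    `trigger=${classifyTrigger(info.turnProvider)}`,
    `useResume=${info.useResume}`,
    `cliSession=${info.hasCliSession ? "present" : "absent"}`,
  ].join(" ");
}

const line = formatCliExecLog({
  provider: "claude-cli",
  model: "anthropic/claude-opus-4-7",
  promptChars: 1820,
  turnProvider: "cron-event",
  useResume: false,
  hasCliSession: true,
});
console.log(line);
// cli exec provider=claude-cli model=anthropic/claude-opus-4-7 promptChars=1820 trigger=system-event useResume=false cliSession=present
```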

Relevant files:

  • cli-runner-B1dhhBPS.js
  • execute.runtime-CSxdSSj_.js:243
  • prepare.runtime-O4dLDhbV.js:780
  • extensions/anthropic/cli-backend.js

Environment

  • macOS 26.3.1 (arm64), Node v25.8.1
  • OpenClaw via ~/.npm-global/lib/node_modules/openclaw
  • Default agent runtime: claude-cli, primary model openai-codex/gpt-5.5 with anthropic/claude-* in fallback chain (issue observed when active path was Claude CLI).

Extended analysis

TL;DR

After a gateway restart, wake-up/system-event turns can lose conversational context because neither the --resume path nor the openClawHistoryPrompt rebuild fires for them.

Guidance

  • Verify that cliSessionId is correctly threaded in the wake-up code path to ensure --resume is triggered.
  • Check the resolveCliSessionReuse function to see if the prior binding is being invalidated after a gateway restart, causing the history rebuild path to be skipped.
  • Add logging for cli exec argv, useResume, resumed cliSessionId, and trigger source to improve observability and debug the issue.
  • Review the --resume decision in execute.runtime-CSxdSSj_.js:243 and the history rebuild in prepare.runtime-O4dLDhbV.js:780 to confirm that both paths cannot be skipped for the same turn.

Example

No code snippet is provided as the issue requires a deeper understanding of the OpenClaw codebase and the specific implementation of the --resume and history rebuild logic.

Notes

The issue appears specific to the wake-up/system-event path; normal user-message turns on the same session retain context. Logging cli exec argv (or at least useResume, the resumed cliSessionId, and the trigger source) would improve observability and make this class of issue easier to debug.

Recommendation

Work around the issue by fixing the wake-up code path so it threads cliSessionId correctly and --resume fires, or add a temporary fallback that rebuilds prior context via openClawHistoryPrompt whenever --resume will not fire, rather than only when reusableCliSession.sessionId is empty.
