openclaw - 💡(How to fix) Fix bug: Claude CLI tmux empty-output retry mixes a stale session id between fresh and resume launches, causing fallback


Error Message

Error: Session ID 310787e7-e157-4306-8ae6-b1c7a211a80e is already in use.

Code Example

2026-05-17T16:09:44.525+08:00 [agent] tmux.run.complete sessionName=openclaw-claude-b84f4d1f68f9 aliveAtEnd=true sawStop=true transcriptEmitted=false bufferedPaneFlushed=false suppressPaneFallback=true paneLooksLikeEnvelope=true recoveryAttempts=0 launchMode=resume durationMs=405257
2026-05-17T16:09:44.526+08:00 [agent] tmux.empty-output.retry runId=fe62ccc0-961e-4075-b126-16e3ced5b3da sessionName=openclaw-claude-b84f4d1f68f9 launchMode=resume reason=envelope-without-transcript transcriptPath=/tmp/openclaw/claude-tmux/openclaw-claude-b84f4d1f68f9/claude-config/projects/-Users-lukin-AgentData-chief/310787e7-e157-4306-8ae6-b1c7a211a80e.jsonl paneTailSize=37559 retryAttempt=0

---

2026-05-17T16:09:44.553+08:00 [agent] tmux.run.start runId=fe62ccc0-961e-4075-b126-16e3ced5b3da sessionName=openclaw-claude-b84f4d1f68f9 backendId=claude-cli model=sonnet promptChars=13215 cliSessionIdProvided=true ...
2026-05-17T16:09:44.560+08:00 [agent] tmux.ensureSession.start sessionName=openclaw-claude-b84f4d1f68f9 exists=false hasMetadata=true mismatchReasons=["launchHash","launchMode"]
2026-05-17T16:09:44.577+08:00 [agent] tmux.ensureSession.created sessionName=openclaw-claude-b84f4d1f68f9
2026-05-17T16:09:44.578+08:00 [agent] tmux.run.waitForStartup sessionName=openclaw-claude-b84f4d1f68f9

---

Error: Session ID 310787e7-e157-4306-8ae6-b1c7a211a80e is already in use.

---

2026-05-17T16:10:14.616+08:00 [model-fallback] model fallback decision: decision=candidate_failed requested=claude-cli/sonnet candidate=claude-cli/sonnet reason=timeout status=408 errorPreview="CLI tmux session did not become ready within 30s." next=openai-crs/gpt-5.5

---

2026-05-17T16:11:38.041+08:00 [model-fallback] model fallback decision: decision=candidate_succeeded requested=claude-cli/sonnet candidate=openai-crs/gpt-5.5 previousAttempts=[{ provider=claude-cli, model=sonnet, reason=timeout, status=408, errorPreview="CLI tmux session did not become ready within 30s." }]

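Background

Between 16:09 and 16:11 on 2026-05-17 (Asia/Shanghai), the main session of Chief/Ada in the Feishu group oc_adbe5a1b0bc388f44047caa8d940829f was using claude-cli/sonnet and then automatically fell back to openai-crs/gpt-5.5. After the user asked about it, inspecting the OpenClaw logs, the tmux pane log, and the Claude transcript confirmed that the Claude model itself had not failed. Instead, OpenClaw's Claude CLI tmux session recovery logic reused a stale session id across the fresh and resume states.

User-visible symptoms

  • The status card for the previous turn showed Model: claude-cli/sonnet
  • Later status cards showed Model: openai-crs/gpt-5.5
  • The user's impression was "cli-sonnet finally died, and then it switched to GPT-5".
  • OpenClaw did not surface an explicit session-id conflict; the only visible effect was the fallback.

Key log timeline

1. 16:09:44: the Claude CLI run finished without a transcript

Gateway log: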

2026-05-17T16:09:44.525+08:00 [agent] tmux.run.complete sessionName=openclaw-claude-b84f4d1f68f9 aliveAtEnd=true sawStop=true transcriptEmitted=false bufferedPaneFlushed=false suppressPaneFallback=true paneLooksLikeEnvelope=true recoveryAttempts=0 launchMode=resume durationMs=405257
2026-05-17T16:09:44.526+08:00 [agent] tmux.empty-output.retry runId=fe62ccc0-961e-4075-b126-16e3ced5b3da sessionName=openclaw-claude-b84f4d1f68f9 launchMode=resume reason=envelope-without-transcript transcriptPath=/tmp/openclaw/claude-tmux/openclaw-claude-b84f4d1f68f9/claude-config/projects/-Users-lukin-AgentData-chief/310787e7-e157-4306-8ae6-b1c7a211a80e.jsonl paneTailSize=37559 retryAttempt=0

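Meaning: Claude CLI appears to have emitted a Stop / envelope, but OpenClaw did not read a usable assistant transcript, so it entered the empty-output retry.

2. After the retry, the fresh launch still carries the old session id

The log shows the retry relaunching tmux: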

2026-05-17T16:09:44.553+08:00 [agent] tmux.run.start runId=fe62ccc0-961e-4075-b126-16e3ced5b3da sessionName=openclaw-claude-b84f4d1f68f9 backendId=claude-cli model=sonnet promptChars=13215 cliSessionIdProvided=true ...
2026-05-17T16:09:44.560+08:00 [agent] tmux.ensureSession.start sessionName=openclaw-claude-b84f4d1f68f9 exists=false hasMetadata=true mismatchReasons=["launchHash","launchMode"]
2026-05-17T16:09:44.577+08:00 [agent] tmux.ensureSession.created sessionName=openclaw-claude-b84f4d1f68f9
2026-05-17T16:09:44.578+08:00 [agent] tmux.run.waitForStartup sessionName=openclaw-claude-b84f4d1f68f9

Note cliSessionIdProvided=true here. This means that even if claudeSessionId has been cleared from the metadata, input.cliSessionId can still reach the fresh launch path.

3. The tmux pane shows the Claude CLI error directly

/tmp/openclaw/claude-tmux/openclaw-claude-b84f4d1f68f9/pane.log is only 94 bytes, and the key error is:

Error: Session ID 310787e7-e157-4306-8ae6-b1c7a211a80e is already in use.

In other words, this was not a model inference failure: Claude CLI received a session id that was already in use and refused to start.

4. 16:10:14: OpenClaw classifies it as a startup timeout and falls back

2026-05-17T16:10:14.616+08:00 [model-fallback] model fallback decision: decision=candidate_failed requested=claude-cli/sonnet candidate=claude-cli/sonnet reason=timeout status=408 errorPreview="CLI tmux session did not become ready within 30s." next=openai-crs/gpt-5.5

5. 16:11:38: the fallback to GPT-5.5 succeeds

2026-05-17T16:11:38.041+08:00 [model-fallback] model fallback decision: decision=candidate_succeeded requested=claude-cli/sonnet candidate=openai-crs/gpt-5.5 previousAttempts=[{ provider=claude-cli, model=sonnet, reason=timeout, status=408, errorPreview="CLI tmux session did not become ready within 30s." }]

Root cause analysis

The relevant code path is roughly in dist/attempt-execution-*.js / src/agents/cli-runner/tmux/*:

  • buildClaudeTmuxArgs(): resume mode uses --resume <claudeSessionId>
  • fresh mode appends --session-id <input.cliSessionId> when backend.sessionArg && params.launch.sessionId
  • The empty-output retry logic kills tmux and tries to remove claudeSessionId from the metadata
  • But the fresh path after the retry can still pass the old Claude session id to the CLI through input.cliSessionId.
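
The leak described in the bullets above can be sketched in a few lines of TypeScript. This is only an illustration under stated assumptions: `LaunchInput`, `SessionMetadata`, and `buildArgsSketch` are hypothetical stand-ins, not OpenClaw's real `buildClaudeTmuxArgs()`. Only the `--resume` and `--session-id` flags and the session id come from the report.

```typescript
// Hypothetical shapes; the real OpenClaw types are not shown in this report.
type LaunchMode = "fresh" | "resume";

interface SessionMetadata {
  claudeSessionId?: string; // removed by the empty-output retry
}

interface LaunchInput {
  cliSessionId?: string; // supplied by the caller, NOT cleared by the retry
}

// Sketch of the argument selection described in the bullets above.
function buildArgsSketch(
  mode: LaunchMode,
  metadata: SessionMetadata,
  input: LaunchInput,
): string[] {
  if (mode === "resume" && metadata.claudeSessionId) {
    return ["--resume", metadata.claudeSessionId];
  }
  // Fresh path: the id is read from the input, so clearing metadata does not help.
  return input.cliSessionId ? ["--session-id", input.cliSessionId] : [];
}

const staleId = "310787e7-e157-4306-8ae6-b1c7a211a80e";
const metadata: SessionMetadata = { claudeSessionId: staleId };
const input: LaunchInput = { cliSessionId: staleId };

// The retry removes the id from the metadata only.
delete metadata.claudeSessionId;

const retryArgs = buildArgsSketch("fresh", metadata, input);
console.log(retryArgs.join(" ")); // --session-id 310787e7-e157-4306-8ae6-b1c7a211a80e
```

In this sketch the stale id survives the retry because it has two sources and only one of them is cleared, which matches the cliSessionIdProvided=true seen in the log.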

Two session ids actually appeared in this incident:

  • stale / conflicted: 310787e7-e157-4306-8ae6-b1c7a211a80e
  • later bound / actual fresh: 15fe948e-da9b-475a-b939-0c5dab7d0b87

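OpenClaw did not fully switch to the new session id in either transcript tailing or the retry arguments:

  • transcriptPath still pointed at the old 310787...jsonl, which led to transcriptEmitted=false
  • The retry's fresh launch still carried the old 310787... as --session-id, triggering Claude CLI's Session ID ... is already in use error
  • That error was not recognized as a session-id conflict; waitForStartup classified it as a 30s timeout, which ended in the generic model fallback.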
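
To illustrate the transcript-path point, here is a small TypeScript sketch. The helpers `transcriptPathFor` and `resolveTailSessionId` are hypothetical. The path layout is inferred from the transcriptPath value in the gateway log rather than from OpenClaw's source, and it is an assumption that the fresh session writes under the same config directory.

```typescript
// Hypothetical helper. The layout <configDir>/projects/<projectSlug>/<sessionId>.jsonl
// is inferred from the transcriptPath in the log, not from OpenClaw's source.
function transcriptPathFor(configDir: string, projectSlug: string, sessionId: string): string {
  return `${configDir}/projects/${projectSlug}/${sessionId}.jsonl`;
}

// Prefer the id Claude CLI actually reports (startup or hook event) over the input id.
function resolveTailSessionId(
  inputCliSessionId: string | undefined,
  reportedSessionId: string | undefined,
): string | undefined {
  return reportedSessionId ?? inputCliSessionId;
}

const configDir = "/tmp/openclaw/claude-tmux/openclaw-claude-b84f4d1f68f9/claude-config";
const projectSlug = "-Users-lukin-AgentData-chief";
const staleSessionId = "310787e7-e157-4306-8ae6-b1c7a211a80e";
const freshSessionId = "15fe948e-da9b-475a-b939-0c5dab7d0b87";

const tailId = resolveTailSessionId(staleSessionId, freshSessionId);
if (tailId !== undefined) {
  console.log(transcriptPathFor(configDir, projectSlug, tailId));
}
```

Because the path is derived from the session id, a tailer that keeps the stale input id watches a file the new Claude process never writes to, which would be consistent with transcriptEmitted=false.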

Expected behavior

When it hits an empty-output retry, a stale transcript, or a session id conflict, OpenClaw should explicitly recover the Claude CLI session rather than mistakenly triggering a model fallback.

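Suggested fixes

  1. Fully discard the stale Claude session id after an empty-output retry

    • Removing claudeSessionId from the metadata alone is not enough.
    • If the next launch takes the fresh path, input.cliSessionId must not be passed to Claude CLI as --session-id again.
    • One option is a per-attempt effectiveCliSessionId that is cleared or regenerated after an empty-output retry.
  2. Do not reuse a known-conflicting Claude session id in fresh mode

    • With launchMode=fresh, any --session-id that is passed must be guaranteed not to be in use by Claude CLI.
    • For Claude CLI, fresh + old --session-id is not a safe recovery path; when the id is already in use, switch to --resume oldId or a completely new session id.
  3. Treat Session ID ... is already in use as a recoverable error

    • Read the pane tail during waitForStartup, and if it matches:
      • Session ID .* is already in use
    • do not keep waiting for the full 30s.
    • Immediately classify it as session-id-conflict, kill tmux, discard that id, and either start fresh or recover through an explicit resume path.
  4. The transcript tailer should follow the actual startup/hook session id

    • If waitForStartup.done or a hook event returns a new claudeSessionId, the transcript tailer should switch to the new id.
    • It should not keep watching the transcript file for the stale input.cliSessionId.
  5. Clearer diagnostic logging

    • Today the user only sees a fallback to GPT.
    • The model fallback reason or diagnostic should preserve session-id-conflict so the failure is not misread as a Claude model failure.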
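
Fixes 3 and 5 could be sketched roughly as follows. This is a hedged illustration, not OpenClaw's implementation: `StartupOutcome`, `classifyStartup`, and `diagnosticReason` are hypothetical names, and only the error text and the session-id-conflict label come from the report.

```typescript
// Hypothetical outcome type; "session-id-conflict" is the label proposed above.
type StartupOutcome =
  | { kind: "ready" }
  | { kind: "pending" }
  | { kind: "session-id-conflict"; sessionId: string };

// Pattern based on the suggested match "Session ID .* is already in use".
const SESSION_IN_USE = /Session ID (\S+) is already in use/;

// Classify one poll of the startup wait from the current pane tail.
function classifyStartup(paneTail: string, ready: boolean): StartupOutcome {
  const match = SESSION_IN_USE.exec(paneTail);
  if (match) {
    return { kind: "session-id-conflict", sessionId: match[1] ?? "" };
  }
  return ready ? { kind: "ready" } : { kind: "pending" };
}

// Map the outcome to a diagnostic reason so a conflict is not reported as a timeout.
function diagnosticReason(outcome: StartupOutcome, timedOut: boolean): string {
  if (outcome.kind === "session-id-conflict") return "session-id-conflict";
  if (outcome.kind === "ready") return "ok";
  return timedOut ? "timeout" : "waiting";
}

const paneTail = "Error: Session ID 310787e7-e157-4306-8ae6-b1c7a211a80e is already in use.";
const outcome = classifyStartup(paneTail, false);
console.log(diagnosticReason(outcome, false)); // session-id-conflict
```

Checking the pane tail on every poll lets the startup wait exit as soon as the error appears, and carrying the extracted id in the outcome gives the recovery step something concrete to discard.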

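Acceptance criteria

  • When a stale input.cliSessionId is combined with an existing Claude process that holds that id, OpenClaw no longer waits for the 30s timeout.
  • When the pane prints Session ID ... is already in use, OpenClaw classifies it as session-id-conflict
  • After recovery, OpenClaw either uses --resume <oldId> or starts fresh without passing the old --session-id
  • The transcript tailer uses the actual Claude session id and no longer reads a stale transcript that results in transcriptEmitted=false
  • No unnecessary model fallback to another provider is triggered.
  • Add unit tests verifying that a stale cliSessionId no longer reaches the fresh args after empty-output.retry.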
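
The last criterion could be exercised with a test shaped like the sketch below. It assumes the per-attempt effectiveCliSessionId idea from the suggested fixes. `AttemptState`, `startAttempt`, `onEmptyOutputRetry`, and `freshArgs` are hypothetical names for illustration and do not exist in OpenClaw as far as this report shows.

```typescript
// Hypothetical per-attempt state; effectiveCliSessionId follows the suggested fix.
interface AttemptState {
  inputCliSessionId: string | undefined;
  effectiveCliSessionId: string | undefined;
}

// A new attempt starts out trusting the caller-supplied id.
function startAttempt(inputCliSessionId: string | undefined): AttemptState {
  return { inputCliSessionId, effectiveCliSessionId: inputCliSessionId };
}

// After an empty-output retry the effective id is dropped; the input is left untouched.
function onEmptyOutputRetry(state: AttemptState): AttemptState {
  return { ...state, effectiveCliSessionId: undefined };
}

// Fresh args read only the effective id, never the raw input id.
function freshArgs(state: AttemptState): string[] {
  return state.effectiveCliSessionId
    ? ["--session-id", state.effectiveCliSessionId]
    : [];
}

const staleCliSessionId = "310787e7-e157-4306-8ae6-b1c7a211a80e";
const firstAttempt = startAttempt(staleCliSessionId);
const retryAttempt = onEmptyOutputRetry(firstAttempt);

console.log(freshArgs(firstAttempt).length); // 2
console.log(freshArgs(retryAttempt).length); // 0
```

Keeping the raw input id on the state while reading launch args only from the effective id means the stale value remains available for diagnostics but cannot reach the CLI again.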

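Impact

  • Fewer mistaken fallbacks during Claude CLI tmux recovery.
  • Users are no longer misled into thinking the cli-sonnet model went down.
  • Removes the 30s stall from a failed recovery.
  • Improves the stability of Claude CLI resume for long sessions and after compaction.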
