openclaw - 💡(How to fix) Fix Gateway draining deadlocked by stalled model_call; subagent delivery routed to wrong channel [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#80330 · Fetched 2026-05-11 03:16:05
Comments: 1
Participants: 2
Timeline: 13
Reactions: 2
Timeline (top): subscribed ×6, mentioned ×5, closed ×1, commented ×1


Environment

  • openclaw version: 2026.5.7
  • runtime: node 24.4.1
  • OS: macOS Darwin 25.4.0
  • gateway mode: local, port 18789
  • model: fireworks/accounts/fireworks/routers/kimi-k2p6-turbo

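Bug 1: Gateway draining is blocked indefinitely by a stalled model_call, causing a restart loop

Steps to reproduce

  1. The main session (WeChat channel) runs a long task that requires a model_call
  2. The model_call hangs, and the diagnostic reports: stalled session, activeWorkKind=model_call, lastProgressAge=168s, classification=stalled_agent_run
  3. The user requests a restart through the gateway tool (restart requested)
  4. The Gateway enters the draining state: still draining 6 active task(s) and 3 active embedded run(s) before restart
  5. The stalled model_call never finishes, so draining loops indefinitely (10+ minutes)
  6. All new tasks are rejected: GatewayDrainingError: Gateway is draining for restart; new tasks are not accepted
  7. Shutdown eventually times out: the WeChat channel stop exceeds 5000ms

Expected behavior

  • Draining should have a hard timeout (for example 60s) that force-terminates stalled tasks
  • Alternatively, a stalled model_call should be detected and aborted automatically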
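
The first expectation can be illustrated with a drain routine that waits for active tasks only up to a hard deadline and then aborts whatever is left. This is a minimal sketch, not openclaw code: `ActiveTask`, `DrainResult`, and `drainWithDeadline` are hypothetical names, and it assumes each task exposes a completion promise and an abort hook.

```typescript
// Hypothetical sketch: none of these names come from the openclaw codebase.
interface ActiveTask {
  id: string;
  done: Promise<void>; // settles when the task finishes on its own
  abort: (reason: string) => void; // asks the task to stop
}

interface DrainResult {
  completed: string[];
  aborted: string[];
}

async function drainWithDeadline(
  tasks: ActiveTask[],
  deadlineMs: number,
): Promise<DrainResult> {
  const completed: string[] = [];
  const pending = new Set(tasks.map((t) => t.id));

  // Track each task as it settles; a rejected task also counts as finished.
  const settled = Promise.all(
    tasks.map((t) =>
      t.done
        .catch(() => undefined)
        .then(() => {
          pending.delete(t.id);
          completed.push(t.id);
        }),
    ),
  );

  // Hard deadline; the timer is cleared if every task settles first.
  const deadline = new Promise<"timeout">((resolve) => {
    const timer = setTimeout(() => resolve("timeout"), deadlineMs);
    void settled.then(() => clearTimeout(timer));
  });

  const outcome = await Promise.race([
    settled.then(() => "drained" as const),
    deadline,
  ]);

  const aborted: string[] = [];
  if (outcome === "timeout") {
    // Force-stop whatever is still running instead of waiting forever.
    for (const t of tasks) {
      if (pending.has(t.id)) {
        t.abort("drain deadline exceeded");
        aborted.push(t.id);
      }
    }
  }
  // Copy so late completions cannot mutate the returned snapshot.
  return { completed: [...completed], aborted };
}
```

With a 60s deadline, a task like the stalled model_call would end up in `aborted` rather than holding the restart open. The sketch only requests an abort; whether the underlying call actually honors it is outside its scope.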

Actual log excerpt

stalled session: sessionId=bbe85782-7c56-4ea6-bfdb-9ab2e2c5b3ab
  state=processing age=175s queueDepth=1
  reason=active_work_without_progress classification=stalled_agent_run
  activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=168s
  recovery=none

still draining 6 active task(s) and 3 active embedded run(s) before restart
... (repeats every 30s for 10 minutes)

shutdown started: gateway restarting
[2005b227a854-im-bot] channel stop exceeded 5000ms after abort; continuing shutdown
shutdown completed cleanly in 6310ms
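
The diagnostic above already classifies the session as stalled but reports recovery=none. The sketch below shows how such a diagnostic could drive an automatic abort. It is an assumption-laden illustration: `SessionDiagnostic`, `AbortFn`, and `abortStalledSessions` are hypothetical, the field names only mirror the log line, and the threshold is a made-up parameter.

```typescript
// Hypothetical types: field names mirror the diagnostic log, not openclaw's real API.
interface SessionDiagnostic {
  sessionId: string;
  state: "idle" | "processing";
  activeWorkKind: string; // e.g. "model_call"
  lastProgressAgeMs: number; // time since the last progress event
}

type AbortFn = (sessionId: string, reason: string) => void;

// Aborts every processing session whose last progress is older than the
// threshold and returns the ids of the sessions that were aborted.
function abortStalledSessions(
  sessions: SessionDiagnostic[],
  stallThresholdMs: number,
  abort: AbortFn,
): string[] {
  const aborted: string[] = [];
  for (const s of sessions) {
    const stalled =
      s.state === "processing" && s.lastProgressAgeMs > stallThresholdMs;
    if (!stalled) continue;
    abort(
      s.sessionId,
      `${s.activeWorkKind} made no progress for ${s.lastProgressAgeMs}ms`,
    );
    aborted.push(s.sessionId);
  }
  return aborted;
}
```

With a threshold of 120s, the session in the log (lastProgressAge=168s) would be selected for abort, while sessions that are idle or still making progress are left alone.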

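Bug 2: Delivery routing error: subagent notifications from the WeChat channel are routed to the Feishu channel

Problem description

When a cron run on the WeChat channel starts a subagent and the subagent fails, a completion notification has to be sent to the requester. The delivery is incorrectly tagged channel: feishu, while the to field still holds the WeChat user ID.

Impact

  • The Feishu API rejects the send with HTTP 400: feishu_code: 99992360
  • Error message: Invalid ids: [[email protected]]
  • After the Gateway restart, delivery recovery retries the deliveries and all 20 fail
  • These pending deliveries may also be among the active tasks that keep draining from completing in Bug 1

Actual data

All 20 pending delivery files under ~/.openclaw/delivery-queue/ look like this: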

{
  "channel": "feishu",
  "to": "[email protected]",
  "accountId": "2005b227a854-im-bot",
  "agentId": "main"
}

requesterOrigin (the correct origin)

{
  "channel": "openclaw-weixin",
  "to": "[email protected]",
  "accountId": "2005b227a854-im-bot"
}

Expected behavior

The subagent's completion notification should be sent back to the WeChat user through the openclaw-weixin channel, not through feishu.
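
The expected routing can be sketched as deriving the delivery target directly from requesterOrigin and rejecting any delivery whose channel disagrees with it. `Origin`, `Delivery`, `buildCompletionDelivery`, and `assertMatchesOrigin` are hypothetical names used for illustration only; how openclaw actually resolves the channel is not shown in the issue.

```typescript
// Hypothetical shapes modeled on the JSON in this report.
interface Origin {
  channel: string;
  to: string;
  accountId: string;
}

interface Delivery extends Origin {
  agentId?: string;
}

// Build the completion notification from the requester's own origin, so the
// channel and the recipient id always come from the same source.
function buildCompletionDelivery(origin: Origin, agentId?: string): Delivery {
  const delivery: Delivery = {
    channel: origin.channel,
    to: origin.to,
    accountId: origin.accountId,
  };
  if (agentId !== undefined) delivery.agentId = agentId;
  return delivery;
}

// Guard that can run before enqueueing: fail fast on a channel mismatch
// instead of writing a delivery the target API will reject.
function assertMatchesOrigin(delivery: Delivery, origin: Origin): void {
  if (delivery.channel !== origin.channel) {
    throw new Error(
      `delivery channel "${delivery.channel}" does not match requester channel "${origin.channel}"`,
    );
  }
}
```

A guard like this would have flagged the 20 queued files at write time, since their channel is feishu while requesterOrigin.channel is openclaw-weixin.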


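Possible connection

The 20 pending delivery recovery tasks from Bug 2 may be part of the "4 active task(s) and 2 active embedded run(s)" in Bug 1. The delivery recovery tasks wait on the Feishu API response (or on retry backoff), and because the Feishu API call can never succeed, these tasks become zombie tasks.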
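
If the zombie-task theory holds, one mitigation is to stop retrying deliveries that fail with a non-retryable status such as the HTTP 400 seen here. The following is a sketch under assumptions: `DeliveryStep` and `nextDeliveryStep` are hypothetical, and the choice of which statuses count as retryable (429 and 5xx) is an illustrative policy, not openclaw's documented behavior.

```typescript
// Hypothetical retry policy; not taken from openclaw.
type DeliveryStep =
  | { kind: "sent" }
  | { kind: "retry"; delayMs: number }
  | { kind: "dead_letter"; reason: string };

// Decide what to do after one delivery attempt.
// `attempt` is 1-based; backoff doubles on each retry.
function nextDeliveryStep(
  httpStatus: number,
  attempt: number,
  maxAttempts: number,
  baseDelayMs: number = 1000,
): DeliveryStep {
  if (httpStatus >= 200 && httpStatus < 300) {
    return { kind: "sent" };
  }
  // Client errors such as 400 will not succeed on retry; park them instead.
  const retryable = httpStatus === 429 || httpStatus >= 500;
  if (!retryable) {
    return { kind: "dead_letter", reason: `non-retryable HTTP ${httpStatus}` };
  }
  if (attempt >= maxAttempts) {
    return { kind: "dead_letter", reason: `gave up after ${attempt} attempts` };
  }
  return { kind: "retry", delayMs: baseDelayMs * 2 ** (attempt - 1) };
}
```

Under this policy the misrouted deliveries would be dead-lettered on the first HTTP 400 and would no longer count as active work during draining.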

Configuration background

  • In openclaw.json, the main agent is bound to both openclaw-weixin (accountId: 2005b227a854-im-bot) and feishu (accountId: *)
  • The subagent's requesterOrigin correctly records the WeChat channel as the origin
