openclaw - 💡(How to fix) Fix Discord agent turn can stall in processing with recovery=none and no visible reply

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

A Discord-triggered agent turn can enter state=processing with activeWorkKind=model_call, repeatedly log stalled session ... recovery=none, and never surface a channel-visible failure or recovery reply. This leaves the Discord user with no response even though the gateway, Discord account, and outbound send path are healthy.

This looks like an OpenClaw runtime/session recovery responsibility rather than a downstream integration issue: OpenClaw owns the agent session state, model-call timeout handling, stalled-session recovery, and Discord delivery of terminal turn failures.

Error Message

For session 0dc1bab4-e176-4274-b9a4-6a73f5b2f6d2, session key agent:main:discord:channel:1507758768417411112, OpenClaw accepted a Discord message and submitted a model call. The session file contains the user message but no later assistant message.

Root Cause

A Discord-triggered agent turn can enter state=processing with activeWorkKind=model_call, repeatedly log stalled session ... recovery=none, and never surface a channel-visible failure or recovery reply. This leaves the Discord user with no response even though the gateway, Discord account, and outbound send path are healthy.

This looks like an OpenClaw runtime/session recovery responsibility rather than a downstream integration issue: OpenClaw owns the agent session state, model-call timeout handling, stalled-session recovery, and Discord delivery of terminal turn failures.

Code Example

[diagnostic] stalled session: sessionId=0dc1bab4-e176-4274-b9a4-6a73f5b2f6d2 sessionKey=agent:main:discord:channel:1507758768417411112 state=processing age=... queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=... recovery=none

---

[fetch-timeout] fetch timeout after 180000ms (elapsed 180871ms) operation=fetchWithSsrFGuard url=http://10.0.0.1:3003/backend-api/codex/responses
[openai-transport] [responses] error provider=openai-codex api=openai-codex-responses model=gpt-5.4 name=TimeoutError message=request timed out
RAW_BUFFERClick to expand / collapse

Summary

A Discord-triggered agent turn can enter state=processing with activeWorkKind=model_call, repeatedly log stalled session ... recovery=none, and never surface a channel-visible failure or recovery reply. This leaves the Discord user with no response even though the gateway, Discord account, and outbound send path are healthy.

This looks like an OpenClaw runtime/session recovery responsibility rather than a downstream integration issue: OpenClaw owns the agent session state, model-call timeout handling, stalled-session recovery, and Discord delivery of terminal turn failures.

Environment

  • OpenClaw gateway: 2026.5.18, gitSha 137405d
  • Channel: Discord
  • Model route: openai-codex/gpt-5.4
  • Model API: openai-codex-responses
  • Provider base URL: local Codex proxy at /backend-api/codex
  • Configured provider timeout: timeoutSeconds: 600
  • Agent default timeout in trajectory config: timeoutSeconds: 1800

Observed behavior

For session 0dc1bab4-e176-4274-b9a4-6a73f5b2f6d2, session key agent:main:discord:channel:1507758768417411112, OpenClaw accepted a Discord message and submitted a model call. The session file contains the user message but no later assistant message.

Container logs then repeatedly reported:

[diagnostic] stalled session: sessionId=0dc1bab4-e176-4274-b9a4-6a73f5b2f6d2 sessionKey=agent:main:discord:channel:1507758768417411112 state=processing age=... queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=... recovery=none

The same runtime also logged a hard provider fetch timeout at 180s, despite the OpenClaw config containing longer timeout values:

[fetch-timeout] fetch timeout after 180000ms (elapsed 180871ms) operation=fetchWithSsrFGuard url=http://10.0.0.1:3003/backend-api/codex/responses
[openai-transport] [responses] error provider=openai-codex api=openai-codex-responses model=gpt-5.4 name=TimeoutError message=request timed out

A separate smoke test in the same Discord thread proved inbound and outbound Discord plumbing was alive: a nonce-bound no-tools request received the expected exact reply. So this is not a global Discord send failure.

Expected behavior

When a Discord agent turn reaches a terminal model-call timeout/stall, OpenClaw should do one of the following without downstream integrations inspecting private runtime internals:

  1. clear the processing state and unblock later messages in that session;
  2. append a durable terminal assistant/system event to the session transcript;
  3. send a channel-visible failure/retry message to the originating Discord target; and
  4. expose a structured recovery outcome through CLI/API/logs.

The timeout layers should also be coherent: if models.providers[...].timeoutSeconds = 600 and/or agents.defaults.timeoutSeconds = 1800 is configured, the /backend-api/codex/responses fetch path should not be hard-capped at 180s unless there is an explicit, documented lower-priority transport timeout.

Why downstream cannot fully solve this

A downstream integration can post a watchdog/fallback, but it cannot safely own OpenClaw's session lock, transcript state, model-call cancellation, queue draining, or delivery semantics. Downstream recovery also risks conflicting with OpenClaw if it mutates session state directly. The clean contract should be OpenClaw-owned stalled-turn recovery plus downstream observation.

Related evidence

This resembles the user-visible gap in #78609: Discord connected and manual send worked, but normal agent replies silently stalled with state=processing; errors were only visible by combining gateway logs and session inspection.

It also resembles the multi-timeout-layer behavior described in #63805: a configured timeout can be bypassed by another timeout layer in the execution path.

<!-- agent-footer:begin v=1 --> <details> <summary>Posted on behalf of @schickling</summary>
fieldvalue
agent_name🐙 co3-atoll
agent_session_id8a3a03ff-dfe1-43c3-aa30-df87447619cd
agent_toolCodex CLI
agent_tool_version0.131.0
agent_runtimeCodex CLI 0.131.0
agent_modelunknown
worktreedotfiles-pr920-molty-rca
machinedev3
tooling_profiledotfiles@4db6783
</details> <!-- agent-footer:end -->

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

When a Discord agent turn reaches a terminal model-call timeout/stall, OpenClaw should do one of the following without downstream integrations inspecting private runtime internals:

  1. clear the processing state and unblock later messages in that session;
  2. append a durable terminal assistant/system event to the session transcript;
  3. send a channel-visible failure/retry message to the originating Discord target; and
  4. expose a structured recovery outcome through CLI/API/logs.

The timeout layers should also be coherent: if models.providers[...].timeoutSeconds = 600 and/or agents.defaults.timeoutSeconds = 1800 is configured, the /backend-api/codex/responses fetch path should not be hard-capped at 180s unless there is an explicit, documented lower-priority transport timeout.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING