When a Discord agent turn reaches a terminal model-call timeout/stall, OpenClaw should do one of the following without downstream integrations inspecting private runtime internals: 1. clear the `processing` state and unblock later messages in that session; 2. append a durable terminal assistant/system event to the session transcript; 3. send a channel-visible failure/retry message to the originating Discord target; and 4. expose a structured recovery outcome through CLI/API/logs. The timeout layers should also be coherent: if `models.providers[...].timeoutSeconds = 600` and/or `agents.defaults.timeoutSeconds = 1800` is configured, the `/backend-api/codex/responses` fetch path should not be hard-capped at 180s unless there is an explicit, documented lower-priority transport timeout.

openclaw - 💡(How to fix) Fix Discord agent turn can stall in processing with recovery=none and no visible reply

openclaw2026-05-24 06:03:59

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

A Discord-triggered agent turn can enter state=processing with activeWorkKind=model_call, repeatedly log stalled session ... recovery=none, and never surface a channel-visible failure or recovery reply. This leaves the Discord user with no response even though the gateway, Discord account, and outbound send path are healthy.

This looks like an OpenClaw runtime/session recovery responsibility rather than a downstream integration issue: OpenClaw owns the agent session state, model-call timeout handling, stalled-session recovery, and Discord delivery of terminal turn failures.

Error Message

For session 0dc1bab4-e176-4274-b9a4-6a73f5b2f6d2, session key agent:main:discord:channel:1507758768417411112, OpenClaw accepted a Discord message and submitted a model call. The session file contains the user message but no later assistant message.

Root Cause

Code Example

[diagnostic] stalled session: sessionId=0dc1bab4-e176-4274-b9a4-6a73f5b2f6d2 sessionKey=agent:main:discord:channel:1507758768417411112 state=processing age=... queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=... recovery=none

---

[fetch-timeout] fetch timeout after 180000ms (elapsed 180871ms) operation=fetchWithSsrFGuard url=http://10.0.0.1:3003/backend-api/codex/responses
[openai-transport] [responses] error provider=openai-codex api=openai-codex-responses model=gpt-5.4 name=TimeoutError message=request timed out

RAW_BUFFERClick to expand / collapse

Summary

Environment

OpenClaw gateway: 2026.5.18, gitSha 137405d
Channel: Discord
Model route: openai-codex/gpt-5.4
Model API: openai-codex-responses
Provider base URL: local Codex proxy at /backend-api/codex
Configured provider timeout: timeoutSeconds: 600
Agent default timeout in trajectory config: timeoutSeconds: 1800

Observed behavior

Container logs then repeatedly reported:

[diagnostic] stalled session: sessionId=0dc1bab4-e176-4274-b9a4-6a73f5b2f6d2 sessionKey=agent:main:discord:channel:1507758768417411112 state=processing age=... queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=... recovery=none

The same runtime also logged a hard provider fetch timeout at 180s, despite the OpenClaw config containing longer timeout values:

[fetch-timeout] fetch timeout after 180000ms (elapsed 180871ms) operation=fetchWithSsrFGuard url=http://10.0.0.1:3003/backend-api/codex/responses
[openai-transport] [responses] error provider=openai-codex api=openai-codex-responses model=gpt-5.4 name=TimeoutError message=request timed out

A separate smoke test in the same Discord thread proved inbound and outbound Discord plumbing was alive: a nonce-bound no-tools request received the expected exact reply. So this is not a global Discord send failure.

Expected behavior

When a Discord agent turn reaches a terminal model-call timeout/stall, OpenClaw should do one of the following without downstream integrations inspecting private runtime internals:

clear the processing state and unblock later messages in that session;
append a durable terminal assistant/system event to the session transcript;
send a channel-visible failure/retry message to the originating Discord target; and
expose a structured recovery outcome through CLI/API/logs.

The timeout layers should also be coherent: if models.providers[...].timeoutSeconds = 600 and/or agents.defaults.timeoutSeconds = 1800 is configured, the /backend-api/codex/responses fetch path should not be hard-capped at 180s unless there is an explicit, documented lower-priority transport timeout.

Why downstream cannot fully solve this

A downstream integration can post a watchdog/fallback, but it cannot safely own OpenClaw's session lock, transcript state, model-call cancellation, queue draining, or delivery semantics. Downstream recovery also risks conflicting with OpenClaw if it mutates session state directly. The clean contract should be OpenClaw-owned stalled-turn recovery plus downstream observation.

Related evidence

This resembles the user-visible gap in #78609: Discord connected and manual send worked, but normal agent replies silently stalled with state=processing; errors were only visible by combining gateway logs and session inspection.

It also resembles the multi-timeout-layer behavior described in #63805: a configured timeout can be bypassed by another timeout layer in the execution path.

<details> <summary>Posted on behalf of @schickling</summary>

field	value
`agent_name`	🐙 co3-atoll
`agent_session_id`	8a3a03ff-dfe1-43c3-aa30-df87447619cd
`agent_tool`	Codex CLI
`agent_tool_version`	0.131.0
`agent_runtime`	Codex CLI 0.131.0
`agent_model`	unknown
`worktree`	dotfiles-pr920-molty-rca
`machine`	dev3
`tooling_profile`	dotfiles@4db6783

</details>

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

When a Discord agent turn reaches a terminal model-call timeout/stall, OpenClaw should do one of the following without downstream integrations inspecting private runtime internals:

clear the processing state and unblock later messages in that session;
append a durable terminal assistant/system event to the session transcript;
send a channel-visible failure/retry message to the originating Discord target; and
expose a structured recovery outcome through CLI/API/logs.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix Discord agent turn can stall in processing with recovery=none and no visible reply

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

Summary

Environment

Observed behavior

Expected behavior

Why downstream cannot fully solve this

Related evidence

FAQ

Expected behavior

Still need to ship something?

TRENDING