openclaw - 💡(How to fix) Fix [Bug]: Discord agent session remains routable after timeout, causing partial-success plus generic failure [1 participants]

vishutdhar · 2026-04-27T12:57:49Z


Source issue: openclaw/openclaw#72810 (fetched 2026-04-28 06:31:59)


A Discord-routed agent turn can complete useful side effects, then remain stuck in `processing` until the CLI timeout fires. OpenClaw then surfaces the generic user-facing failure message even though the work may already have been posted/applied. Later routing can still appear to target the wedged session, which makes verifier/worker state ambiguous and can trigger redundant follow-up dispatches.

This may be related to the `claude-cli` regression tracked in #72434, but the problematic behavior here is the session-health/routing outcome after a terminal timeout.


Environment

OpenClaw: 2026.4.24 from npm stable
Channel: Discord
Agent model: anthropic/claude-opus-4-7 via the Claude CLI-backed path
OS: macOS

Observed behavior

1. A Discord agent turn starts and performs useful side effects. In the observed case, a review/verdict message and a local state update were successfully recorded.
2. The same session remains in `processing` and is reported as stuck for several minutes.
3. At the 900s CLI timeout, OpenClaw terminates the candidate and posts/surfaces the generic failure text:

   Something went wrong while processing your request. Please try again, or use /new to start a fresh session.

4. Follow-up routing is ambiguous: the agent looks like it can still receive work, but the session is effectively dead/wedged. A later verification had to be recovered from a separate route, while a redundant follow-up dispatch was created because the original verifier path looked silent.

Sanitized log shape

[diagnostic] lane task error: lane=session:agent:<agent>:discord:channel:<redacted>:active-memory:<redacted> durationMs=<small> error="Error: Requested agent harness "claude-cli" is not registered and PI fallback is disabled."
[diagnostic] stuck session: sessionId=unknown sessionKey=agent:<agent>:discord:channel:<redacted> state=processing age=<minutes>s queueDepth=1
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=anthropic/claude-opus-4-7 candidate=anthropic/claude-opus-4-7 reason=timeout next=none detail=CLI exceeded timeout (900s) and was terminated.
Embedded agent failed before reply: CLI exceeded timeout (900s) and was terminated.
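
A watchdog or triage script could recognise this log shape mechanically. The following is a minimal sketch, assuming the lines keep the wording shown above; the type and function names (`LogEvent`, `classifyLogLine`, `isTerminalForSession`) are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical helper names; only the matched log text comes from the report.
type LogEvent =
  | { kind: "harness_missing"; harness: string }
  | { kind: "stuck_session"; state: string; queueDepth: number }
  | { kind: "candidate_timeout"; timeoutSeconds: number }
  | { kind: "pre_reply_failure"; detail: string }
  | { kind: "other" };

function classifyLogLine(line: string): LogEvent {
  // Harness registration error, e.g. `Requested agent harness "claude-cli" is not registered`.
  const harness = /Requested agent harness "([^"]+)" is not registered/.exec(line);
  if (harness !== null) {
    return { kind: "harness_missing", harness: harness[1] ?? "" };
  }
  // Stuck-session diagnostic carrying the session state and queue depth.
  const stuck = /stuck session: .*\bstate=(\S+).*\bqueueDepth=(\d+)/.exec(line);
  if (stuck !== null) {
    return {
      kind: "stuck_session",
      state: stuck[1] ?? "",
      queueDepth: Number(stuck[2] ?? "0"),
    };
  }
  // Model-fallback decision reporting that the candidate hit the CLI timeout.
  const timeout =
    /decision=candidate_failed\b.*\breason=timeout\b.*CLI exceeded timeout \((\d+)s\)/.exec(line);
  if (timeout !== null) {
    return { kind: "candidate_timeout", timeoutSeconds: Number(timeout[1] ?? "0") };
  }
  // Final pre-reply failure line.
  const preReply = /^Embedded agent failed before reply: (.*)$/.exec(line);
  if (preReply !== null) {
    return { kind: "pre_reply_failure", detail: preReply[1] ?? "" };
  }
  return { kind: "other" };
}

// Assumption of this sketch: a timeout or pre-reply failure is terminal for the session.
function isTerminalForSession(event: LogEvent): boolean {
  return event.kind === "candidate_timeout" || event.kind === "pre_reply_failure";
}
```

Treating both the timeout decision and the pre-reply failure as terminal is a design choice of the sketch, matching the report's view that the session is effectively dead once either appears.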

Expected behavior

After a fatal timeout or pre-reply embedded-agent failure, OpenClaw should make the session health unambiguous. Any of these would be safer than silently continuing to route to the wedged session:

- mark the session failed/dead and require `/new`,
- automatically reset/roll the session before accepting more work,
- route the next turn to a fresh session,
- or surface a clear `session timed out; previous side effects may have completed` state instead of only the generic failure message.

If side effects completed before the final timeout, the user-facing state should distinguish partial-success/late-failure from total failure.
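
One way to picture the requested behavior is a router that stops handing work to a session once it has timed out, and that reports a late failure separately from a total failure. This is a minimal sketch under stated assumptions: `SessionRouter`, `SessionRecord`, `TurnOutcome`, and `userMessage` are hypothetical names, and nothing here reflects OpenClaw's actual internals.

```typescript
// All names are hypothetical; none are taken from the OpenClaw codebase.
type SessionHealth = "healthy" | "failed";
type TurnOutcome = "late_failure" | "total_failure";

interface SessionRecord {
  id: string;
  health: SessionHealth;
  sideEffects: string[]; // appended as each side effect completes
}

class SessionRouter {
  private sessions = new Map<string, SessionRecord>(); // keyed by route
  private counter = 0;

  // Return a session that may accept work; a failed session is replaced by a fresh one.
  route(routeKey: string): SessionRecord {
    const current = this.sessions.get(routeKey);
    if (current !== undefined && current.health !== "failed") {
      return current;
    }
    this.counter += 1;
    const fresh: SessionRecord = {
      id: `${routeKey}#${this.counter}`,
      health: "healthy",
      sideEffects: [],
    };
    this.sessions.set(routeKey, fresh);
    return fresh;
  }

  // Record a fatal timeout: the session stops being routable, and the outcome
  // says whether any side effects had already completed.
  markTimedOut(routeKey: string): TurnOutcome {
    const session = this.sessions.get(routeKey);
    if (session === undefined) {
      return "total_failure";
    }
    session.health = "failed";
    return session.sideEffects.length > 0 ? "late_failure" : "total_failure";
  }
}

// Choose user-facing text: the late-failure wording follows the state proposed
// above, and the fallback is the generic message quoted in the report.
function userMessage(outcome: TurnOutcome): string {
  if (outcome === "late_failure") {
    return "Session timed out; previous side effects may have completed. Use /new to start a fresh session.";
  }
  return "Something went wrong while processing your request. Please try again, or use /new to start a fresh session.";
}
```

The sketch combines two of the listed options (mark failed, then route the next turn to a fresh session); requiring an explicit `/new` instead would mean having `route` reject work for a failed session rather than replace it.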

Impact

Users cannot tell whether the work failed or succeeded.
Verifier/worker workflows can create duplicate dispatches because the original route appears silent.
A watchdog sees processing for many minutes but the user-facing chat only gets a generic failure at the end.
The recovery path becomes manual: inspect logs/state, identify whether side effects completed, and route a fresh verifier/session by hand.
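
The manual recovery steps above amount to a small decision: notify while the turn is stuck, and after a timeout verify existing work before dispatching again. A hedged sketch of that decision follows; `TurnSnapshot`, `RecoveryAction`, `planRecovery`, and the `stuckAfterSeconds` threshold are hypothetical and only illustrate the idea.

```typescript
// Hypothetical names and threshold; this mirrors the manual recovery described above.
type RecoveryAction = "wait" | "notify_stuck" | "verify_existing_work" | "redispatch";

interface TurnSnapshot {
  state: "processing" | "timed_out";
  ageSeconds: number;
  sideEffectsRecorded: number; // count of side effects known to have completed
}

function planRecovery(turn: TurnSnapshot, stuckAfterSeconds: number): RecoveryAction {
  if (turn.state === "processing") {
    // Tell the user early instead of staying silent until the CLI timeout fires.
    return turn.ageSeconds >= stuckAfterSeconds ? "notify_stuck" : "wait";
  }
  // After a timeout, re-dispatch only when nothing was recorded; otherwise
  // verify the existing work first to avoid a duplicate dispatch.
  return turn.sideEffectsRecorded > 0 ? "verify_existing_work" : "redispatch";
}
```

For example, a turn that timed out after recording one side effect yields `verify_existing_work` rather than a second dispatch.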

Redaction note

This report intentionally redacts Discord IDs, session IDs, dispatch IDs, local paths, project names, internal agent nicknames, and exact local timestamps. The included log snippets preserve only the error shape needed to diagnose the runtime behavior.

extent analysis

TL;DR

The most likely fix is to modify OpenClaw to handle session health unambiguously after a fatal timeout or pre-reply embedded-agent failure.

Guidance

Investigate the claude-cli regression tracked in #72434 to see if it's related to the problematic behavior.
Consider modifying OpenClaw to mark the session as failed/dead and require `/new` after a fatal timeout.
Review the log snippets to understand the error shape and identify potential areas for improvement in handling session health.
Evaluate the possibility of surfacing a clear `session timed out; previous side effects may have completed` state instead of the generic failure message.

Example

No code snippet is provided as the issue lacks specific technical details.

Notes

The issue is complex and may require a deeper understanding of how OpenClaw and `claude-cli` interact. The provided log snippets and error messages are redacted, which may limit how complete a solution can be.

Recommendation

Because the root cause is not yet clear, treat this as a workaround: change how OpenClaw handles session health after a fatal timeout. This should mitigate the ambiguous session state and reduce the risk of duplicate dispatches.
