openclaw - 💡(How to fix) Fix [Bug]: Discord agent session remains routable after timeout, causing partial-success plus generic failure [1 participants]

GitHub stats
openclaw/openclaw#72810 · Fetched 2026-04-28 06:31:59
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0



Summary

A Discord-routed agent turn can complete useful side effects and then remain stuck in the processing state until the CLI timeout fires. OpenClaw then surfaces the generic user-facing failure message, even though the work may already have been posted or applied. Later routing can still appear to target the wedged session, which makes verifier/worker state ambiguous and can trigger redundant follow-up dispatches.

This may be related to the claude-cli regression tracked in #72434, but the problematic behavior here is the session-health/routing outcome after a terminal timeout.

Environment

  • OpenClaw: 2026.4.24 from npm stable
  • Channel: Discord
  • Agent model: anthropic/claude-opus-4-7 via the Claude CLI-backed path
  • OS: macOS

Observed behavior

  1. A Discord agent turn starts and performs useful side effects. In the observed case, a review/verdict message and a local state update were successfully recorded.
  2. The same session remains in processing and is reported as stuck for several minutes.
  3. At the 900s CLI timeout, OpenClaw terminates the candidate and posts/surfaces the generic failure text:
     "Something went wrong while processing your request. Please try again, or use /new to start a fresh session."
  4. Follow-up routing is ambiguous: the agent looks like it can still receive work, but the session is effectively dead/wedged. A later verification had to be recovered from a separate route, while a redundant follow-up dispatch was created because the original verifier path looked silent.

Sanitized log shape

[diagnostic] lane task error: lane=session:agent:<agent>:discord:channel:<redacted>:active-memory:<redacted> durationMs=<small> error="Error: Requested agent harness "claude-cli" is not registered and PI fallback is disabled."
[diagnostic] stuck session: sessionId=unknown sessionKey=agent:<agent>:discord:channel:<redacted> state=processing age=<minutes>s queueDepth=1
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=anthropic/claude-opus-4-7 candidate=anthropic/claude-opus-4-7 reason=timeout next=none detail=CLI exceeded timeout (900s) and was terminated.
Embedded agent failed before reply: CLI exceeded timeout (900s) and was terminated.
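The diagnostic lines above share a key=value shape, so a monitor could in principle tell a stuck session apart from a terminal timeout just by reading them. The following is a minimal TypeScript sketch, assuming the field names shown in the sanitized log (sessionKey, state, queueDepth, decision, next, reason) are stable. `parseFields`, `classify`, and the `Signal` type are hypothetical helpers, not OpenClaw APIs, and reading `next=none` as "no fallback candidate left" is an inference from the log, not documented behavior.

```typescript
// Sketch only: field names mirror the sanitized log shape in this issue;
// parseFields, classify and Signal are hypothetical, not OpenClaw APIs.

type Signal =
  | { kind: "stuck"; sessionKey: string; state: string; queueDepth: number }
  | { kind: "terminalFailure"; reason: string }
  | { kind: "other" };

// Extract key=value pairs. A value is either a double-quoted string without
// inner quotes or a run of non-space characters, so free-text values such as
// detail=... are cut at the first space (we never read them here).
function parseFields(line: string): Map<string, string> {
  const fields = new Map<string, string>();
  const re = /(\w+)=("[^"]*"|\S+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    const [, key, value] = m;
    if (key !== undefined && value !== undefined) fields.set(key, value);
  }
  return fields;
}

function classify(line: string): Signal {
  const f = parseFields(line);
  const sessionKey = f.get("sessionKey");
  if (line.includes("stuck session:") && sessionKey !== undefined) {
    return {
      kind: "stuck",
      sessionKey,
      state: f.get("state") ?? "unknown",
      queueDepth: Number(f.get("queueDepth") ?? "0"),
    };
  }
  // Assumption: decision=candidate_failed together with next=none means no
  // fallback candidate remains, so the turn has failed for good.
  if (f.get("decision") === "candidate_failed" && f.get("next") === "none") {
    return { kind: "terminalFailure", reason: f.get("reason") ?? "unknown" };
  }
  return { kind: "other" };
}
```

Feeding each log line to `classify` would let a watchdog note the session key from the `stuck session` line and, once a `terminalFailure` signal arrives, stop routing to that session. The quoted-value rule is enough for these two line types but not for the `lane task error` line, whose error text contains inner quotes.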

Expected behavior

After a fatal timeout or pre-reply embedded-agent failure, OpenClaw should make the session health unambiguous. Any of these would be safer than silently continuing to route to the wedged session:

  • mark the session failed/dead and require /new,
  • automatically reset/roll the session before accepting more work,
  • route the next turn to a fresh session,
  • or surface a clear "session timed out; previous side effects may have completed" state instead of only the generic failure message.
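The first options above (require /new, or roll the session before accepting more work) reduce to one routing rule: a session marked failed is never handed the next turn. Here is a minimal sketch of that rule. `SessionRouter`, `markFailed`, the `#<generation>` key suffix, and the notice text are all hypothetical and are not taken from OpenClaw's code.

```typescript
// Hypothetical session-health model; names are illustrative, not OpenClaw's.
type Health = "idle" | "processing" | "failed";
type Policy = "requireNew" | "rollSession";

interface SessionRecord {
  sessionKey: string; // concrete session the channel currently routes to
  generation: number; // bumped every time the session is rolled
  health: Health;
}

type RouteResult =
  | { ok: true; sessionKey: string }
  | { ok: false; message: string };

class SessionRouter {
  private readonly policy: Policy;
  private readonly sessions = new Map<string, SessionRecord>();

  constructor(policy: Policy) {
    this.policy = policy;
  }

  private current(channelKey: string): SessionRecord {
    let rec = this.sessions.get(channelKey);
    if (rec === undefined) {
      rec = { sessionKey: `${channelKey}#0`, generation: 0, health: "idle" };
      this.sessions.set(channelKey, rec);
    }
    return rec;
  }

  // Call on a fatal timeout or a pre-reply embedded-agent failure.
  markFailed(channelKey: string): void {
    this.current(channelKey).health = "failed";
  }

  // Pick the session for the next inbound turn; a failed session is never returned.
  route(channelKey: string): RouteResult {
    const rec = this.current(channelKey);
    if (rec.health !== "failed") {
      return { ok: true, sessionKey: rec.sessionKey };
    }
    if (this.policy === "requireNew") {
      return {
        ok: false,
        message: "The previous session timed out and was closed. Use /new to start a fresh session.",
      };
    }
    // rollSession: replace the wedged session before accepting more work.
    const generation = rec.generation + 1;
    const fresh: SessionRecord = { sessionKey: `${channelKey}#${generation}`, generation, health: "idle" };
    this.sessions.set(channelKey, fresh);
    return { ok: true, sessionKey: fresh.sessionKey };
  }
}
```

With the rollSession policy, the turn after a fatal timeout lands on a fresh session key; with requireNew, the user gets an explicit notice instead of silence. Either way the wedged session stops looking routable.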

If side effects completed before the final timeout, the user-facing state should distinguish partial-success/late-failure from total failure.
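Distinguishing partial success from total failure only requires knowing whether any side effects were confirmed before the timeout. A minimal sketch, assuming the runtime could collect such a list: the `TurnOutcome` shape, `turnStatus`, `failureNotice`, and the partial-success wording are hypothetical; only the generic failure text is quoted from the issue.

```typescript
// Hypothetical turn summary; OpenClaw may not track side effects this way.
interface TurnOutcome {
  failed: boolean;       // fatal timeout or pre-reply failure
  sideEffects: string[]; // effects confirmed before the failure
}

type TurnStatus = "success" | "partialSuccessLateFailure" | "totalFailure";

function turnStatus(o: TurnOutcome): TurnStatus {
  if (!o.failed) return "success";
  return o.sideEffects.length > 0 ? "partialSuccessLateFailure" : "totalFailure";
}

// The generic message quoted in the issue.
const GENERIC_FAILURE =
  "Something went wrong while processing your request. Please try again, or use /new to start a fresh session.";

// Returns the notice to post after a turn, or undefined when no notice is needed.
function failureNotice(o: TurnOutcome): string | undefined {
  switch (turnStatus(o)) {
    case "success":
      return undefined;
    case "partialSuccessLateFailure":
      return `Session timed out, but these steps had already completed: ${o.sideEffects.join("; ")}. Check them before retrying, or use /new to start a fresh session.`;
    case "totalFailure":
      return GENERIC_FAILURE;
  }
}
```

The generic message is kept only for the case where nothing was confirmed, so it no longer implies that already-posted work failed.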

Impact

  • Users cannot tell whether the work failed or succeeded.
  • Verifier/worker workflows can create duplicate dispatches because the original route appears silent.
  • A watchdog sees the session in processing for many minutes, but the user-facing chat only gets a generic failure at the end.
  • The recovery path becomes manual: inspect logs/state, identify whether side effects completed, and route a fresh verifier/session by hand.
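Duplicate dispatches arise when a follow-up is created because the original route looks silent, even though its side effect already landed. A common mitigation is an idempotency check keyed by task and expected effect. The sketch below assumes a verifier can observe completed effects (for example, from channel history or local state); `FollowUpGuard`, `claimDispatch`, and the task/effect keys are hypothetical names.

```typescript
// Hypothetical guard against duplicate follow-ups; task/effect naming is illustrative.
class FollowUpGuard {
  private readonly completed = new Set<string>(); // effects known to have landed
  private readonly inFlight = new Set<string>();  // follow-ups already dispatched

  private static key(taskId: string, effect: string): string {
    return `${taskId}::${effect}`;
  }

  // Record an effect observed in channel history or local state, even if the
  // session that produced it later timed out.
  recordCompleted(taskId: string, effect: string): void {
    const k = FollowUpGuard.key(taskId, effect);
    this.completed.add(k);
    this.inFlight.delete(k);
  }

  // Returns true (and reserves the slot) only when a new follow-up should be
  // dispatched: the effect has not landed and no follow-up is already pending.
  claimDispatch(taskId: string, effect: string): boolean {
    const k = FollowUpGuard.key(taskId, effect);
    if (this.completed.has(k) || this.inFlight.has(k)) return false;
    this.inFlight.add(k);
    return true;
  }
}
```

In the reported case, recording the already-posted verdict would have made `claimDispatch` return false for the redundant follow-up, turning a silent route into a checkable fact rather than a trigger for a second dispatch.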

Redaction note

This report intentionally redacts Discord IDs, session IDs, dispatch IDs, local paths, project names, internal agent nicknames, and exact local timestamps. The included log snippets preserve only the error shape needed to diagnose the runtime behavior.

Extent analysis

TL;DR

The most likely fix is to make OpenClaw report and enforce session health unambiguously after a fatal timeout or pre-reply embedded-agent failure, so that a wedged session stops receiving work.

Guidance

  • Investigate the claude-cli regression tracked in #72434 to see if it's related to the problematic behavior.
  • Consider modifying OpenClaw to mark the session as failed/dead and require /new after a fatal timeout.
  • Review the log snippets to understand the error shape and identify potential areas for improvement in handling session health.
  • Evaluate surfacing a clear "session timed out; previous side effects may have completed" state instead of the generic failure message.

Example

The issue itself includes no code; the sanitized log shape is the only technical artifact it provides.
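As one illustration of the guidance to mark a wedged session failed, a watchdog sweep could post an early stuck notice and then mark the session failed at the hard timeout, rather than letting the chat see only the generic failure at the end. A minimal sketch: the 900 s limit matches the CLI timeout in the issue's logs, while the 120 s notice threshold, `TrackedSession`, and `sweep` are assumptions.

```typescript
// Hypothetical watchdog sweep; only the 900 s hard timeout comes from the issue.
interface TrackedSession {
  sessionKey: string;
  startedAtMs: number;
  state: "processing" | "failed";
  stuckNoticeSent: boolean;
}

interface WatchdogAction {
  sessionKey: string;
  action: "notifyStuck" | "markFailed";
}

const HARD_TIMEOUT_MS = 900_000; // 900 s, the observed CLI timeout
const STUCK_NOTICE_MS = 120_000; // hypothetical early-warning threshold

function sweep(sessions: TrackedSession[], nowMs: number): WatchdogAction[] {
  const actions: WatchdogAction[] = [];
  for (const s of sessions) {
    if (s.state !== "processing") continue;
    const ageMs = nowMs - s.startedAtMs;
    if (ageMs >= HARD_TIMEOUT_MS) {
      // Make the failure explicit so routing stops targeting this session.
      s.state = "failed";
      actions.push({ sessionKey: s.sessionKey, action: "markFailed" });
    } else if (ageMs >= STUCK_NOTICE_MS && !s.stuckNoticeSent) {
      // Tell the chat early instead of waiting for the generic failure.
      s.stuckNoticeSent = true;
      actions.push({ sessionKey: s.sessionKey, action: "notifyStuck" });
    }
  }
  return actions;
}
```

Each session gets at most one stuck notice, and once marked failed it is skipped by later sweeps, so the watchdog's view and the user-facing chat stay in agreement.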

Notes

The issue is complex and may require a deeper understanding of how OpenClaw interacts with claude-cli. The log snippets and error messages are redacted, which may limit how complete a diagnosis can be.

Recommendation

Because the root cause is not yet clear, a practical mitigation is to change how OpenClaw handles session health after a fatal timeout, for example by marking the session failed or rolling it before it accepts more work. This should remove the ambiguous session state and help prevent duplicate dispatches.


