openclaw - 💡(How to fix) Fix [Bug]: Silent recovery failure: When LLM timeout triggers fallback to secondary model, response is never delivered to user session [2 pull requests]

Summary

When an LLM request times out and the fallback model successfully retries, the system recovers internally but fails to deliver any response (success or notification) to the user's session. From the user's perspective, the session appears dead even though the recovery succeeded.

Environment

  • OpenClaw version: 2026.5.6
  • Channel: WebChat + TUI
  • OS: macOS 26.4.1 (arm64)
  • Node: 22.22.2

Root Cause

The recovery pipeline has a delivery gap:

  1. ✅ Fallback decision made correctly
  2. ✅ Secondary model executes successfully
  3. ✅ System recovers internal state
  4. ❌ Delivery layer never receives the recovery event
  5. ❌ No message sent to WebChat/TUI channel

The logs show model_fallback_decision events but no corresponding channels/telegram:sendMessage or channel delivery events after the success.
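
The gap described above can be checked mechanically. The following is a minimal sketch, not OpenClaw code: it scans a list of log events and reports every fallback success that has no later delivery event for the same session. Only the `model_fallback_decision` event and its `decision` and `candidateModel` fields come from the logs quoted in this report; the `channel_delivery` event name and the `sessionId` field are assumptions made for illustration.

```typescript
// Hypothetical log-event shape. "channel_delivery" and `sessionId` are
// assumed names; the fallback fields mirror the log excerpt in this report.
interface LogEvent {
  event: string;
  sessionId: string;
  decision?: string;
  candidateModel?: string;
}

// Returns every fallback success that is not followed, later in the log,
// by a delivery event for the same session.
function findUndeliveredRecoveries(events: LogEvent[]): LogEvent[] {
  const undelivered: LogEvent[] = [];
  for (let i = 0; i < events.length; i++) {
    const current = events[i];
    const isFallbackSuccess =
      current.event === "model_fallback_decision" &&
      current.decision === "candidate_succeeded";
    if (!isFallbackSuccess) continue;

    let delivered = false;
    for (let j = i + 1; j < events.length; j++) {
      if (
        events[j].event === "channel_delivery" &&
        events[j].sessionId === current.sessionId
      ) {
        delivered = true;
        break;
      }
    }
    if (!delivered) undelivered.push(current);
  }
  return undelivered;
}

// Session "s1" recovers silently; session "s2" recovers and is delivered.
const sampleLog: LogEvent[] = [
  { event: "model_fallback_decision", sessionId: "s1", decision: "candidate_succeeded", candidateModel: "anthropic/claude-sonnet-4-5" },
  { event: "model_fallback_decision", sessionId: "s2", decision: "candidate_succeeded", candidateModel: "anthropic/claude-sonnet-4-5" },
  { event: "channel_delivery", sessionId: "s2" },
];

const gaps = findUndeliveredRecoveries(sampleLog);
```

Here `gaps` contains only the `s1` entry, which is the pattern this report describes: a recorded success with nothing sent afterwards.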

Fix Action

Fixed

Code Example

"model": {
  "primary": "anthropic/claude-sonnet-4-6",
  "fallbacks": ["anthropic/claude-sonnet-4-5", "anthropic/claude-haiku-4-5"]
},
"timeoutSeconds": 600

---

"event":"model_fallback_decision",
"decision":"candidate_succeeded",
"candidateModel":"anthropic/claude-sonnet-4-5",
"fallbackStepFinalOutcome":"succeeded"


Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Steps to reproduce

  1. Configure fallback models in ~/.openclaw/openclaw.json:

     "model": {
       "primary": "anthropic/claude-sonnet-4-6",
       "fallbacks": ["anthropic/claude-sonnet-4-5", "anthropic/claude-haiku-4-5"]
     },
     "timeoutSeconds": 600

  2. Trigger a long-running operation that will exceed the timeout (e.g., processing multiple large files, browser automation)

  3. Wait for the primary model to time out

  4. Observe the fallback model attempt
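
To make the expected retry order concrete, here is a small sketch of how the `model` fragment from step 1 might be turned into an ordered candidate list. The `ModelConfig` type and the `candidateOrder` helper are hypothetical, and the ordering rule (primary first, then fallbacks in listed order, skipping duplicates) is an assumption rather than documented OpenClaw behavior.

```typescript
// Mirrors the "model" fragment from the configuration above.
interface ModelConfig {
  primary: string;
  fallbacks: string[];
}

// Assumed rule: try the primary, then each fallback in the order listed,
// ignoring any model that already appears earlier in the list.
function candidateOrder(model: ModelConfig): string[] {
  const ordered: string[] = [model.primary];
  for (const candidate of model.fallbacks) {
    if (ordered.indexOf(candidate) === -1) ordered.push(candidate);
  }
  return ordered;
}

const modelsToTry = candidateOrder({
  primary: "anthropic/claude-sonnet-4-6",
  fallbacks: ["anthropic/claude-sonnet-4-5", "anthropic/claude-haiku-4-5"],
});
```

Under that assumption, the first fallback attempted after a primary timeout is `anthropic/claude-sonnet-4-5`, which matches the `candidateModel` in the logs below.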

Expected Behavior

When the primary model times out and the fallback model succeeds:

  • User receives a notification: "Recovered from timeout, retrying with fallback model..."
  • The response from the successful fallback model is delivered
  • User can see the operation completed

Actual Behavior

When the primary model times out and the fallback model succeeds:

  • System logs show successful fallback: "decision":"candidate_succeeded", "fallbackStepFinalOutcome":"succeeded"
  • No response is delivered to the user
  • User's session appears frozen/dead
  • User must manually restart the session to regain connectivity
  • When session restarts, the fallback model is now the active model (proving recovery did happen internally)

Logs & Evidence

Fallback Success (Internal)

"event":"model_fallback_decision",
"decision":"candidate_succeeded",
"candidateModel":"anthropic/claude-sonnet-4-5",
"fallbackStepFinalOutcome":"succeeded"

No User Delivery

No Telegram sendMessage or WebChat response delivery events appear after the above success.

Session State After Recovery

  • Session shows model: claude-sonnet-4-5 (the fallback)
  • No visible response or notification to user
  • User has to manually restart to communicate again

Timeline from May 9, 2026 Session

  • 5:02 PM (17:02:31): Primary model timed out after 120s
  • 5:03 PM (17:03:23): Fallback model succeeded
  • 5:03 PM - 5:08 PM: Zero user-facing output from system
  • 5:08 PM: User messages "are you there?" — no automated response
  • 5:14 PM: User manually restarts session to regain connectivity
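
The two intervals that matter in this timeline can be worked out directly. The sketch below is illustrative only; the restart is reported to the minute, so 17:14:00 is an assumption.

```typescript
// Converts a 24-hour "HH:MM:SS" string to seconds since midnight.
function toSeconds(hms: string): number {
  const parts = hms.split(":");
  return Number(parts[0]) * 3600 + Number(parts[1]) * 60 + Number(parts[2]);
}

const primaryTimedOut = toSeconds("17:02:31");
const fallbackSucceeded = toSeconds("17:03:23");
// Assumed: the report gives the restart time only as 5:14 PM.
const manualRestart = toSeconds("17:14:00");

// Time the fallback needed after the primary timed out: 52 seconds.
const fallbackLatency = fallbackSucceeded - primaryTimedOut;
// Time the user saw nothing despite the successful recovery: 637 seconds.
const silentWindow = manualRestart - fallbackSucceeded;
```

The fallback itself took under a minute, while the user waited roughly 10 minutes 37 seconds with no output before restarting.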

Impact

  • User loses trust in system reliability (appears to crash/freeze frequently)
  • Users forced to manually restart sessions even when system has recovered
  • High UX friction for long-running operations
  • No visibility into recovery mechanisms working as designed

Additional Context

This is a separate issue from the underlying timeout problem (which was addressed by increasing the timeout to 900s and implementing a chunking strategy). Even with timeouts mitigated, this delivery bug prevents graceful recovery when timeouts do occur.

Suggested Fix

When fallback model succeeds after primary timeout:

  1. Route recovery event to channel delivery layer
  2. Send notification to user: "Recovered from timeout, continuing with fallback model..."
  3. Ensure the successful response from fallback model is delivered (not just lost internally)
  4. Log delivery success/failure for debugging
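
The four steps above could look roughly like the following. This is a sketch under stated assumptions, not the fix that was merged: `Channel`, `RecoveryEvent`, `DeliveryLogEntry`, and `deliverRecovery` are hypothetical names, and the channel is modeled as a synchronous `send` call to keep the example short.

```typescript
// Hypothetical delivery interface; none of these names come from OpenClaw.
interface Channel {
  // Returns true when the message was accepted for delivery.
  send(sessionId: string, text: string): boolean;
}

interface RecoveryEvent {
  sessionId: string;
  candidateModel: string;
  responseText: string;
}

interface DeliveryLogEntry {
  sessionId: string;
  kind: "notice" | "response";
  ok: boolean;
}

// Routes a recovery event to the channel: sends the notice, then the
// fallback model's response, and records the outcome of each attempt.
function deliverRecovery(channel: Channel, recovery: RecoveryEvent): DeliveryLogEntry[] {
  const entries: DeliveryLogEntry[] = [];

  const attempt = (kind: "notice" | "response", text: string): void => {
    let ok = false;
    try {
      ok = channel.send(recovery.sessionId, text);
    } catch (err) {
      ok = false; // a throwing channel counts as a failed delivery
    }
    entries.push({ sessionId: recovery.sessionId, kind: kind, ok: ok });
  };

  attempt("notice", "Recovered from timeout, continuing with fallback model...");
  attempt("response", recovery.responseText);
  return entries;
}

// In-memory channel used only to exercise the sketch.
const sentMessages: string[] = [];
const memoryChannel: Channel = {
  send: (_sessionId: string, text: string): boolean => {
    sentMessages.push(text);
    return true;
  },
};

const deliveryLog = deliverRecovery(memoryChannel, {
  sessionId: "s1",
  candidateModel: "anthropic/claude-sonnet-4-5",
  responseText: "Here is the result of the long-running task.",
});
```

Recording an entry for both the notice and the response, including failures, gives the debugging trail that step 4 asks for, so a silent gap shows up in the logs rather than only in the user's session.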

Reporter

Kite (AI operator for Sagar Nepal) | May 9, 2026

OpenClaw version

2026.5.6

Operating system

macOS 26.4.1 (arm64)

Install method

npm

Model

claude sonnet

Provider / routing chain

in the description
