openclaw: Expose model stream idle-timeout as user config (currently ~120s, hardcoded) [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#78361 (fetched 2026-05-07 03:37:51)
Comments: 1 · Participants: 2 · Timeline: 3 · Reactions: 4
Timeline (top): commented ×1, cross-referenced ×1, subscribed ×1

Error Message

session.ended status=error


Background

OpenClaw aborts a model call when its stream emits no bytes for ~120s (idleTimedOut: true in trajectory). For most providers this is appropriate dead-stream protection, but for Google Gemini preview models we observe legitimate completions that are silent on the wire for >120s — likely server-side buffering of thinking tokens — so the watchdog kills them prematurely.
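For readers unfamiliar with this kind of watchdog, here is a minimal Python sketch of a stream idle timeout. It is not OpenClaw's implementation: the names with_idle_watchdog and StreamIdleTimeout are hypothetical, and the only behavior taken from this issue is "abort when the stream emits nothing for N seconds". The timer restarts on every chunk, so a long but chatty stream survives while a silent one is killed.

```python
import asyncio
from typing import AsyncIterator, Optional


class StreamIdleTimeout(Exception):
    """Raised when no chunk arrives within the idle window (hypothetical name)."""


async def with_idle_watchdog(
    stream: AsyncIterator[bytes], idle_timeout_s: Optional[float]
) -> AsyncIterator[bytes]:
    """Yield chunks from `stream`, aborting if it stays silent too long.

    The timeout applies to each wait for the next chunk, so only silence
    counts, not total duration. Passing None disables the watchdog.
    """
    iterator = stream.__aiter__()
    while True:
        try:
            # wait_for cancels the pending read if the idle window elapses.
            chunk = await asyncio.wait_for(iterator.__anext__(), idle_timeout_s)
        except StopAsyncIteration:
            return  # stream finished normally
        except asyncio.TimeoutError:
            raise StreamIdleTimeout(f"no data for {idle_timeout_s}s") from None
        yield chunk
```

Under this model, a provider that buffers its output server-side for longer than the window looks identical to a dead connection, which matches the symptom reported here.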

The threshold is not user-configurable in this build:

  • Not in openclaw config schema under agents.defaults (or anywhere I could find).
  • No matching env var (OPENCLAW_*_IDLE* only covers OpenAI WS pool, browser/CDP, etc.).
  • Plumbed only via runtime-internal idleTimeoutMs parameters / session-binding lifecycle.

Result: users hit unrecoverable stalls with no escape hatch.

Reproduction (2026.5.4-beta.1, commit 9cc3ae1)

Provider: google (google-generative-ai), model: gemini-3.1-pro-preview-customtools (and other Gemini -preview variants — pro / flash / flash-lite all reproduce).

User asks main agent for a moderate-output reply ("List the 46…"). Trajectory shows:

T+0      session.started, prompt.submitted (input ~25k tokens)
T+~120s  model.completed { aborted: true, idleTimedOut: true, timedOut: true,
                            externalAbort: false, usage: null }
         session.ended status=error
T+~1s    auto-fallback to next model in agents.defaults.model.fallbacks
T+~120s  same idleTimedOut
T+~1s    next fallback
T+~120s  same
=> turn final-fails after ~6 minutes

Failed runs report no usage and zero output tokens: the stream is fully silent for the entire idle window. A related but distinct failure occurs in cron jobs: an isolated Lawzane outreach cron yesterday ran for ~6.8h on a single Gemini call before the wall-clock timeout fired. In that run idleTimedOut did not trigger, because heartbeat-like thinking-token data kept the stream "active" even though no useful content was emitted.

Setting agents.defaults.thinkingDefault to medium (instead of adaptive) did not eliminate the stalls. We are testing thinking: off next.

Request

Expose the model stream idle-timeout as a user-configurable setting:

  1. Primary: agents.defaults.idleTimeoutMs, a global default in openclaw.json, settable via openclaw config set. The default keeps current behavior (120000?); users can raise it to e.g. 600000 to tolerate slow-stream providers.
  2. Per-agent override: agents.list[<id>].idleTimeoutMs.
  3. Per-cron-job override: --idle-timeout-ms flag on openclaw cron edit, parallel to existing --timeout-seconds.
  4. (Optional but valuable): 0 or null to disable the idle watchdog entirely (rely solely on wall-clock timeoutSeconds).
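The four items above imply a lookup order. A minimal Python sketch of how the effective value might be resolved follows. Everything here is an assumption for illustration: the function name is hypothetical, the precedence (cron job, then agent, then global default) is one plausible reading of "override", and 120000 is the unconfirmed default inferred from the observed ~120s.

```python
from typing import Optional

# Assumed from the ~120s observed in trajectories; the real default is unconfirmed.
DEFAULT_IDLE_TIMEOUT_MS = 120_000


def resolve_idle_timeout_ms(defaults: dict, agent: dict, cron_job: dict) -> Optional[int]:
    """Return the effective idle timeout in ms, or None if the watchdog is disabled.

    The most specific scope that sets idleTimeoutMs wins (assumed order:
    cron job, agent, global defaults). 0 or None disables the watchdog.
    """
    value = DEFAULT_IDLE_TIMEOUT_MS
    for scope in (cron_job, agent, defaults):
        if "idleTimeoutMs" in scope:
            value = scope["idleTimeoutMs"]
            break

    if value is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(f"idleTimeoutMs must be a non-negative integer or null, got {value!r}")
    return None if value == 0 else value
```

For example, resolve_idle_timeout_ms({"idleTimeoutMs": 600000}, {}, {"idleTimeoutMs": 0}) returns None: the cron job disables the watchdog and relies on its wall-clock timeout, even though the global default was raised.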

Why expose rather than just bump the default

Different providers have different reasonable silence windows. OpenAI and Anthropic stream reasoning tokens visibly; Gemini preview buffers them. A one-size-fits-all bump trades reliability against the cost of waiting on dead streams. Per-agent / per-cron tuning is the more ergonomic option.

Workaround for users hitting this

  • Trim agents.defaults.model.fallbacks to [] so a stall fails in ~2 min instead of cycling for ~6 min.
  • Move chat / sensitive workloads off Gemini preview models to a non-Google provider (anthropic, openai).
  • For mechanical cron jobs, set thinking: off on the job to avoid the thinking-buffer pathology.
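The ~6 min and ~2 min figures in the first workaround follow from the reproduction trace. A small Python sketch of the arithmetic, assuming a primary model plus two fallbacks (as the trace suggests), a ~120s idle window per attempt, and ~1s to switch models; the function name is illustrative only:

```python
def worst_case_stall_s(
    fallback_count: int,
    idle_timeout_s: float = 120.0,  # observed idle window, approximate
    switch_delay_s: float = 1.0,    # observed delay before each fallback, approximate
) -> float:
    """Time until the turn finally fails if every attempt stalls silently."""
    attempts = 1 + fallback_count  # primary model plus each fallback
    return attempts * idle_timeout_s + fallback_count * switch_delay_s


# Primary plus two fallbacks, as in the trace: 3 * 120 + 2 * 1 = 362 s (~6 min).
with_fallbacks = worst_case_stall_s(2)
# fallbacks set to []: a single 120 s attempt (~2 min).
without_fallbacks = worst_case_stall_s(0)
```

Raising the idle window without trimming fallbacks multiplies the worst case in the same way, so the two settings are worth tuning together.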

Related

  • #78258 (Google Gemini flex service tier) — same project area; flex tier was the first symptom we hit, this is the second.
