openclaw - ✅(Solved) Fix [Bug]: Gemini 2.5 Flash hangs ~125s with `thinking=medium` despite Google reporting fast, successful streaming responses [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#80349 · Fetched 2026-05-11 03:15:45
Comments: 1 · Participants: 2 · Timeline events: 5 · Reactions: 2
Timeline (top): cross-referenced ×2, labeled ×2, commented ×1

When OpenClaw is configured with google/gemini-2.5-flash as the primary model and thinkingDefault: medium (the default thinking level for Gemini reported by the gateway boot log), agent turns hang for ~125 seconds before timing out and falling back to the fallback model. Google Cloud Console shows the corresponding streamGenerateContent API requests completing successfully with average latency well within the timeout, suggesting OpenClaw's stream handler is failing to correctly consume responses when an extended thinking budget produces a long pre-token thinking phase.

Error Message

[agent/embedded] Profile google:default timed out. Trying next account...
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout from=google/gemini-2.5-flash
[diagnostic] lane task error: lane=main durationMs=125567 error="FailoverError: LLM request timed out."
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=google/gemini-2.5-flash candidate=google/gemini-2.5-flash reason=timeout next=anthropic/claude-sonnet-4-6 detail=LLM request timed out.


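Fix Action

Confirmed workaround: lowering the thinking level, either with /think:off for a session or thinkingDefault: "minimal" globally, eliminates the hang.

PR fix notes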

PR #80354: fix(embedded): disable idle watchdog for thinking-enabled model runs

Description (problem / solution / changelog)

Root cause

resolveLlmIdleTimeoutMs applies a 120s network-silence watchdog by default for cloud providers. Cloud thinking models (e.g. Gemini 2.5 Flash with thinking=medium) can produce zero stream chunks for well over 120s while the model reasons server-side, before emitting any tokens. The watchdog fires prematurely, producing FailoverError: LLM request timed out even though Google's own metrics show zero errors for the same requests (as documented in #80349).
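
To illustrate the failure mode, here is a minimal sketch of an idle watchdog wrapped around a streamed response. It is not OpenClaw's actual stream handler: `withIdleWatchdog`, `IdleTimeoutError`, and `simulatedThinkingStream` are hypothetical names, and the millisecond values used with it are meant to be scaled down from the real 120s default so the behavior is quick to observe.

```typescript
// Hypothetical sketch of a network-silence watchdog; not OpenClaw's real code.
class IdleTimeoutError extends Error {}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Re-yields chunks from `source`, failing if no chunk arrives within `idleMs`.
// idleMs === 0 disables the watchdog, mirroring the "return 0" convention in the PR.
async function* withIdleWatchdog<T>(
  source: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  while (true) {
    const next = it.next();
    // Avoid an unhandled rejection if an abandoned read fails after a timeout.
    next.catch(() => undefined);
    let timer: ReturnType<typeof setTimeout> | undefined;
    const result =
      idleMs > 0
        ? await Promise.race([
            next,
            new Promise<never>((_, reject) => {
              timer = setTimeout(
                () => reject(new IdleTimeoutError(`no chunk for ${idleMs}ms`)),
                idleMs,
              );
            }),
          ]).finally(() => clearTimeout(timer))
        : await next;
    if (result.done) return;
    yield result.value;
  }
}

// Stand-in for a thinking model: silent while it "reasons", then one chunk.
async function* simulatedThinkingStream(silentMs: number): AsyncGenerator<string> {
  await sleep(silentMs);
  yield "answer";
}
```

With `idleMs` shorter than the silent phase, the consumer sees an `IdleTimeoutError` even though the stream would have produced output; with `idleMs` set to 0, the same stream completes normally. A production version would also abort the underlying request on timeout, which this sketch leaves out.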

Fix

Add thinkingEnabled?: boolean to resolveLlmIdleTimeoutMs. When thinkingEnabled === true, return 0 (watchdog disabled) — the same bypass already applied to local providers, which share the same "legitimately silent during thinking" characteristic.

The call site in attempt.ts passes thinkingEnabled: params.thinkLevel !== undefined && params.thinkLevel !== "off".

Per-run and agentTimeoutSeconds hard caps still apply, so genuinely hung requests can still be cancelled via explicit timeoutSeconds config.
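
The decision logic described above can be sketched as follows. This is an illustration, not the code from the PR: `resolveIdleTimeoutSketch`, `isLocalProvider`, and `explicitTimeoutMs` are hypothetical names, and the precedence given to explicit timeouts is inferred from the PR's test names rather than confirmed by the source. Only `thinkingEnabled`, the 120s default, the return value 0, and the `thinkLevel` check come from the PR description.

```typescript
// 120s network-silence default for cloud providers, per the PR description.
const DEFAULT_IDLE_TIMEOUT_MS = 120_000;

// Hypothetical parameter shape; only `thinkingEnabled` is named in the PR.
interface IdleTimeoutSketchParams {
  explicitTimeoutMs?: number; // assumed: an explicitly configured timeout
  isLocalProvider?: boolean; // assumed: flag for the existing local bypass
  thinkingEnabled?: boolean;
}

function resolveIdleTimeoutSketch(params: IdleTimeoutSketchParams): number {
  // Assumption: explicit timeouts win, as the PR's test names suggest.
  if (params.explicitTimeoutMs !== undefined) return params.explicitTimeoutMs;
  // Existing bypass: local providers may be legitimately silent.
  if (params.isLocalProvider === true) return 0;
  // New bypass: thinking-enabled runs may be silent while reasoning.
  if (params.thinkingEnabled === true) return 0;
  return DEFAULT_IDLE_TIMEOUT_MS;
}

// Mirrors the call-site expression quoted from attempt.ts.
function isThinkingEnabled(thinkLevel: string | undefined): boolean {
  return thinkLevel !== undefined && thinkLevel !== "off";
}
```

Under these assumptions, `resolveIdleTimeoutSketch({ thinkingEnabled: isThinkingEnabled("medium") })` evaluates to 0, while an unset or "off" thinking level keeps the 120s default.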

Files changed

  • src/agents/pi-embedded-runner/run/llm-idle-timeout.ts — add thinkingEnabled? param, return 0 when true (after local-provider check)
  • src/agents/pi-embedded-runner/run/attempt.ts — pass thinkingEnabled at the call site
  • src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts — 4 new test cases for the bypass

Real behavior proof

Behavior or issue addressed: resolveLlmIdleTimeoutMs returned 120_000 (120s watchdog) for cloud thinking model runs. With thinkingEnabled: true, it now returns 0 (watchdog off), preventing false-positive FailoverError: LLM request timed out. during Gemini 2.5 Flash thinking phases.

Real environment tested: DGX workstation, Node.js v22, pnpm vitest runner — same environment used for all PR development.

Exact steps or command run after this patch:

pnpm exec vitest run src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts --reporter=verbose

Evidence after fix:

 ✓ resolveLlmIdleTimeoutMs > disables the default idle watchdog when thinkingEnabled is true 0ms
 ✓ resolveLlmIdleTimeoutMs > keeps the default idle watchdog when thinkingEnabled is false 0ms
 ✓ resolveLlmIdleTimeoutMs > still honors an explicit provider request timeout when thinkingEnabled 0ms
 ✓ resolveLlmIdleTimeoutMs > still applies agents.defaults.timeoutSeconds when thinkingEnabled 0ms

 Test Files  2 passed (2)
      Tests  136 passed (136)
   Start at  01:26:30
   Duration  734ms

Before this patch: resolveLlmIdleTimeoutMs({ thinkingEnabled: true }) → 120_000 (watchdog fires at 120s).
After this patch: resolveLlmIdleTimeoutMs({ thinkingEnabled: true }) → 0 (watchdog disabled; per-run/agentTimeout caps still apply).

Observed result after fix: Returns 0 for thinking-enabled runs, matching existing local-provider bypass behavior. All 136 tests pass.

What was not tested: Live Gemini 2.5 Flash end-to-end — requires a Google API key reproducing the 125s hang from #80349.

Fixes #80349

Changed files

  • extensions/ollama/src/stream-runtime.test.ts (modified, +18/-12)
  • extensions/ollama/src/stream.ts (modified, +8/-5)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +1/-0)
  • src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts (modified, +19/-0)
  • src/agents/pi-embedded-runner/run/llm-idle-timeout.ts (modified, +9/-0)
  • src/commands/sessions-table.ts (modified, +20/-0)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

When OpenClaw is configured with google/gemini-2.5-flash as the primary model and thinkingDefault: medium (the default thinking level for Gemini reported by the gateway boot log), agent turns hang for ~125 seconds before timing out and falling back to the fallback model. Google Cloud Console shows the corresponding streamGenerateContent API requests completing successfully with average latency well within the timeout, suggesting OpenClaw's stream handler is failing to correctly consume responses when an extended thinking budget produces a long pre-token thinking phase.

Steps to reproduce

  1. Set agents.defaults.model.primary to google/gemini-2.5-flash.
  2. Leave thinkingDefault unset (default resolves to medium per the gateway log) or explicitly set to medium / low / high.
  3. Configure a Telegram channel and start a normal conversation with the bot.
  4. Within a handful of moderately complex turns, OpenClaw will hang for ~125 seconds, time out, and fail over to the configured fallback (or fail outright if no fallback is configured).

Expected behavior

OpenClaw consumes the streaming response from Gemini 2.5 Flash and returns the answer within a reasonable time (Google reports p99 of ~49s for streamGenerateContent even under thinking-heavy loads, well within OpenClaw's apparent 125s timeout).

Actual behavior

OpenClaw consistently times out at approximately 125 seconds and emits a FailoverError: LLM request timed out even though, on Google's side, the same request period shows zero errors.

Multiple captured timeout durations (durationMs) from the same install:

125567
125592
126284
129212
133544
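
For context, the arithmetic below compares each captured duration with the 120s idle watchdog default described in PR #80354. Reading the excess as turn time spent outside the silent window is an interpretation, not something the logs confirm; `capturedDurationsMs` and `excessOverWatchdogMs` are illustrative names.

```typescript
// durationMs values copied from the report above.
const capturedDurationsMs: number[] = [125567, 125592, 126284, 129212, 133544];

// 120s idle watchdog default, per PR #80354.
const IDLE_WATCHDOG_MS = 120_000;

// How far each timed-out turn ran past the watchdog window.
const excessOverWatchdogMs: number[] = capturedDurationsMs.map(
  (durationMs) => durationMs - IDLE_WATCHDOG_MS,
);

console.log(excessOverWatchdogMs); // [5567, 5592, 6284, 9212, 13544]
```

Every value exceeds 120,000 ms by roughly 5.6 to 13.5 seconds, which is consistent with a fixed 120s silence window firing after a short, variable amount of earlier activity.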

Sample log block from ~/.openclaw/logs/gateway.err.log (sanitized):

[agent/embedded] Profile google:default timed out. Trying next account...
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout from=google/gemini-2.5-flash
[diagnostic] lane task error: lane=main durationMs=125567 error="FailoverError: LLM request timed out."
[model-fallback/decision] model fallback decision: decision=candidate_failed requested=google/gemini-2.5-flash candidate=google/gemini-2.5-flash reason=timeout next=anthropic/claude-sonnet-4-6 detail=LLM request timed out.

Google Cloud Console evidence (same time window)

Pulled from "Generative Language API" details page, filtered to the credential issuing the OpenClaw traffic:

| Method | Requests | Errors | Avg latency | p99 latency |
| --- | --- | --- | --- | --- |
| v1beta.GenerativeService.StreamGenerateContent | 53 | 0% | 6.318s | 49.325s |
| v1beta.GenerativeService.GenerateContent | 8 | 25% (404, unrelated to this bug; Imagen-on-generateContent mismatch) | 6.861s | 1m |
| v1beta.GenerativeService.BatchEmbedContents | 2 | 0% | 0.167s | 0.261s |
| v1beta.ModelService.ListModels | 1 | 0% | 0.047s | 0.065s |

Key observations:

  • Zero errors on streamGenerateContent during the OpenClaw timeout window.
  • No 429s anywhere — this isn't a rate-limit problem.
  • p99 latency 49.3s — well inside OpenClaw's 125s timeout. Google is clearly delivering responses; OpenClaw is hanging client-side.

OpenClaw version

2026.5.7 (eeef486)

Operating system

macOS 26.5

Install method

npm global

Model

google/gemini-2.5-flash

Provider / routing chain

openclaw -> google-ai-studio (generativelanguage.googleapis.com/v1beta)

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Affected: Telegram channel users on this OpenClaw 2026.5.7 install

Severity: High (blocks turn completion; ~2-minute silent hang before failure surfaces, so users wait the full timeout before knowing the bot won't respond)

Frequency: Intermittent but reliable; 5+ distinct ~125s timeouts captured in gateway.err.log during normal multi-day usage (durationMs values 125567/125592/126284/129212/133544); reproduces within a handful of moderately complex queries when thinkingDefault resolves to medium; eliminated entirely with thinkingDefault: minimal or off

Consequence: User-visible turn failures, forced fallback to higher-cost Anthropic model when fallback works, total failure when fallback is also unavailable; wasted thinking-budget tokens on requests that never produce user output

Additional information

Confirmed workaround

Reducing the thinking level eliminates the hang entirely:

  • /think:off (per-session directive) — no hangs across multiple conversations.
  • thinkingDefault: "minimal" (global default) — no hangs across an extended testing period.
  • thinkingDefault: "medium" — reliably reproduces the hang within a few queries.

So the bug appears to require a non-minimal thinking budget on gemini-2.5-flash to manifest.
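
As a sketch of the global workaround, the fragment below shows the relevant settings as a TypeScript object. The nesting of `thinkingDefault` under `agents.defaults` is an assumption (the report only names the key itself and `agents.defaults.model.primary`), and `workaroundConfigSketch` is a hypothetical name, so check the configuration reference for your OpenClaw version before copying it.

```typescript
// Assumed layout: only the key names and values come from the report.
const workaroundConfigSketch = {
  agents: {
    defaults: {
      model: { primary: "google/gemini-2.5-flash" },
      // "minimal" avoided the hang in the reporter's extended testing.
      thinkingDefault: "minimal",
    },
  },
};
```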

Hypothesis

The behavior matches the symptoms described in:

  • #8724 — Gemini Flash stuck in thinking/planning loops without producing output
  • #46049 — LLM request timeout settings not respected
  • #52231 — zombie handle blocking heartbeat after embedded run timeout

Most likely cause: when Gemini 2.5 Flash is given a thinking_config with a non-minimal budget, it can spend tens of seconds emitting reasoning deltas before any user-facing tokens. OpenClaw's stream consumer may be misinterpreting the long pre-token phase as a stalled connection, or not correctly handling the thinking-delta wire format, leading to the 125s timeout despite Google having delivered a complete successful response.

Suggested next steps

  • Add per-model generation timeout config (already requested in #8724) so slow-thinking models can extend their effective timeout.
  • Investigate stream-handling behavior specifically for Gemini responses with non-minimal thinking_config budgets.
  • Document safe thinkingDefault values per model family in the Google provider docs.
  • Consider defaulting gemini-2.5-flash to thinkingDefault: minimal until the underlying stream handling is fixed, since medium is currently a footgun.

Additional context

Direct curl to the same model and key, outside OpenClaw, returns gemini-2.5-flash:generateContent responses in under 1 second. The API key and account are healthy; the issue is reproducible only through OpenClaw's embedded runtime when thinking is engaged.
