openclaw - ✅(Solved) Fix [Bug]: Gemini 3.1 Pro Preview hangs in agent runtime; direct API works in seconds, openclaw idle-times-out at full timeoutSeconds [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#76071 · Fetched 2026-05-03 04:42:41
Comments: 1 · Participants: 2 · Timeline: 8 · Reactions: 2
Timeline (top): referenced ×3, closed ×1, commented ×1, cross-referenced ×1

google/gemini-3.1-pro-preview always hits "LLM idle timeout: no response from model" in agent runs, regardless of how high models.providers.google.timeoutSeconds is set. The exact same model and prompt size return successfully in 2-3 seconds via a plain curl to generativelanguage.googleapis.com. Symptoms match #64710, but specifically for Gemini 3.1 Pro with thinking enabled.

Environment

  • OpenClaw: 2026.4.29 (a448042)
  • Node: 22.22.2
  • OS: Linux (GCP, europe-west3)
  • Provider: google via direct GEMINI_API_KEY env (no proxy, no Vertex)
  • Model: gemini-3.1-pro-preview (inputTokenLimit: 1048576, supports generateContent/streaming)
  • Channel: telegram
  • Workload: agent main session, ~33K tokens total prompt (20K systemPrompt with 134 skills + 28 tools / 12K of tool defs + 5 messages history)

Suspected cause

provider-stream-shared ships retainThoughtSignature and resolveGoogleGemini3ThinkingLevel, but the iterator wrapper in selection.streamWithIdleTimeout resets the idle timer only when the underlying iterator yields. Gemini 3 Pro emits thoughtSignature chunks during the thinking phase (we observed this in direct API streaming tests), and on Marco's actual prompt the time to the first visible token genuinely exceeds the wall-clock timeout even though the model is producing thinking frames continuously.

If the streaming layer treats thoughtSignature-only chunks as "no token activity", the idle timer is never reset and the watchdog kills a stream that would have completed.
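
The failure mode described above can be replayed with a small deterministic simulation. This is only a sketch under stated assumptions: the Chunk shape, simulateIdleWatchdog, and both activity predicates are hypothetical names for illustration, not OpenClaw code, and the timestamps are invented.

```typescript
// Hypothetical chunk shape; real Gemini stream parts carry more fields.
type Chunk = { atMs: number; text?: string; thoughtSignature?: string };

// Returns the time at which the watchdog would fire, or null if the stream
// finishes first. `countsAsActivity` decides which chunks reset the timer.
function simulateIdleWatchdog(
  chunks: Chunk[],
  idleTimeoutMs: number,
  countsAsActivity: (chunk: Chunk) => boolean,
): number | null {
  let lastActivityMs = 0; // the timer starts when the prompt is submitted
  for (const chunk of chunks) {
    if (chunk.atMs - lastActivityMs > idleTimeoutMs) {
      return lastActivityMs + idleTimeoutMs; // fired before this chunk arrived
    }
    if (countsAsActivity(chunk)) lastActivityMs = chunk.atMs;
  }
  return null;
}

// Invented timeline: three thinking frames, then the first visible token.
const timeline: Chunk[] = [
  { atMs: 1000, thoughtSignature: "sig-1" },
  { atMs: 2000, thoughtSignature: "sig-2" },
  { atMs: 4000, thoughtSignature: "sig-3" },
  { atMs: 5000, text: "OK." },
];

const visibleTextOnly = (c: Chunk) => (c.text ?? "").length > 0;
const anyChunk = (_c: Chunk) => true;

// Suspected behavior: thinking frames are ignored, so the watchdog fires.
const firedAt = simulateIdleWatchdog(timeline, 3000, visibleTextOnly);
// Proposed behavior: every chunk counts, so the stream completes.
const firedAtFixed = simulateIdleWatchdog(timeline, 3000, anyChunk);
console.log({ firedAt, firedAtFixed });
```

With a 3000 ms idle limit, the text-only predicate lets the watchdog fire even though chunks arrive every one to two seconds, which matches the symptom in the report.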

Other paths checked and ruled out:

  • Override pin verified (modelOverride=gemini-3.1-pro-preview, modelOverrideSource=user, providerOverride=google in sessions.json).
  • thinkingLevel correctly resolved to high (HIGH for Gemini 3 Pro), not adaptive. So unbounded-thinking is NOT the root cause.
  • Provider has fresh OAuth via Claude CLI for anthropic fallback — but failover never triggers because the run is "in flight" until the watchdog fires.
  • agents.defaults.timeoutSeconds is unset → falls through to DEFAULT_LLM_IDLE_TIMEOUT_MS = 120s, which is implicitly clamped. Only models.providers.google.timeoutSeconds * 1000 reaches params.model.requestTimeoutMs and bypasses the clamp.
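
The timeout resolution described in the last bullet can be sketched as below. This is one reading of the report, not OpenClaw's actual code: resolveIdleTimeoutMs and TimeoutInputs are hypothetical, and the assumption that an agent-level value is capped at the 120s default is inferred from the word "clamp" rather than confirmed.

```typescript
// Value quoted in the report for DEFAULT_LLM_IDLE_TIMEOUT_MS.
const DEFAULT_LLM_IDLE_TIMEOUT_MS = 120000;

// Hypothetical input shape for illustration only.
interface TimeoutInputs {
  providerTimeoutSeconds?: number; // models.providers.<id>.timeoutSeconds
  agentDefaultTimeoutSeconds?: number; // agents.defaults.timeoutSeconds
}

function resolveIdleTimeoutMs(cfg: TimeoutInputs): number {
  // Per the report, only the provider-level value bypasses the clamp
  // (it is what reaches params.model.requestTimeoutMs).
  if (cfg.providerTimeoutSeconds !== undefined) {
    return cfg.providerTimeoutSeconds * 1000;
  }
  // ASSUMPTION: anything else is capped at the default.
  const requestedMs =
    cfg.agentDefaultTimeoutSeconds !== undefined
      ? cfg.agentDefaultTimeoutSeconds * 1000
      : DEFAULT_LLM_IDLE_TIMEOUT_MS;
  return Math.min(requestedMs, DEFAULT_LLM_IDLE_TIMEOUT_MS);
}

console.log(resolveIdleTimeoutMs({})); // unset everywhere
console.log(resolveIdleTimeoutMs({ providerTimeoutSeconds: 600 }));
```

This is consistent with the two durations the reporter measured: 120s with nothing configured and 600s once models.providers.google.timeoutSeconds was set to 600.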

Workarounds

  • /model gemini-flash (google/gemini-flash-latest) — works fine in same setup, 1-2s responses.
  • /model sonnet (anthropic/claude-sonnet-4-6 via Claude-CLI OAuth) — works fine.
  • /model deepseek-pro (ollama/deepseek-v4-pro:cloud) — works fine.

Suggested fix direction

streamWithIdleTimeout should reset the idle timer on any chunk arrival, including thoughtSignature / thinking part frames, not just on user-visible text. Alternatively, add a compat.thinkingFormat value for Google Gemini 3 Pro so the existing thinking-format-aware paths can mark thought frames as activity.
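
A minimal sketch of the first option, assuming the stream is exposed as an AsyncIterable. The names withIdleTimeout and rejectAfterIdle are hypothetical and this is not the body of OpenClaw's streamWithIdleTimeout; it only illustrates re-arming the deadline on every chunk, whatever the chunk contains.

```typescript
// Hypothetical helper: rejects if `pending` does not settle within `ms`.
function rejectAfterIdle<T>(pending: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`LLM idle timeout (${ms}ms): no response from model`)),
      ms,
    );
    pending.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Wraps any async stream. A fresh deadline is armed for each pull, so every
// chunk the source yields (text, thinking frame, or signature) resets it.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleTimeoutMs: number,
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  let finished = false;
  try {
    while (true) {
      const result = await rejectAfterIdle(iterator.next(), idleTimeoutMs);
      if (result.done) {
        finished = true;
        return;
      }
      yield result.value;
    }
  } finally {
    // Best-effort cleanup on timeout or early exit. Not awaited, because the
    // source may still be blocked on the read that timed out.
    if (!finished) void iterator.return?.().catch(() => undefined);
  }
}
```

The wrapper deliberately does not inspect chunk contents. That only helps if the layer underneath actually yields thoughtSignature-only chunks instead of swallowing them.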

Related

  • #64710 — same pattern (direct API works, agent times out) reported for Ollama. Suggests a shared root cause in the idle-timeout wrapper.
  • #64854 — feature request that introduced models.providers.<id>.timeoutSeconds. Knob works but doesn't help because activity detection is the actual bug.

PR fix notes

PR #76080: fix #76071: handle thoughtSignature-only parts to prevent Gemini stream hang

Description (problem / solution / changelog)

Summary

Fixes #76071 — Gemini 3.1 Pro Preview hangs in agent runtime

Issue

[Bug]: Gemini 3.1 Pro Preview hangs in agent runtime; direct API works in seconds, openclaw idle-times-out at full timeoutSeconds

Changes

  • fix(google): handle thoughtSignature-only parts to prevent Gemini stream hang

Changed Files

extensions/google/transport-stream.test.ts | 100 +++++++++++++++++++++++++++++
extensions/google/transport-stream.ts      |  39 +++++++++++

Root Cause

Gemini 3.1 Pro Preview may emit parts with only thoughtSignature and no text content, causing the transport stream to stall at idle-timeout.

Fix

Emit a thinking_signature event for thoughtSignature-only parts to keep the stream active, and start a thinking block when these parts arrive before any text.
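
The behavior this fix describes might look roughly like the sketch below. It is not the code from extensions/google/transport-stream.ts: partsToEvents is a hypothetical function, and the thinking_start and text_delta event names are assumptions; only thinking_signature is named in the PR description.

```typescript
// Simplified view of a Gemini response part (only the fields used here).
interface GeminiPart {
  text?: string;
  thoughtSignature?: string;
}

// Hypothetical event union; only "thinking_signature" comes from the PR text.
type StreamEvent =
  | { type: "thinking_start" }
  | { type: "thinking_signature"; signature: string }
  | { type: "text_delta"; text: string };

function partsToEvents(parts: GeminiPart[]): StreamEvent[] {
  const events: StreamEvent[] = [];
  let thinkingOpen = false;
  let sawText = false;
  for (const part of parts) {
    const text = part.text ?? "";
    if (text.length === 0 && part.thoughtSignature) {
      // Signature-only part: open a thinking block if none exists and no text
      // has been seen yet, then surface the signature as an event so the
      // stream registers activity.
      if (!thinkingOpen && !sawText) {
        events.push({ type: "thinking_start" });
        thinkingOpen = true;
      }
      events.push({ type: "thinking_signature", signature: part.thoughtSignature });
      continue;
    }
    if (text.length > 0) {
      sawText = true;
      events.push({ type: "text_delta", text });
    }
  }
  return events;
}

const demoEvents = partsToEvents([{ thoughtSignature: "abc" }, { text: "OK." }]);
console.log(demoEvents.map((e) => e.type).join(", "));
```

Because each signature-only part now produces an event, a downstream idle timer that resets on events sees activity during the thinking phase.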

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/google/transport-stream.test.ts (modified, +116/-0)
  • extensions/google/transport-stream.ts (modified, +8/-4)

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Steps to reproduce

  1. Configure provider explicitly so the per-provider timeout escapes the 120s DEFAULT_LLM_IDLE_TIMEOUT_MS clamp:

{
  models: {
    providers: {
      google: {
        baseUrl: "https://generativelanguage.googleapis.com",
        api: "google-generative-ai",
        auth: "api-key",
        timeoutSeconds: 600,
        models: [{ id: "gemini-3.1-pro-preview", contextWindow: 1048576, maxTokens: 65536, input: ["text"], reasoning: true, cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 } }]
      }
    }
  },
  agents: {
    defaults: {
      thinkingDefault: "high", // maps to Gemini 3 Pro HIGH (bounded thinking)
      model: { primary: "ollama/deepseek-v4-pro:cloud", fallbacks: ["ollama/glm-5.1:cloud", "anthropic/claude-sonnet-4-6"], timeoutMs: 600000 }
    }
  }
}

  2. Pin the active session to Gemini 3.1 Pro: /model gemini (or set agent:main:main.modelOverride = "gemini-3.1-pro-preview" in sessions.json).
  3. From a real agent session with multi-turn history and a non-trivial tool catalogue, send any user message that requires the model to reason and call tools (e.g. an email-triage instruction).

Expected behavior

Stream completes within seconds. The same model + a comparably-sized prompt (~17K tokens) responds in <3s via plain curl with
thinkingConfig.thinkingLevel: HIGH.

Actual behavior

  • [trace:embedded-run] emits prep stages → context.compiled → prompt.submitted
  • No further events for the full configured timeout
  • Final model.completed with promptError: "LLM idle timeout (600s): no response from model", idleTimedOut: true
  • model-fallback/decision records requested=google/gemini-3.1-pro-preview candidate=google/gemini-3.1-pro-preview reason=timeout next=none
  • No outbound TCP connection to generativelanguage.googleapis.com is ever observed for this run via ss -tnp on the gateway PID
  • File-log shows prompt.submitted then nothing model-related until the idleTimeoutMs fires

The duration stays exactly at the configured timeout (verified at 120s default, 600s after models.providers.google.timeoutSeconds: 600), confirming the
watchdog is doing its job — but no token ever arrives.

Direct-API control test (same key, same machine)

Plain curl with the same prompt size (≈17K tokens, including thinkingConfig.thinkingLevel: HIGH):

status=200 time=2.65s
prompt=17209 thinking=46 resp=… total=17255

Plain curl to gemini-flash-latest with the same prompt: status=200 time=1.28s, resp="OK.".

So the model is healthy from this network and key. Issue is in the agent path.
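
For anyone repeating the control test from TypeScript rather than curl, a rough equivalent is sketched below. The report does not include the exact curl command, so the v1beta generateContent path, the x-goog-api-key header, and the body layout are assumptions based on the public Gemini REST API; buildControlRequest and runControl are hypothetical names.

```typescript
// Token counters as returned in the response's usageMetadata field.
interface UsageMetadata {
  promptTokenCount?: number;
  thoughtsTokenCount?: number;
  totalTokenCount?: number;
}

// Builds the request without sending it, so it can be inspected offline.
function buildControlRequest(model: string, apiKey: string, prompt: string) {
  return {
    // ASSUMPTION: v1beta path; the report only names the host.
    url: `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({
        contents: [{ role: "user", parts: [{ text: prompt }] }],
        generationConfig: { thinkingConfig: { thinkingLevel: "HIGH" } },
      }),
    },
  };
}

// Sends the request and formats a line similar to the one quoted above.
async function runControl(model: string, apiKey: string, prompt: string): Promise<string> {
  const { url, init } = buildControlRequest(model, apiKey, prompt);
  const startedMs = Date.now();
  const res = await fetch(url, init);
  const json = (await res.json()) as { usageMetadata?: UsageMetadata };
  const seconds = ((Date.now() - startedMs) / 1000).toFixed(2);
  const usage = json.usageMetadata ?? {};
  return (
    `status=${res.status} time=${seconds}s ` +
    `prompt=${usage.promptTokenCount} thinking=${usage.thoughtsTokenCount} total=${usage.totalTokenCount}`
  );
}
```

Calling runControl("gemini-3.1-pro-preview", key, prompt) from the gateway host gives a like-for-like timing to compare against the agent run.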

Provider / routing chain

openclaw -> gemini

extent analysis

TL;DR

The idle timer in the streamWithIdleTimeout function should be reset on any chunk arrival, including thoughtSignature/thinking part frames, to prevent the watchdog from killing the stream.

Guidance

  • The likely cause is that streamWithIdleTimeout does not reset the idle timer when thoughtSignature chunks arrive, so the watchdog kills a stream that is still active.
  • To verify, check whether streamWithIdleTimeout resets the idle timer only on user-visible text chunks.
  • One fix is to reset the idle timer on any chunk arrival, including thoughtSignature/thinking part frames.
  • Alternatively, add a compat.thinkingFormat value for Google Gemini 3 Pro so that thought frames are marked as activity.

Example

No code example is provided as the issue does not contain enough information to create a specific code snippet.

Notes

The issue seems to be specific to the Google Gemini 3 Pro model and the streamWithIdleTimeout function. The provided workarounds, such as using the gemini-flash model, may not be suitable for all use cases.

Recommendation

Reset the idle timer in streamWithIdleTimeout on any chunk arrival, including thoughtSignature/thinking part frames, since missed activity detection is the most likely cause. PR #76080 addresses the same symptom at the transport layer by emitting a thinking_signature event for thoughtSignature-only parts.
