openclaw - ✅(Solved) Fix [Bug]: Gemini 3.1 Pro Preview hangs in agent runtime; direct API works in seconds, openclaw idle-times-out at full timeoutSeconds [1 pull requests, 1 comments, 2 participants]

marcoschierhorn · 2026-05-02T12:24:26Z



openclaw/openclaw#76071 • Fetched 2026-05-03 04:42:41

Author: marcoschierhorn
Participants: clawsweeper[bot], marcoschierhorn
Timeline (top): referenced ×3, closed ×1, commented ×1, cross-referenced ×1

google/gemini-3.1-pro-preview always hits LLM idle timeout: no response from model in agent runs, regardless of how high
models.providers.google.timeoutSeconds is set. The exact same model + prompt size returns successfully in 2-3 seconds via a plain curl to generativelanguage.googleapis.com. Symptoms match #64710 but specifically for Gemini 3.1 Pro with thinking enabled.

Environment

OpenClaw: 2026.4.29 (a448042)
Node: 22.22.2
OS: Linux (GCP, europe-west3)
Provider: google via direct GEMINI_API_KEY env (no proxy, no Vertex)
Model: gemini-3.1-pro-preview (inputTokenLimit: 1048576, supports generateContent/streaming)
Channel: telegram
Workload: agent main session, ~33K tokens total prompt (20K systemPrompt with 134 skills + 28 tools / 12K of tool defs + 5 messages history)

Actual

Suspected cause

provider-stream-shared ships retainThoughtSignature and resolveGoogleGemini3ThinkingLevel, but the iterator wrapper in selection.streamWithIdleTimeout
only resets the idle timer when the underlying iterator yields. Gemini 3 Pro emits thoughtSignature chunks during the thinking phase (we observed this in direct API streaming tests), and on Marco's actual prompt the time-to-first-visible-token genuinely exceeds the wall-clock timeout even though the model
is producing thinking-frames continuously.

If the streaming layer treats thoughtSignature-only chunks as "no token activity" the idle timer is never reset, and the watchdog kills the stream that
would have completed.
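
To make the hypothesis concrete, here is a minimal sketch of an idle-timeout wrapper that resets its timer on every chunk the underlying iterator yields, whether or not the chunk carries visible text. It is not the OpenClaw implementation: withIdleTimeout and IdleTimeoutError are hypothetical names, and the real selection.streamWithIdleTimeout may be structured differently.

```typescript
// Hypothetical sketch, not OpenClaw code.
class IdleTimeoutError extends Error {
  constructor(idleMs: number) {
    super(`LLM idle timeout (${idleMs}ms): no response from model`);
    this.name = "IdleTimeoutError";
  }
}

async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new IdleTimeoutError(idleMs)), idleMs);
      });
      // A fresh timer is armed for each next() call, so any yielded chunk
      // (text, thinking, or signature-only) restarts the idle clock.
      const result = await Promise.race([it.next(), timeout]).finally(() =>
        clearTimeout(timer),
      );
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Best-effort cleanup; not awaited because a stalled source may never settle.
    void it.return?.()?.catch(() => undefined);
  }
}
```

With a wrapper of this shape, a stream only times out when nothing at all arrives for idleMs. If thoughtSignature-only chunks are dropped before they reach the wrapper, the wrapper never sees them and the timer still expires, which is the failure mode described above.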

Other paths checked and ruled out:

Override pin verified (modelOverride=gemini-3.1-pro-preview, modelOverrideSource=user, providerOverride=google in sessions.json).
thinkingLevel correctly resolved to high (HIGH for Gemini 3 Pro), not adaptive, so unbounded thinking is NOT the root cause.
Provider has fresh OAuth via Claude CLI for anthropic fallback, but failover never triggers because the run is "in flight" until the watchdog fires.
agents.defaults.timeoutSeconds is unset → falls through to DEFAULT_LLM_IDLE_TIMEOUT_MS = 120s, which is implicitly clamped. Only models.providers.google.timeoutSeconds * 1000 reaches params.model.requestTimeoutMs and bypasses the clamp.

Workarounds

/model gemini-flash (google/gemini-flash-latest) — works fine in same setup, 1-2s responses.
/model sonnet (anthropic/claude-sonnet-4-6 via Claude-CLI OAuth) — works fine.
/model deepseek-pro (ollama/deepseek-v4-pro:cloud) — works fine.

Suggested fix direction

streamWithIdleTimeout should reset the idle timer on any chunk arrival, including thoughtSignature / thinking part frames, not just on user-visible text. Alternatively, add a compat.thinkingFormat value for Google Gemini 3 Pro so the existing thinking-format-aware paths can mark thought frames as activity.

Related

#64710: same pattern (direct API works, agent times out) reported for Ollama. Suggests a shared root cause in the idle-timeout wrapper.
#64854: feature request that introduced models.providers.<id>.timeoutSeconds. The knob works but doesn't help because activity detection is the actual bug.

PR fix notes

PR #76080: fix #76071: handle thoughtSignature-only parts to prevent Gemini stream hang

Repository: openclaw/openclaw
Author: zhangguiping-xydt
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/76080

Description (problem / solution / changelog)

Summary

Fixes #76071 — Gemini 3.1 Pro Preview hangs in agent runtime

Issue

[Bug]: Gemini 3.1 Pro Preview hangs in agent runtime; direct API works in seconds, openclaw idle-times-out at full timeoutSeconds

Changes

fix(google): handle thoughtSignature-only parts to prevent Gemini stream hang

Changed Files

extensions/google/transport-stream.test.ts | 100 +++++++++++++++++++++++++++++
extensions/google/transport-stream.ts      |  39 +++++++++++

Root Cause

Gemini 3.1 Pro Preview may emit parts with only thoughtSignature and no text content, causing the transport stream to stall at idle-timeout.

Fix

Emit a thinking_signature event for thoughtSignature-only parts to keep the stream active, and start a thinking block when these parts arrive before any text.
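
The fix described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in PR #76080. The part fields text, thought and thoughtSignature follow the Gemini response shape, and thinking_signature is the event name given in the PR description. Everything else (eventsForPart, BlockState, and the other event names) is hypothetical.

```typescript
// Hypothetical sketch of thoughtSignature-only handling; not the merged code.
interface GeminiPart {
  text?: string;
  thought?: boolean;
  thoughtSignature?: string;
}

type StreamEvent =
  | { type: "thinking_start" }
  | { type: "thinking_delta"; text: string }
  | { type: "thinking_signature"; signature: string }
  | { type: "text_delta"; text: string };

interface BlockState {
  thinkingOpen: boolean; // a thinking block is currently open
  textStarted: boolean; // visible text has already been emitted
}

function eventsForPart(part: GeminiPart, state: BlockState): StreamEvent[] {
  const events: StreamEvent[] = [];
  const text = part.text ?? "";

  if (text.length === 0) {
    if (part.thoughtSignature) {
      // Signature-only part: open a thinking block if this arrives before any
      // text, then emit an event so downstream consumers register activity.
      if (!state.thinkingOpen && !state.textStarted) {
        state.thinkingOpen = true;
        events.push({ type: "thinking_start" });
      }
      events.push({ type: "thinking_signature", signature: part.thoughtSignature });
    }
    return events;
  }

  if (part.thought) {
    if (!state.thinkingOpen) {
      state.thinkingOpen = true;
      events.push({ type: "thinking_start" });
    }
    events.push({ type: "thinking_delta", text });
  } else {
    state.thinkingOpen = false;
    state.textStarted = true;
    events.push({ type: "text_delta", text });
  }
  return events;
}
```

The point of the sketch is that a part with no text no longer maps to an empty event list when it carries a signature, so an idle watchdog fed by these events keeps seeing activity during the thinking phase.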

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/google/transport-stream.test.ts (modified, +116/-0)
extensions/google/transport-stream.ts (modified, +8/-4)


Bug type

Regression (worked before, now fails)

Beta release blocker

No


Steps to reproduce

Configure provider explicitly so the per-provider timeout escapes the 120s DEFAULT_LLM_IDLE_TIMEOUT_MS clamp:

{
  models: {
    providers: {
      google: {
        baseUrl: "https://generativelanguage.googleapis.com",
        api: "google-generative-ai",
        auth: "api-key",
        timeoutSeconds: 600,
        models: [{ id: "gemini-3.1-pro-preview", contextWindow: 1048576, maxTokens: 65536, input: ["text"], reasoning: true,
                   cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 } }]
      }
    }
  },
  agents: {
    defaults: {
      thinkingDefault: "high", // maps to Gemini 3 Pro HIGH (bounded thinking)
      model: { primary: "ollama/deepseek-v4-pro:cloud", fallbacks: ["ollama/glm-5.1:cloud", "anthropic/claude-sonnet-4-6"], timeoutMs: 600000 }
    }
  }
}

Pin the active session to Gemini 3.1 Pro: /model gemini (or set agent:main:main.modelOverride = "gemini-3.1-pro-preview" in sessions.json).
From a real agent session with multi-turn history + non-trivial tool catalogue, send any user message that requires the model to reason and call tools (e.g. an email-triage instruction).

Expected behavior

Stream completes within seconds. The same model + a comparably-sized prompt (~17K tokens) responds in <3s via plain curl with
thinkingConfig.thinkingLevel: HIGH.

Actual behavior

[trace:embedded-run] emits prep stages → context.compiled → prompt.submitted
No further events for the full configured timeout
Final model.completed with promptError: "LLM idle timeout (600s): no response from model", idleTimedOut: true
model-fallback/decision records requested=google/gemini-3.1-pro-preview candidate=google/gemini-3.1-pro-preview reason=timeout next=none
No outbound TCP connection to generativelanguage.googleapis.com is ever observed for this run via ss -tnp on the gateway PID
File-log shows prompt.submitted then nothing model-related until the idleTimeoutMs fires

The duration stays exactly at the configured timeout (verified at 120s default, 600s after models.providers.google.timeoutSeconds: 600), confirming the
watchdog is doing its job — but no token ever arrives.
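
The two observed durations (120s by default, 600s once models.providers.google.timeoutSeconds is set) are consistent with a resolution order like the one sketched here. This is only an assumption drawn from the report: resolveIdleTimeoutMs and TimeoutInputs are hypothetical names, and only DEFAULT_LLM_IDLE_TIMEOUT_MS and its 120s value come from the issue. The agents.defaults.timeoutSeconds path is left out because the issue does not describe how its clamp behaves.

```typescript
// Hypothetical sketch of the timeout resolution implied by the report.
const DEFAULT_LLM_IDLE_TIMEOUT_MS = 120_000;

interface TimeoutInputs {
  // Mirrors models.providers.<id>.timeoutSeconds when configured.
  providerTimeoutSeconds?: number;
}

function resolveIdleTimeoutMs(inputs: TimeoutInputs): number {
  const seconds = inputs.providerTimeoutSeconds;
  if (typeof seconds === "number" && Number.isFinite(seconds) && seconds > 0) {
    // Per-provider value is used as-is, bypassing the default.
    return seconds * 1000;
  }
  return DEFAULT_LLM_IDLE_TIMEOUT_MS;
}
```

Under this reading, raising the provider timeout only moves the point at which the watchdog fires. It cannot make the run succeed if no activity is ever registered.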

Direct-API control test (same key, same machine)

Plain curl with the same prompt size (≈17K tokens, including thinkingConfig.thinkingLevel: HIGH):

status=200 time=2.65s
prompt=17209 thinking=46 resp=… total=17255

Plain curl to gemini-flash-latest with the same prompt: status=200 time=1.28s, resp="OK.".

So the model is healthy from this network and key. Issue is in the agent path.
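
For anyone repeating the control test from Node rather than curl, a sketch along these lines may help. It rests on assumptions to verify against the current Gemini API documentation: the v1beta streamGenerateContent path with alt=sse, the x-goog-api-key header, and thinkingConfig nested under generationConfig. buildStreamRequest and timeToFirstChunkMs are hypothetical helper names, and the issue does not show the exact curl command that was used.

```typescript
// Hypothetical helpers for a direct-API control test; not part of OpenClaw.
interface GeminiRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildStreamRequest(
  model: string,
  prompt: string,
  apiKey: string,
  thinkingLevel: string = "HIGH",
): GeminiRequest {
  const base = "https://generativelanguage.googleapis.com/v1beta/models";
  return {
    url: `${base}/${encodeURIComponent(model)}:streamGenerateContent?alt=sse`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
      body: JSON.stringify({
        contents: [{ role: "user", parts: [{ text: prompt }] }],
        generationConfig: { thinkingConfig: { thinkingLevel } },
      }),
    },
  };
}

// Measures wall-clock time until the first bytes of the response body arrive.
async function timeToFirstChunkMs(req: GeminiRequest): Promise<number> {
  const started = Date.now();
  const res = await fetch(req.url, req.init);
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);
  const reader = res.body.getReader();
  await reader.read();
  await reader.cancel();
  return Date.now() - started;
}
```

Calling timeToFirstChunkMs(buildStreamRequest("gemini-3.1-pro-preview", prompt, key)) from the gateway host gives a figure comparable to the curl timings above, without going through the agent path.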

OpenClaw version

2026.4.29 (a448042)

Operating system

Linux (GCP, europe-west3)

Install method

No response

Model

gemini-3.1-pro-preview (inputTokenLimit: 1048576, supports generateContent/streaming)

Provider / routing chain

openclaw -> gemini

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

The idle timer in the streamWithIdleTimeout function should be reset on any chunk arrival, including thoughtSignature/thinking part frames, to prevent the watchdog from killing the stream.

Guidance

The issue is likely caused by the streamWithIdleTimeout function not resetting the idle timer when thoughtSignature chunks are received, leading to the watchdog killing the stream.
To verify, check the streamWithIdleTimeout function to see if it only resets the idle timer on user-visible text chunks.
A possible workaround is to modify the streamWithIdleTimeout function to reset the idle timer on any chunk arrival, including thoughtSignature/thinking part frames.
Another possible solution is to add a compat.thinkingFormat value for Google Gemini 3 Pro to mark thought frames as activity.


Notes

The issue seems to be specific to the Google Gemini 3 Pro model and the streamWithIdleTimeout function. The provided workarounds, such as using the gemini-flash model, may not be suitable for all use cases.

Recommendation

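Prefer a build that includes merged PR #76080, which emits a thinking_signature event for thoughtSignature-only parts so the stream keeps registering activity. If that is not possible, switch models as listed under Workarounds, or patch streamWithIdleTimeout to reset the idle timer on any chunk arrival, including thoughtSignature/thinking part frames.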
