openclaw - [Bug]: Gemini 2.5 Flash via vertex-ai (OpenAI-compatible) streaming times out, thinking tokens not handled [1 comment, 2 participants]

GitHub stats: openclaw/openclaw#84384, fetched 2026-05-20 03:41:02
Comments: 1 · Participants: 2 · Timeline events: 6 (labeled ×4, commented ×1, cross-referenced ×1) · Reactions: 1

Bug type

Regression / Missing coverage

Summary

Gemini 2.5 Flash via the vertex-ai provider (OpenAI-compatible endpoint through a sidecar proxy) always hits the LLM idle timeout (~28s), even though the sidecar returns HTTP 200 in ~2-3 seconds. Direct non-streaming curl to the same sidecar works perfectly.

Root cause: Gemini 2.5 Flash always produces reasoning_tokens in its response, even when thinking is not explicitly requested. The OpenAI-compatible streaming SSE parser in OpenClaw cannot handle these thinking/reasoning tokens — it consumes SSE chunks containing only reasoning activity without yielding stream events, so the idle watchdog never resets and kills the connection.

PR #76080 (merged May 2) fixed this for the native google transport by ensuring thoughtSignature-only SSE parts refresh the idle watchdog. However, the vertex-ai (OpenAI-compatible) transport does not have the equivalent fix.

Reproduction

Setup:

  • OpenClaw 2026.5.6 (c97b9f7)
  • Vertex AI sidecar proxy on 127.0.0.1:8787 (auto-refreshes SA tokens via google-auth-library)
  • Service account auth (type: service_account), project lkkff-ai, region us-central1
  • Model: vertex-ai/google/gemini-2.5-flash

Non-streaming works (2s):

curl -s http://127.0.0.1:8787/v1beta1/projects/lkkff-ai/locations/us-central1/endpoints/openapi/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"google/gemini-2.5-flash","messages":[{"role":"user","content":"Say hi"}],"max_tokens":50,"stream":false}' \
  --max-time 15
# Returns: 200, "Hi there!", reasoning_tokens: 23, time: 1.8s

Streaming fails (28s timeout): Any message through OpenClaw agent → sidecar returns 200 in ~3s → OpenClaw streaming parser hangs → AbortError: This operation was aborted; LLM request timed out at 28s.

Gateway log:

[agent/embedded] embedded run agent end: isError=true model=google/gemini-2.5-flash provider=vertex-ai error=LLM request timed out. rawError=terminated
[model-fallback/decision] model fallback decision: decision=candidate_failed reason=timeout next=none detail=terminated

Sidecar log (healthy):

{"msg":"request","method":"POST","path":"/v1beta1/.../chat/completions","status":200,"ms":2845}
{"msg":"stream error","error":"This operation was aborted"}  // OpenClaw killed the connection
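
To check what the sidecar actually streams, it can help to capture a raw `"stream": true` response body and look at each SSE frame separately. The following is a minimal sketch, not OpenClaw's parser: `parseSseBody` and `ParsedFrame` are hypothetical names, and the assumption that reasoning activity shows up as extra keys on `delta` (for example `reasoning_content`) is unverified for the Vertex AI OpenAI-compatible endpoint.

```typescript
// Minimal SSE splitter for an OpenAI-compatible chat/completions stream.
// Assumption: reasoning activity, if streamed at all, appears as extra keys
// on `delta` (for example `reasoning_content`). Not confirmed for Vertex AI.

interface ParsedFrame {
  done: boolean;           // true for the terminating "data: [DONE]" frame
  content: string;         // visible text carried by this frame, "" if none
  hasOtherFields: boolean; // delta has keys besides "role" and "content"
}

function parseSseBody(body: string): ParsedFrame[] {
  const frames: ParsedFrame[] = [];
  // SSE events are separated by a blank line; accept both \n and \r\n.
  for (const event of body.split(/\r?\n\r?\n/)) {
    const dataLines = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart());
    if (dataLines.length === 0) continue; // comment lines or empty events
    const payload = dataLines.join("\n");
    if (payload === "[DONE]") {
      frames.push({ done: true, content: "", hasOtherFields: false });
      continue;
    }
    const json = JSON.parse(payload) as {
      choices?: Array<{ delta?: Record<string, unknown> }>;
    };
    const delta = json.choices?.[0]?.delta ?? {};
    const rawContent = delta["content"];
    const content = typeof rawContent === "string" ? rawContent : "";
    const hasOtherFields = Object.keys(delta).some(
      (key) => key !== "content" && key !== "role",
    );
    frames.push({ done: false, content, hasOtherFields });
  }
  return frames;
}
```

Frames with empty `content` are the ones that, per the root-cause analysis in this report, would be consumed without yielding a stream event.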

What we tried (none worked)

| Config change | Result |
| --- | --- |
| thinkingDefault: "off" in agents.defaults | /status shows Think: off, but model still returns reasoning_tokens |
| reasoning: false on model catalog entry | No effect; sidecar response still contains reasoning_tokens |
| timeoutSeconds: 60 on provider | No effect; idle timeout fires, not request timeout |
| Switching to gemini-2.0-flash | 404 on Vertex AI OpenAI-compatible endpoint |
| OPENCLAW_THINKING env var | Not recognized |
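
The timeoutSeconds result is easier to follow when the two timers are modeled separately. The sketch below is illustrative only: `checkTimeouts` and `StreamClock` are hypothetical names rather than OpenClaw internals, and the 28000 ms idle limit is taken from the approximate timeout observed in this report.

```typescript
// Two independent limits on a streaming request:
//  - idle timeout: maximum gap since the parser last reported activity
//  - request timeout: maximum total time since the request was sent
// All names here are hypothetical and only illustrate the distinction.

type TimeoutKind = "idle" | "request" | null;

interface StreamClock {
  startedAtMs: number;      // when the request was sent
  lastActivityAtMs: number; // last time the parser reported activity
}

function checkTimeouts(
  clock: StreamClock,
  nowMs: number,
  idleTimeoutMs: number,
  requestTimeoutMs: number,
): TimeoutKind {
  // The idle check only looks at the last activity, so raising the
  // request timeout cannot postpone it.
  if (nowMs - clock.lastActivityAtMs >= idleTimeoutMs) return "idle";
  if (nowMs - clock.startedAtMs >= requestTimeoutMs) return "request";
  return null;
}
```

With no recorded activity since the start, `checkTimeouts({ startedAtMs: 0, lastActivityAtMs: 0 }, 28000, 28000, 60000)` returns `"idle"`, and it still does if the request timeout is raised further. Only refreshing `lastActivityAtMs` changes the outcome.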

Why native google-vertex transport is unreachable

PR #76080 fixes this for the native google transport. However, our credentials are type: service_account. OpenClaw's ADC check (hasGoogleVertexAuthorizedUserAdcSync) only activates the native transport for authorized_user credentials. Service account users are forced through the OpenAI-compatible vertex-ai path, which lacks the fix.
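
The routing described above can be pictured with a small sketch. This is not the source of `hasGoogleVertexAuthorizedUserAdcSync`; `credentialType` and `pickTransport` are hypothetical helpers that only mirror the behavior reported in this issue, namely that the `type` field of the credentials JSON decides which transport is reachable.

```typescript
// Hypothetical illustration of the reported behavior: only credentials with
// "type": "authorized_user" reach the native transport, everything else
// falls through to the OpenAI-compatible path.

type Transport = "google-vertex (native)" | "vertex-ai (OpenAI-compatible)";

function credentialType(credentialJson: string): string | undefined {
  try {
    const parsed: unknown = JSON.parse(credentialJson);
    if (typeof parsed === "object" && parsed !== null && "type" in parsed) {
      const value = (parsed as { type: unknown }).type;
      return typeof value === "string" ? value : undefined;
    }
  } catch {
    // Unreadable credentials are treated as "no known type".
  }
  return undefined;
}

function pickTransport(credentialJson: string): Transport {
  return credentialType(credentialJson) === "authorized_user"
    ? "google-vertex (native)"
    : "vertex-ai (OpenAI-compatible)";
}
```

Under this model a service account file always resolves to the OpenAI-compatible path, which matches the situation described in this report.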

Expected behavior

The vertex-ai OpenAI-compatible streaming parser should handle reasoning/thinking tokens in the SSE stream the same way the native google transport does after PR #76080 — either by:

  1. Yielding a keepalive event to refresh the idle watchdog when reasoning tokens are present
  2. Or stripping/ignoring reasoning-only chunks without blocking the stream
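
The first option can be sketched as a generator that emits an event for every parsed chunk, so that whatever consumes the events always has something to refresh the idle watchdog with. This is a sketch under assumptions, not the OpenClaw implementation: `toStreamEvents`, `StreamEvent`, and `ChatChunk` are hypothetical names, and any chunk without non-empty `delta.content` is treated as reasoning-only or empty.

```typescript
// Option 1: turn reasoning-only (or otherwise empty) chunks into keepalive
// events instead of swallowing them. All names here are hypothetical.

type StreamEvent =
  | { type: "text"; text: string }
  | { type: "keepalive" };

interface ChatChunk {
  choices?: Array<{
    delta?: { content?: string | null; [key: string]: unknown };
  }>;
}

function* toStreamEvents(chunks: Iterable<ChatChunk>): Generator<StreamEvent> {
  for (const chunk of chunks) {
    const content = chunk.choices?.[0]?.delta?.content;
    if (typeof content === "string" && content.length > 0) {
      yield { type: "text", text: content };
    } else {
      // No visible text in this chunk: still report activity so the
      // consumer can reset its idle timer.
      yield { type: "keepalive" };
    }
  }
}
```

The second option would drop the `else` branch and instead refresh the watchdog directly when the raw chunk is read, before any event is produced. Either way, the timer reset no longer depends on the chunk carrying visible text.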

Environment

  • OpenClaw: 2026.5.6 (c97b9f7)
  • OS: Ubuntu 24.04 (NVIDIA DGX Spark)
  • Node: 22.22.2
  • Provider: vertex-ai via sidecar proxy (OpenAI-compatible endpoint)
  • Model: google/gemini-2.5-flash
  • Auth: Service account JSON (not ADC authorized_user)
  • Channel: Telegram

Related

  • #76071 — Same symptom for native Google transport (closed, fixed by #76080)
  • #76080 — Fix for native transport idle watchdog + thoughtSignature handling
  • #79595 — google-vertex auth profile detection issue
