openclaw: [Bug] Gemini 2.5 Flash via vertex-ai (OpenAI-compatible) streaming times out, thinking tokens not handled (1 comment, 2 participants)

teknolojay · 2026-05-20T01:17:17Z



GitHub stats

openclaw/openclaw#84384 • Fetched 2026-05-20 03:41:02

Author: teknolojay
Participants: clawsweeper[bot], teknolojay
Timeline (top): labeled ×4, commented ×1, cross-referenced ×1

Gemini 2.5 Flash via the vertex-ai provider (OpenAI-compatible endpoint through a sidecar proxy) always hits the LLM idle timeout (~28s), even though the sidecar returns HTTP 200 in ~2-3 seconds. Direct non-streaming curl to the same sidecar works perfectly.

Root cause: Gemini 2.5 Flash always produces reasoning_tokens in its response, even when thinking is not explicitly requested. The OpenAI-compatible streaming SSE parser in OpenClaw cannot handle these thinking/reasoning tokens — it consumes SSE chunks containing only reasoning activity without yielding stream events, so the idle watchdog never resets and kills the connection.

PR #76080 (merged May 2) fixed this for the native google transport by ensuring thoughtSignature-only SSE parts refresh the idle watchdog. However, the vertex-ai (OpenAI-compatible) transport does not have the equivalent fix.
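The starvation described above can be modeled in a few lines. This is a hypothetical sketch, not OpenClaw's implementation: `ChunkArrival`, `idleTimeoutAt`, and the arrival times are invented for illustration, and only the roughly 28s idle limit comes from the report. It assumes chunks are sorted by arrival time and that the watchdog is refreshed only when the parser yields an event.

```typescript
// Hypothetical model of an idle watchdog; names are illustrative, not OpenClaw's.
interface ChunkArrival {
  atMs: number;         // arrival time relative to request start
  yieldsEvent: boolean; // whether the parser turned this chunk into a stream event
}

// Returns the time at which the idle watchdog would fire, or null if it never does.
function idleTimeoutAt(
  chunks: ChunkArrival[],
  idleLimitMs: number,
  streamEndMs: number,
): number | null {
  let lastRefreshMs = 0; // the watchdog starts counting at request start
  for (const chunk of chunks) {
    if (chunk.atMs - lastRefreshMs >= idleLimitMs) {
      return lastRefreshMs + idleLimitMs; // silence lasted too long before this chunk
    }
    if (chunk.yieldsEvent) {
      lastRefreshMs = chunk.atMs; // only yielded events refresh the watchdog
    }
  }
  return streamEndMs - lastRefreshMs >= idleLimitMs ? lastRefreshMs + idleLimitMs : null;
}

// Invented timeline: chunks keep arriving, but none of them yields an event.
const arrivals = [3000, 10000, 20000, 30000];
const swallowedChunks = arrivals.map((atMs) => ({ atMs, yieldsEvent: false }));
const yieldedChunks = arrivals.map((atMs) => ({ atMs, yieldsEvent: true }));

console.log(idleTimeoutAt(swallowedChunks, 28000, 40000)); // 28000
console.log(idleTimeoutAt(yieldedChunks, 28000, 40000));   // null
```

In this model the network is healthy in both runs; the only difference is whether received chunks count as activity.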

Error Message

[agent/embedded] embedded run agent end: isError=true model=google/gemini-2.5-flash provider=vertex-ai error=LLM request timed out. rawError=terminated
{"msg":"stream error","error":"This operation was aborted"}  // OpenClaw killed the connection

Bug type

Regression / Missing coverage

Reproduction

Setup:

- OpenClaw 2026.5.6 (c97b9f7)
- Vertex AI sidecar proxy on `127.0.0.1:8787` (auto-refreshes SA tokens via google-auth-library)
- Service account auth (`type: service_account`), project `lkkff-ai`, region `us-central1`
- Model: `vertex-ai/google/gemini-2.5-flash`

Non-streaming works (2s):

curl -s http://127.0.0.1:8787/v1beta1/projects/lkkff-ai/locations/us-central1/endpoints/openapi/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"google/gemini-2.5-flash","messages":[{"role":"user","content":"Say hi"}],"max_tokens":50,"stream":false}' \
  --max-time 15
# Returns: 200, "Hi there!", reasoning_tokens: 23, time: 1.8s

Streaming fails (28s timeout): Any message through OpenClaw agent → sidecar returns 200 in ~3s → OpenClaw streaming parser hangs → AbortError: This operation was aborted → LLM request timed out at 28s.
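To check what a captured streaming response actually contains, the raw SSE text can be split into its JSON payloads. The sketch below is a minimal, hypothetical helper (`parseSseData` is not an OpenClaw function). It assumes the OpenAI-style framing of one `data:` line per event and a final `data: [DONE]` sentinel; whether the sidecar's stream follows that framing exactly is not stated in the report.

```typescript
// Splits raw SSE text into the JSON payloads carried on `data:` lines.
// Stops at the `[DONE]` sentinel used by OpenAI-style streams.
function parseSseData(raw: string): unknown[] {
  const payloads: unknown[] = [];
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines, comments, other fields
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") break;
    if (data === "") continue;
    payloads.push(JSON.parse(data));
  }
  return payloads;
}
```

Inspecting each payload for chunks that carry no text delta would confirm or refute the reasoning-only hypothesis for a given capture.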

Gateway log:

[agent/embedded] embedded run agent end: isError=true model=google/gemini-2.5-flash provider=vertex-ai error=LLM request timed out. rawError=terminated
[model-fallback/decision] model fallback decision: decision=candidate_failed reason=timeout next=none detail=terminated

Sidecar log (healthy):

{"msg":"request","method":"POST","path":"/v1beta1/.../chat/completions","status":200,"ms":2845}
{"msg":"stream error","error":"This operation was aborted"}  // OpenClaw killed the connection

What we tried (none worked)

| Config change | Result |
|---|---|
| `thinkingDefault: "off"` in `agents.defaults` | `/status` shows `Think: off`, but the model still returns `reasoning_tokens` |
| `reasoning: false` on model catalog entry | No effect: the sidecar response still contains `reasoning_tokens` |
| `timeoutSeconds: 60` on provider | No effect: the idle timeout fires, not the request timeout |
| Switching to `gemini-2.0-flash` | 404 on the Vertex AI OpenAI-compatible endpoint |
| `OPENCLAW_THINKING` env var | Not recognized |
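The `timeoutSeconds` row is easier to see if the two timers are treated as independent. The following is a simplified sketch under stated assumptions: `TimeoutConfig` and `firstTimeoutWithoutEvents` are hypothetical names, and the model assumes both timers start at request start and that no stream event is ever yielded.

```typescript
type TimeoutKind = "idle" | "request";

interface TimeoutConfig {
  requestTimeoutMs: number; // total budget for the whole request
  idleTimeoutMs: number;    // maximum silence between stream events
}

// With no stream events at all, both timers run from request start
// and whichever limit is shorter fires first.
function firstTimeoutWithoutEvents(cfg: TimeoutConfig): { kind: TimeoutKind; atMs: number } {
  return cfg.idleTimeoutMs <= cfg.requestTimeoutMs
    ? { kind: "idle", atMs: cfg.idleTimeoutMs }
    : { kind: "request", atMs: cfg.requestTimeoutMs };
}

// Raising the request budget to 60s leaves the ~28s idle limit in charge.
console.log(firstTimeoutWithoutEvents({ requestTimeoutMs: 60000, idleTimeoutMs: 28000 }));
```

Under this model, only refreshing the idle timer (or raising the idle limit itself) changes when the stream is aborted.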

Why native google-vertex transport is unreachable

PR #76080 fixes this for the native google transport. However, our credentials are type: service_account. OpenClaw's ADC check (hasGoogleVertexAuthorizedUserAdcSync) only activates the native transport for authorized_user credentials. Service account users are forced through the OpenAI-compatible vertex-ai path, which lacks the fix.
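The routing consequence can be sketched as a small selector. This is an illustration of the behavior described above, not the body of `hasGoogleVertexAuthorizedUserAdcSync`: `selectTransport` and the `Transport` labels are hypothetical, and the only facts taken from the report are the two credential `type` values.

```typescript
type Transport = "native-google" | "openai-compatible";

// Hypothetical selector mirroring the reported behavior; not OpenClaw's actual code.
function selectTransport(credentialsJson: string): Transport {
  const parsed = JSON.parse(credentialsJson) as { type?: string };
  // Only `authorized_user` credentials reach the native transport;
  // everything else, including `service_account`, falls through.
  return parsed.type === "authorized_user" ? "native-google" : "openai-compatible";
}

console.log(selectTransport('{"type":"service_account"}')); // "openai-compatible"
console.log(selectTransport('{"type":"authorized_user"}')); // "native-google"
```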

Expected behavior

The `vertex-ai` OpenAI-compatible streaming parser should handle reasoning/thinking tokens in the SSE stream the same way the native `google` transport does after PR #76080, by either:

1. Yielding a keepalive event to refresh the idle watchdog when reasoning tokens are present
2. Stripping or ignoring reasoning-only chunks without blocking the stream
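The first option could look roughly like the sketch below. It is a sketch under assumptions, not a patch: `ChatChunk`, `StreamEvent`, and `toStreamEvent` are hypothetical names, `choices[].delta.content` follows the OpenAI chunk shape, and `reasoning_content` is an assumed field name, since the report does not say which field carries reasoning deltas in the Vertex AI stream.

```typescript
// Assumed chunk shape. `reasoning_content` is illustrative only; the actual
// reasoning field in the Vertex AI stream is not stated in the issue.
interface ChatChunk {
  choices: Array<{ delta: { content?: string; reasoning_content?: string } }>;
}

type StreamEvent =
  | { kind: "text"; text: string }
  | { kind: "keepalive" };

// Every chunk produces an event, so a reasoning-only chunk still refreshes
// whatever watchdog observes the event stream.
function toStreamEvent(chunk: ChatChunk): StreamEvent {
  const text = chunk.choices.map((choice) => choice.delta.content ?? "").join("");
  return text.length > 0 ? { kind: "text", text } : { kind: "keepalive" };
}
```

The design choice here is to treat any received chunk as activity and to decide separately what, if anything, to surface to the caller.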

Environment

- **OpenClaw:** 2026.5.6 (c97b9f7)
- **OS:** Ubuntu 24.04 (NVIDIA DGX Spark)
- **Node:** 22.22.2
- **Provider:** vertex-ai via sidecar proxy (OpenAI-compatible endpoint)
- **Model:** google/gemini-2.5-flash
- **Auth:** Service account JSON (not ADC authorized_user)
- **Channel:** Telegram

Related

- #76071: Same symptom for native Google transport (closed, fixed by #76080)
- #76080: Fix for native transport idle watchdog + thoughtSignature handling
- #79595: google-vertex auth profile detection issue

#api #LLM response #prompt template #agent execution #request timeout
