hermes - 💡(How to fix) Fix Gemini 3.5 Flash via OpenAI-compatible proxy: streaming last chunk has choices=0, no finish_reason; intermittent HTTP 500

Bug Report: Gemini 3.5 Flash via Hapuppy (OpenAI-compatible proxy) — Streaming + Intermittent 500

Summary

Three related issues prevent Gemini 3.5 Flash from working reliably through an OpenAI-compatible proxy (hapuppy, beta.hapuppy.com) in Hermes Agent:

  1. Streaming chunk format deviation: the last streaming chunk has an empty choices list (choices=0), and no chunk ever sets choices[0].finish_reason="stop". As a result, Hermes interprets every streamed response as finish_reason="error".

  2. Intermittent HTTP 500 from the proxy: hapuppy returns HTTP 500 on ~30-40% of requests for Gemini 3.5 Flash, even though non-streaming calls to the same endpoint work fine.

  3. No provider-level streaming disable: there is no config option to disable streaming for a specific provider or base_url. The existing options (display.streaming for the CLI, top-level streaming for the gateway REST API) do not affect the agent conversation loop in conversation_loop.py.

Environment

  • Hermes Agent version: 0.5.0 (latest)
  • Provider: Hapuppy (beta.hapuppy.com/v1) — OpenAI-compatible proxy for Gemini 3.5 Flash
  • Model: gemini-3.5-flash
  • Platform: Telegram + CLI

Evidence

Issue 1: Streaming chunk format

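When streaming Gemini 3.5 Flash through hapuppy's OpenAI-compatible endpoint, no chunk ever carries a finish_reason, and the stream ends with a chunk whose choices list is empty (choices=0) instead of the expected choices[0].finish_reason="stop":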

Chunk 0: choices=1 finish_reason=None content='Hello! How can I help you today?'
Chunk 1: choices=1 finish_reason=None content=(no content)
Chunk 2: choices=1 finish_reason=None content=(no content)
Chunk 3: choices=0 finish_reason=N/A content=(no content)   <-- BUG: choices=0, no finish_reason

Expected (OpenAI standard):

Chunk N: choices=1 finish_reason='stop' content=(no content)
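The chunk log above can be reproduced with a small helper that formats each streamed chunk the same way. This is a sketch, not the reporter's actual script: `describe_chunk` and `stream_has_finish_reason` are hypothetical names, and the code only assumes the OpenAI Python SDK's chunk shape (`chunk.choices[i].delta.content`, `chunk.choices[i].finish_reason`).

```python
def describe_chunk(index, chunk):
    """Render one streamed chat-completion chunk in the log format above."""
    choices = chunk.choices or []
    if not choices:
        # Empty choices list: nothing to report beyond the count.
        return f"Chunk {index}: choices=0 finish_reason=N/A content=(no content)"
    choice = choices[0]
    content = choice.delta.content
    shown = repr(content) if content else "(no content)"
    return (f"Chunk {index}: choices={len(choices)} "
            f"finish_reason={choice.finish_reason!r} content={shown}")


def stream_has_finish_reason(chunks):
    """True if any chunk carries a non-null choices[0].finish_reason."""
    return any(c.choices and c.choices[0].finish_reason for c in chunks)
```

To use it, collect `list(client.chat.completions.create(..., stream=True))` against the hapuppy base_url and print `describe_chunk(i, c)` for each chunk. For a stream like the one in the log, `stream_has_finish_reason` returns False, which is the situation Hermes ends up reporting as an error.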

Issue 2: Non-streaming works fine

The same endpoint returns finish_reason="stop" correctly when called with stream=False:

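import os
from openai import OpenAI

# HAPUPPY_API_KEY is a placeholder variable name for the proxy key
client = OpenAI(base_url="https://beta.hapuppy.com/v1", api_key=os.environ["HAPUPPY_API_KEY"])

resp = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[{"role": "user", "content": "say hello"}],
    stream=False,
)
# Result: content="Hello! How can I help you today?", finish_reason='stop'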

The user also confirmed that the same API key and base_url work without issues in Chatbox, which uses non-streaming requests by default.

Issue 3: Intermittent HTTP 500

Hapuppy returns HTTP 500 randomly (~30% of requests) for Gemini 3.5 Flash:

2026-05-23 19:07:35 API call failed (attempt 1/1) error_type=InternalServerError
  provider=custom base_url=https://beta.hapuppy.com/v1 model=gemini-3.5-flash
  summary=HTTP 500: An error occurred. Reference: req_1779556054668_nzws7wa

When it works (70% of requests), response time is 1.5-4.4s with correct content.

Issue 4: No provider-level streaming config

The agent conversation loop in agent/conversation_loop.py (around line 1116) starts from a hardcoded _use_streaming = True and only overrides it in a few special cases:

_use_streaming = True
# Provider signaled "stream not supported" on a previous
# attempt — switch to non-streaming for the rest of this
# session instead of re-failing every retry.
if getattr(agent, "_disable_streaming", False):
    _use_streaming = False
elif agent.provider == "copilot-acp" or ...
    _use_streaming = False
elif not agent._has_stream_consumers():
    ...

There is no check for display.streaming (CLI-only), streaming: false (gateway REST API only), or any provider-specific setting. Otherwise, streaming is only disabled after the provider returns a "stream not supported" error, which hapuppy never does: it supports streaming, just with a malformed final chunk.

Config options that DON'T work:

  • display.streaming: false → CLI typewriter effect only, not agent API calls
  • streaming: false (top-level) → Only affects gateway REST API server (gateway/platforms/api_server.py)
  • gateway.streaming.enabled: false → Same, only REST API path

Suggested Fixes

1. (Quick Win) Accept choices=0 as stream end

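In agent/conversation_loop.py or agent/chat_completion_helpers.py: when the OpenAI SDK yields a chunk with an empty choices list (choices=0) and no earlier chunk carried a finish_reason, treat it as a valid stream termination (equivalent to finish_reason="stop") rather than an error. An empty-choices chunk is not non-standard by itself: when a request sets stream_options={"include_usage": true}, OpenAI also ends the stream with a usage chunk whose choices array is empty, so the stream handler should tolerate such chunks in any case.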
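A minimal sketch of the quick win, assuming the stream is consumed as OpenAI SDK chunk objects. `collect_stream` is a hypothetical helper, not existing Hermes code, and where it would hook into conversation_loop.py or chat_completion_helpers.py is not shown. To avoid masking genuinely truncated streams, it only infers "stop" when the stream actually ended on an empty-choices chunk after some content arrived.

```python
def collect_stream(chunks):
    """Join streamed content and return (text, finish_reason).

    Tolerates chunks whose `choices` list is empty. If no chunk carried a
    finish_reason but the stream ended on an empty-choices chunk after some
    content (the pattern seen from hapuppy), "stop" is assumed.
    """
    parts = []
    finish_reason = None
    ended_on_empty_choices = False
    for chunk in chunks:
        if not chunk.choices:
            # Usage-only or terminal chunk: nothing to read, just note it.
            ended_on_empty_choices = True
            continue
        ended_on_empty_choices = False
        choice = chunk.choices[0]
        if choice.delta is not None and choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason:
            finish_reason = choice.finish_reason
    text = "".join(parts)
    if finish_reason is None and ended_on_empty_choices and text:
        finish_reason = "stop"  # assumption: this terminator means a normal end
    return text, finish_reason
```

A stream that stops mid-answer without that terminal chunk still comes back with finish_reason None, so the caller can keep treating it as an error, while standard streams (including ones followed by a usage chunk) keep their real finish_reason.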

2. (Config) Add provider-level streaming disable

Add a config option like:

agent:
  disable_streaming_providers:
    - hapuppy
    - custom

Or per-provider:

providers:
  hapuppy:
    streaming: false
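The two proposed config shapes could be resolved with a check like the one below. The keys `agent.disable_streaming_providers` and `providers.<name>.streaming` are the hypothetical options proposed above, not existing Hermes settings. The matching rule (a name matches the provider id, such as `custom`, or a label of the base_url host, such as `hapuppy` in beta.hapuppy.com) is also an assumption, chosen so that a `hapuppy` entry can apply even though the log reports `provider=custom`.

```python
from urllib.parse import urlparse


def streaming_enabled(config, provider, base_url):
    """Return False if the (hypothetical) config disables streaming here.

    Proposed keys, not existing Hermes options:
      agent.disable_streaming_providers: [name, ...]
      providers.<name>.streaming: false
    A name matches the provider id (e.g. "custom") or any dot-separated
    label of the base_url host (e.g. "hapuppy" in beta.hapuppy.com).
    """
    host = urlparse(base_url).hostname if base_url else None  # lowercased
    host_labels = set(host.split(".")) if host else set()
    provider_id = (provider or "").lower()

    def matches(name):
        name = str(name).lower()
        return name == provider_id or name in host_labels

    agent_cfg = config.get("agent") or {}
    if any(matches(n) for n in agent_cfg.get("disable_streaming_providers") or []):
        return False

    for name, provider_cfg in (config.get("providers") or {}).items():
        # Only an explicit `streaming: false` disables; a missing key keeps the default.
        if matches(name) and (provider_cfg or {}).get("streaming") is False:
            return False
    return True
```

In the loop excerpt above, this would sit beside the `_disable_streaming` check, e.g. `elif not streaming_enabled(config, agent.provider, agent.base_url): _use_streaming = False`, where `config` stands for however the loaded config is reachable from the agent (an assumption).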

3. (Robustness) Retry on 500 with backoff

The agent's existing retry logic for HTTP 500 (_resp_error_code in {500, 502}) uses the same jittered backoff as other errors. Consider adding provider-specific timeout/retry config:

providers:
  hapuppy:
    retry_on_500: true
    retry_delay: 65
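A sketch of what `retry_on_500` / `retry_delay` could mean in practice: a fixed delay with a little jitter between attempts, retrying only on the status codes the existing logic already singles out (500 and 502). `call_with_retry` and its parameters are hypothetical, the delay is assumed to be in seconds, and the `sleep` argument exists only so the wait can be stubbed out in tests.

```python
import random
import time


def call_with_retry(fn, *, max_attempts=3, retry_delay=65.0, jitter=0.1,
                    retry_statuses=frozenset({500, 502}), sleep=time.sleep):
    """Call fn(), retrying on errors whose `status_code` is in retry_statuses.

    Between attempts it waits retry_delay seconds, randomly scaled by up to
    +/- jitter (0.1 = 10%). Any other error, or the last failed attempt,
    is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status not in retry_statuses or attempt == max_attempts:
                raise
            sleep(retry_delay * (1 + random.uniform(-jitter, jitter)))
```

Wrapped around `lambda: client.chat.completions.create(...)`, this would retry the OpenAI SDK's `InternalServerError`, since the SDK's status errors expose `status_code`. The SDK client also has a built-in `max_retries` option that retries 5xx responses with short exponential backoff; a fixed 65-second wait is not something that option expresses, hence a separate wrapper.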

4. (Documentation) Document known streaming quirks

Add a note in the provider integration docs that some OpenAI-compatible proxies return malformed streaming chunks, and how to handle them (currently requires a code patch).

Workaround (applied locally)

Patching conversation_loop.py to force non-streaming for hapuppy:

# Line ~1116:
_use_streaming = True
if agent.base_url and "hapuppy" in agent.base_url.lower():
    _use_streaming = False

This bypasses the streaming chunk parsing entirely for hapuppy, falling back to the non-streaming path which correctly handles finish_reason.
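The substring test in the workaround also matches any base_url that merely contains "hapuppy" somewhere, for example in a path. A slightly stricter variant compares the parsed hostname instead. `is_hapuppy` is a hypothetical helper, and treating every subdomain of hapuppy.com as the same proxy is an assumption.

```python
from urllib.parse import urlparse


def is_hapuppy(base_url):
    """True if base_url's host is hapuppy.com or one of its subdomains."""
    host = urlparse(base_url or "").hostname or ""  # hostname is lowercased
    return host == "hapuppy.com" or host.endswith(".hapuppy.com")
```

The patch would then read `if is_hapuppy(agent.base_url): _use_streaming = False`.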
