hermes - 💡(How to fix) Fix Custom fallback providers fail silently when they don't support SSE streaming

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

Root cause

Fix Action

Fix / Workaround

Workaround

Code Example

fallback_providers:
  - provider: custom
    model: my-model
    base_url: http://my-proxy:8080/v1
RAW_BUFFERClick to expand / collapse

Describe the bug

When a custom fallback provider returns a non-streaming JSON response to a stream=True request, the OpenAI SDK's streaming parser receives zero chunks. This causes:

  • content_parts stays empty → full_content = "".join([]) or None = None
  • Response is flagged as "empty" → retry loop → fallback cascade
  • The provider's valid response is silently discarded

This affects any custom provider that doesn't implement SSE streaming (e.g., lightweight proxies, self-hosted endpoints, Vertex AI REST API).

To reproduce

  1. Configure a custom fallback provider that returns valid JSON but not SSE:
fallback_providers:
  - provider: custom
    model: my-model
    base_url: http://my-proxy:8080/v1
  1. Primary provider hits rate limit → Hermes falls back to custom provider
  2. Custom provider returns valid {"choices": [...]} JSON
  3. Hermes logs: ⚠️ Empty response from model — retrying (1/3)
  4. After 3 retries: cascades to next fallback or gives up

Root cause

run_agent.py line ~8089: _use_streaming = True is unconditional — there's no per-provider or per-fallback streaming toggle. The comment says "Always prefer the streaming path" for health-monitoring benefits, but this assumption breaks custom providers.

When client.chat.completions.create(stream=True) receives a JSON response instead of SSE, the SDK's Stream iterator yields zero chunks. The streaming response builder at line ~5040 produces full_content = None with no tool calls → flagged as invalid.

Expected behavior

Either:

  • (A) Add a per-provider config flag to disable streaming: fallback_providers: [{provider: custom, model: x, base_url: y, stream: false}]
  • (B) Detect non-SSE responses in the streaming path and fall back to non-streaming parsing
  • (C) Document that custom providers MUST support SSE streaming

Workaround

We built a lightweight proxy (~200 lines Python) that translates OpenAI streaming requests to Vertex AI's native streamGenerateContent?alt=sse endpoint and converts the chunks back to OpenAI chat.completion.chunk format. Happy to contribute this as a reference implementation or built-in adapter.

Environment

  • Hermes version: 0.8.0
  • Provider: custom (Vertex AI via proxy)
  • Platform: Docker (gateway mode)
  • OS: macOS (Apple Silicon)

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix Custom fallback providers fail silently when they don't support SSE streaming