hermes - 💡(How to fix) Fix [Bug]: Nous inference API streaming times out on agent-sized payloads — non-streaming works fine

hermes2026-05-20 18:40:55

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Nous inference API (inference-api.nousresearch.com) streaming times out consistently for agent-sized payloads. Non-streaming works perfectly. The issue makes the Nous provider unusable for real agent workloads.

Root Cause

Fix Action

Fix / Workaround

Attempted workarounds (failed)

Current workaround

Code Example

model:
  default: deepseek/deepseek-v4-flash
  provider: nous
  base_url: https://inference-api.nousresearch.com/v1

---

Stream stale for 180s (threshold 180s) — no chunks received.
model=deepseek/deepseek-v4-flash context=~36,327 tokens. Killing connection.

RAW_BUFFERClick to expand / collapse

Platform: Linux (Ubuntu 24.04, ARM64 on MiniPC) Hermes version: 2026.5.16 Python version: 3.14.5 Model affected: deepseek/deepseek-v4-flash and deepseek/deepseek-v4-pro via Nous Portal Messaging platform: Telegram

Summary

Steps to reproduce

Configure Hermes to use Nous provider:

model:
  default: deepseek/deepseek-v4-flash
  provider: nous
  base_url: https://inference-api.nousresearch.com/v1

Start a conversation with tools enabled — gateway sends payloads with system prompt + tool definitions + conversation context (~30-40K tokens).
Gateway hits APITimeoutError repeatedly (5 retries, exponential backoff, all time out).

Reproducibility evidence

Gateway logs

Stream stale for 180s (threshold 180s) — no chunks received.
model=deepseek/deepseek-v4-flash context=~36,327 tokens. Killing connection.

Direct curl tests

Test	Flash (stream)	Pro (stream)	Non-streaming
Small payload ("say hi", ~50 tokens)	✅ Works	❌ Hangs on "OPENROUTER PROCESSING"	✅ Works
Large payload (18K chars, 50 msgs + 50 tools)	❌ Hangs	❌ Hangs	✅ Works

Also observed

Occasional 503 errors: "The requested model is temporarily unavailable due to upstream capacity limits"
Issue spans ~2 hours (May 20, 19:50-21:30 MSK) — not a transient blip

Attempted workarounds (failed)

display.streaming: false — only controls UI rendering, API still streams
providers.nous.stream: false — config key silently ignored by Hermes
Fallback to deepseek/deepseek-v4-flash:free via OpenRouter: 429 rate-limited

Current workaround

Switched model to deepseek-v4-pro via DeepSeek API directly (api.deepseek.com) — works perfectly. But this bypasses Nous Portal entirely.

Expected behavior

Nous Portal streaming should handle agent payloads (system prompt + tool definitions + context) without timing out, or Hermes should support stream: false at the provider level as a fallback.

Additional context

Related to #25723 and #21522 — per-provider stream: false would mitigate this when upstream streaming is unreliable. From Russia, the inference API endpoint is reachable (HTTP 200 on model list, non-streaming completions work), so it's not a network/firewall issue.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

Nous Portal streaming should handle agent payloads (system prompt + tool definitions + context) without timing out, or Hermes should support stream: false at the provider level as a fallback.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Bug]: Nous inference API streaming times out on agent-sized payloads — non-streaming works fine

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Attempted workarounds (failed)

Current workaround

Code Example

Summary

Steps to reproduce

Reproducibility evidence

Gateway logs

Direct curl tests

Also observed

Attempted workarounds (failed)

Current workaround

Expected behavior

Additional context

FAQ

Expected behavior

Still need to ship something?

TRENDING