hermes - 💡(How to fix) Fix [Bug]: Nous inference API streaming times out on agent-sized payloads — non-streaming works fine

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Nous inference API (inference-api.nousresearch.com) streaming times out consistently for agent-sized payloads. Non-streaming works perfectly. The issue makes the Nous provider unusable for real agent workloads.

Root Cause

Nous inference API (inference-api.nousresearch.com) streaming times out consistently for agent-sized payloads. Non-streaming works perfectly. The issue makes the Nous provider unusable for real agent workloads.

Fix Action

Fix / Workaround

Attempted workarounds (failed)

Current workaround

Code Example

model:
  default: deepseek/deepseek-v4-flash
  provider: nous
  base_url: https://inference-api.nousresearch.com/v1

---

Stream stale for 180s (threshold 180s) — no chunks received.
model=deepseek/deepseek-v4-flash context=~36,327 tokens. Killing connection.
RAW_BUFFERClick to expand / collapse

Platform: Linux (Ubuntu 24.04, ARM64 on MiniPC) Hermes version: 2026.5.16 Python version: 3.14.5 Model affected: deepseek/deepseek-v4-flash and deepseek/deepseek-v4-pro via Nous Portal Messaging platform: Telegram

Summary

Nous inference API (inference-api.nousresearch.com) streaming times out consistently for agent-sized payloads. Non-streaming works perfectly. The issue makes the Nous provider unusable for real agent workloads.

Steps to reproduce

  1. Configure Hermes to use Nous provider:
model:
  default: deepseek/deepseek-v4-flash
  provider: nous
  base_url: https://inference-api.nousresearch.com/v1
  1. Start a conversation with tools enabled — gateway sends payloads with system prompt + tool definitions + conversation context (~30-40K tokens).

  2. Gateway hits APITimeoutError repeatedly (5 retries, exponential backoff, all time out).

Reproducibility evidence

Gateway logs

Stream stale for 180s (threshold 180s) — no chunks received.
model=deepseek/deepseek-v4-flash context=~36,327 tokens. Killing connection.

Direct curl tests

TestFlash (stream)Pro (stream)Non-streaming
Small payload ("say hi", ~50 tokens)✅ Works❌ Hangs on "OPENROUTER PROCESSING"✅ Works
Large payload (18K chars, 50 msgs + 50 tools)❌ Hangs❌ Hangs✅ Works

Also observed

  • Occasional 503 errors: "The requested model is temporarily unavailable due to upstream capacity limits"
  • Issue spans ~2 hours (May 20, 19:50-21:30 MSK) — not a transient blip

Attempted workarounds (failed)

  • display.streaming: false — only controls UI rendering, API still streams
  • providers.nous.stream: false — config key silently ignored by Hermes
  • Fallback to deepseek/deepseek-v4-flash:free via OpenRouter: 429 rate-limited

Current workaround

Switched model to deepseek-v4-pro via DeepSeek API directly (api.deepseek.com) — works perfectly. But this bypasses Nous Portal entirely.

Expected behavior

Nous Portal streaming should handle agent payloads (system prompt + tool definitions + context) without timing out, or Hermes should support stream: false at the provider level as a fallback.

Additional context

Related to #25723 and #21522 — per-provider stream: false would mitigate this when upstream streaming is unreliable. From Russia, the inference API endpoint is reachable (HTTP 200 on model list, non-streaming completions work), so it's not a network/firewall issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

Nous Portal streaming should handle agent payloads (system prompt + tool definitions + context) without timing out, or Hermes should support stream: false at the provider level as a fallback.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING