hermes: Hardcoded platform timeouts break local model workflows; need unified timeout configuration

StepCodex · 2026-05-07T21:39:00Z




Problem

Users running local models (Ollama, llama.cpp, vLLM) frequently hit timeout errors at multiple levels of the Hermes stack. While agent-level timeouts are well-configured (auto-detection of local endpoints, env var overrides), platform adapter timeouts are hardcoded and cannot be tuned without code changes.

What's configurable (good):

- `HERMES_API_TIMEOUT` (default 30min): main LLM call
- `HERMES_STREAM_READ_TIMEOUT` (default 120s, auto-raised for local endpoints)
- `HERMES_STREAM_STALE_TIMEOUT` (auto-disabled for local endpoints)
- `HERMES_AGENT_TIMEOUT` (inactivity-based, not wall-clock)
- `terminal.timeout`, `browser.command_timeout`, `auxiliary.*.timeout`

What's hardcoded (breaks local model workflows):

| Component | Timeout | File | Impact |
|-----------|---------|------|--------|
| Mattermost HTTP | 30s GET / 60s POST | `gateway/platforms/mattermost.py:117,136` | Message send fails if model response takes >30s to format |
| Signal HTTP | 30s | `gateway/platforms/signal.py:197` | Same |
| WhatsApp polling | 2-30s | `gateway/platforms/whatsapp.py:42-96` | Connection drops during long inference |
| Telegram DoH | 4s | `gateway/platforms/telegram_network.py:27` | Minor but compounds |
| Discord queues | 120-300s | `gateway/platforms/discord.py:2560-2739` | Voice/media timeout during slow generation |
| User approval | 300s | `gateway/run.py:6212` | Hard 5min wall-clock for approval decisions |
| Retry backoff | base=5s, max=120s | `agent/retry_utils.py:19-57` | OOM recovery on local models needs longer backoff |

Use case

I run sparse Qwen 3 models locally, where individual responses can take 2-5 minutes. The agent-level timeout (30min) is fine, but platform adapters time out while trying to send the response once it is ready. The retry backoff is also too aggressive for OOM recovery: a crashed local model needs 30-60s to restart, but Hermes retries after 5s and hits the same OOM.
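To make the backoff mismatch concrete, here is a minimal sketch of a capped exponential backoff schedule. It assumes plain doubling with no jitter; the actual logic in `agent/retry_utils.py` may differ, and `backoff_delays` is a hypothetical helper, not a Hermes function.

```python
def backoff_delays(base: float, max_delay: float, attempts: int) -> list[float]:
    """Return the delay before each retry: base * 2**n, capped at max_delay."""
    return [min(base * (2 ** n), max_delay) for n in range(attempts)]


# Current hardcoded values (base=5s, max=120s)
current = backoff_delays(base=5, max_delay=120, attempts=6)
# Proposed values (base=10s, max=300s)
proposed = backoff_delays(base=10, max_delay=300, attempts=6)

print(current)   # [5, 10, 20, 40, 80, 120]
print(proposed)  # [10, 20, 40, 80, 160, 300]
```

Under this assumption, and ignoring the time each failed request itself takes, the current schedule retries roughly 5s, 15s, and 35s after the first failure, so the first two attempts always land before a 30s restart finishes. The proposed values move the third retry to about 70s.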

Proposed fix

Expose hardcoded platform timeouts as env vars following the existing pattern:

```bash
# Per-platform HTTP timeouts (new)
HERMES_MATTERMOST_HTTP_TIMEOUT=120
HERMES_SIGNAL_HTTP_TIMEOUT=120
HERMES_WHATSAPP_HTTP_TIMEOUT=60
HERMES_DISCORD_QUEUE_TIMEOUT=600

# Approval timeout (new)
HERMES_APPROVAL_TIMEOUT=600

# Retry backoff tuning (new)
HERMES_RETRY_BASE_DELAY=10
HERMES_RETRY_MAX_DELAY=300
```

This follows the established `HERMES_TELEGRAM_HTTP_*` pattern (Telegram already has configurable HTTP timeouts via env vars).
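A minimal sketch of how an adapter could read one of these variables while keeping today's behavior as the fallback. `env_timeout` is a hypothetical helper name, the call sites are illustrative, and the defaults are the hardcoded values listed in the table, not confirmed Hermes code.

```python
import os


def env_timeout(name: str, default: float) -> float:
    """Read a timeout in seconds from the environment, falling back to default.

    Unset, empty, non-numeric, or non-positive values fall back to the default,
    so a typo in the environment cannot silently disable the timeout.
    """
    raw = os.environ.get(name, "").strip()
    try:
        value = float(raw)
    except ValueError:
        return default
    return value if value > 0 else default


# Hypothetical call sites; defaults mirror the current hardcoded values.
MATTERMOST_POST_TIMEOUT = env_timeout("HERMES_MATTERMOST_HTTP_TIMEOUT", 60.0)
APPROVAL_TIMEOUT = env_timeout("HERMES_APPROVAL_TIMEOUT", 300.0)
```

Falling back to the existing defaults means users who set nothing see no change, while local-model users can raise individual limits without patching the adapters.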

Related issues

- #5450 (per-request wall-clock timeout budget)
- #12619 (command-type-aware terminal timeouts)
- #16723 (terminal.timeout config mismatch)

Environment

- Hermes version: 0.8.0
- Models: Ollama (Qwen 3 sparse, qwen3.5:9b), local vLLM
- Platforms: Telegram, Mattermost, Discord
- OS: macOS + Linux (Docker)
