hermes - 💡(How to fix) Fix Feature Request: Session Recovery on Temporary Provider Outage

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

Error: HTTP 404: Cannot POST /v1/chat/completions (provider briefly unavailable, but dashboard is still running).

RAW_BUFFERClick to expand / collapse

Feature Request: Session Recovery on Temporary Provider Outage

Problem or Use Case*

When the LLM provider goes down temporarily (restart / update / network hiccup for 2–3 minutes), the entire Hermes session is destroyed and cannot be recovered — even after the provider comes back online.

In a self-hosted setup (e.g. Manifest on a LAN server), this happens regularly. Hermes performs 3 retries in ~12 seconds, but if the provider takes 2–3 minutes to recover, the session is dead. Over 2 days I've had 17 such API errors, losing multiple sessions with 200–350 messages (~240k tokens each).

Error: HTTP 404: Cannot POST /v1/chat/completions (provider briefly unavailable, but dashboard is still running).

This affects anyone running Hermes with:

  • A self-hosted provider (Manifest, Ollama, LiteLLM, vLLM)
  • An unstable LAN server
  • Any provider with occasional brief outages

Proposed Solution*

Four suggestions that complement each other:

1. Session Pause instead of Kill When all retries are exhausted: put the session in a "paused" state instead of destroying it. The user sees: "Provider unreachable, session paused — try again later."

2. Longer Retry Window Current: 3 retries at ~2s / ~4s / ~5s = ~12s total. Better: Exponential backoff up to 120s (e.g. 5s / 15s / 30s / 60s / 120s). This covers brief provider restarts.

3. Automatic Session Recovery After a successful reconnect: check if the session can be resumed (repair role alternation) and retry the last failed turn.

4. Transparent Provider Failover (optional) When a fallback provider is configured: auto-switch while the primary is down, auto-switch back when it recovers. The session keeps running — the user only notices a slight delay.

Alternatives Considered

  • Manual Session Repair: Currently possible via sqlite3 state.db "UPDATE sessions SET model='...'" — but this only fixes model errors, not broken role alternation
  • Hard reset (/new): Currently the only reliable option — losing all 240k tokens of context
  • Do nothing: 17 errors in 2 days, lost work, frustration

Feature Type*

Scope: Small / Medium — this is a session lifecycle improvement in the agent loop (comp/agent), no new infrastructure needed. The main changes: retry logic in agent/conversation_loop.py, session state handling, and role-alternation repair on reconnect.

Additional Context

Environment:

  • Hermes Agent: v0.14.0 (2026.5.16) — self-hosted in Proxmox LXC
  • Provider: Manifest (self-hosted on Mac mini, http://192.168.1.245:2099/v1)
  • Fallback: Ollama on same Mac mini (http://192.168.1.245:11434)
  • Frontend: Hermes WebUI (browser-based)
  • Session count in DB: ~650 sessions
  • Typical session: 200–350 messages, ~240k tokens

Relevant existing issues:

  • #5694 (clearer recovery UX when model failover is exhausted)
  • #32411 (per-task fallback_providers config for transparent failover)
  • #18452 (ACP adapter does not pass fallback_providers to AIAgent)

Hermes 0.14.0

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING