hermes - How to fix: DeepSeek V4 Pro via OpenRouter causes gateway crash loop and Telegram bot failure

NousResearch/hermes-agent#16677 (fetched 2026-04-28 06:51:42)

DeepSeek V4 Pro via OpenRouter causes gateway crash loop and Telegram bot failure

Bug Description

When using deepseek/deepseek-v4-pro as the default model via OpenRouter, the Hermes Agent gateway enters a crash loop that renders the Telegram bot (and any other messaging integration) completely unresponsive. This has been a recurring issue over the past 2 days (April 26-27, 2026) with multiple distinct failure modes.

Steps to Reproduce

  1. Configure model.default: deepseek/deepseek-v4-pro with model.provider: openrouter
  2. Connect a Telegram bot via the Hermes gateway
  3. Wait for OpenRouter upstream rate limits or provider outages to trigger
  4. Gateway crashes with status=75/TEMPFAIL, systemd auto-restarts, and the cycle repeats

Failure Modes Observed

Failure Mode 1: HTTP 429 Rate Limit Crash Loop (April 26)

The OpenRouter upstream provider "Io Net" applies aggressive rate limits to deepseek/deepseek-v4-pro. When the rate limit is hit:

  • The gateway receives a Telegram message
  • Attempts to call the model via OpenRouter
  • Receives HTTP 429 ("deepseek/deepseek-v4-pro is temporarily rate-limited upstream")
  • Retries 3 times, all fail
  • Gateway process exits with status=75/TEMPFAIL
  • systemd auto-restarts the gateway
  • The cycle repeats indefinitely, making the bot completely unavailable

Log excerpt:

ERROR: HTTP 429 - deepseek/deepseek-v4-pro is temporarily rate-limited upstream (provider: Io Net)
WARNING: resolve_provider_client: openrouter requested but OpenRouter credential pool has no usable entries (credentials may be exhausted)
systemd[1]: hermes-gateway.service: Main process exited, code=exited, status=75/TEMPFAIL
systemd[1]: hermes-gateway.service: Failed with result 'exit-code'.
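
A minimal sketch of how the 429 retries in this loop could back off instead of firing back-to-back. This is not the Hermes Agent implementation: `RateLimited`, `call_with_backoff`, and all parameter names and defaults are hypothetical, and the delay schedule (capped exponential with full jitter) is an assumption.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised when the provider answers HTTP 429."""


def call_with_backoff(call, max_attempts=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `call` on RateLimited, waiting longer after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                # Out of attempts: re-raise so the caller can switch to a
                # fallback model. The process itself should not exit here.
                raise
            # Full jitter: wait a random time up to the capped exponential delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(ceiling * rng())
```

The `sleep` and `rng` parameters exist only so the schedule can be tested without real waiting. The final `raise` is meant to be caught by the caller, not to reach the process level.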

Failure Mode 2: Auxiliary Vision Provider Context Window ValueError (April 27)

When auxiliary.vision.provider is set to auto, the auto-detect mechanism resolves to deepseek/deepseek-chat-v3-0324 — a model with only 16,384 tokens of context. Hermes Agent requires a minimum of 64,000 tokens for auxiliary operations, causing a hard crash:

ValueError: Model deepseek/deepseek-chat-v3-0324 has a context window of 16,384 tokens, which is below the minimum 64,000 required by Hermes Agent.

This crash happens before the agent can respond to any Telegram message, making the bot appear completely dead even though the gateway process stays running.
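
The check that `auto` resolution is missing could look roughly like the sketch below. It assumes the resolver can see each candidate's context window; `pick_auxiliary_model` and the candidate list format are hypothetical, and only the 64,000-token minimum comes from the error message above.

```python
MIN_CONTEXT_TOKENS = 64_000  # minimum quoted in the ValueError above


def pick_auxiliary_model(candidates, min_context=MIN_CONTEXT_TOKENS):
    """Return the first candidate whose context window is large enough.

    `candidates` is an ordered list of (model_id, context_window) pairs;
    this shape is an assumption made for the example.
    """
    for model_id, context_window in candidates:
        if context_window >= min_context:
            return model_id
    # Nothing qualifies: the caller should disable the auxiliary feature
    # or report a configuration error instead of crashing mid-conversation.
    return None
```

With the model from the error message listed first, the function skips it and returns the next candidate that meets the minimum.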

Failure Mode 3: HTTP 401 Authentication Failure (April 27)

On a separate occasion, deepseek/deepseek-v4-pro started returning HTTP 401 errors with message "User not found":

ERROR: HTTP 401 - User not found (deepseek/deepseek-v4-pro via OpenRouter)

A 401 is non-retryable: repeating the same request cannot succeed, so retrying only delays the fallback, and in this case the fallback did not work correctly.
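
One way to express the retryable versus non-retryable split is a small classifier, sketched below. The names `Action` and `classify_http_error` are hypothetical, and treating 5xx responses as retryable is an assumption that the issue does not state.

```python
from enum import Enum


class Action(Enum):
    RETRY = "retry"        # transient: back off, then try the same model again
    FALLBACK = "fallback"  # retrying cannot help: switch provider or model now


def classify_http_error(status):
    """Map the HTTP status of a failed model call to a recovery action."""
    if status == 429 or 500 <= status <= 599:
        return Action.RETRY
    # 401, 403 and other 4xx responses: the same request will fail again.
    return Action.FALLBACK
```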

Expected Behavior

  1. Rate limit resilience: When the primary model hits rate limits, the gateway should gracefully fall back to a configured fallback model without crashing the entire process.
  2. Auxiliary model validation: The auto provider resolution should validate that the resolved model meets the minimum context window requirement (64K tokens) before accepting it, and fall back to another model if it doesn't.
  3. Non-retryable error handling: HTTP 401/403 errors should be treated as non-retryable and immediately trigger fallback rather than retrying.
  4. Gateway stability: A model API failure should never crash the gateway process. The gateway should remain running and responsive even when all model calls fail.
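
Point 4 amounts to catching model errors at the message level. The sketch below shows the idea under stated assumptions: `ModelError`, `handle_message`, and the `generate_reply` callable are hypothetical stand-ins, not Hermes Agent APIs.

```python
class ModelError(Exception):
    """Hypothetical base class for any failure of a model API call."""


UNAVAILABLE_REPLY = "Model temporarily unavailable. Please try again later."


def handle_message(text, generate_reply, log=print):
    """Answer one chat message without letting model errors escape."""
    try:
        return generate_reply(text)
    except ModelError as exc:
        # Log and degrade: the user gets a notice, the polling loop keeps running.
        log(f"model call failed: {exc}")
        return UNAVAILABLE_REPLY
```

Only `ModelError` is caught on purpose, so that genuine programming errors still surface during development.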

Actual Behavior

  • Gateway crashes completely on model failures
  • No graceful degradation — the Telegram bot goes 100% offline
  • Auxiliary vision provider auto mode can select models with insufficient context
  • systemd restart loop consumes resources without resolving the issue
  • Orphaned gateway processes and stale PID files accumulate

Root Cause Analysis

  1. No circuit breaker: The gateway treats every model call failure as fatal. There is no circuit breaker pattern that would temporarily stop trying the failing model and switch to a fallback.

  2. Fallback timing: The fallback_providers configuration exists but the fallback is only attempted after the primary model exhausts all retries. If the retries themselves cause the process to crash (as with the 429 rate limit), the fallback is never reached.

  3. Auto provider resolution lacks constraints: The auxiliary.vision.provider: auto setting resolves to any available model without checking if it meets the 64K minimum context window requirement.

  4. Process-level crash on API errors: Model API errors (429, 401) propagate up to the process level instead of being caught and handled at the session/conversation level.
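
For cause 2, the fallback only helps if a primary failure is caught and handed to the next provider. A rough sketch of that control flow follows; `ModelError`, `complete_with_fallbacks`, and the `(name, call)` pair format are hypothetical and do not mirror how `fallback_providers` is actually consumed.

```python
class ModelError(Exception):
    """Hypothetical base class for any failure of a model API call."""


def complete_with_fallbacks(prompt, providers):
    """Try each (name, call) pair in order and return the first success.

    Returns a (name, reply) tuple so the caller knows which model answered.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ModelError as exc:
            errors.append(f"{name}: {exc}")  # remember why this one failed
    # Every provider failed: raise one catchable error, never exit the process.
    raise ModelError("all providers failed: " + "; ".join(errors))
```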

Suggested Fixes

  1. Implement a circuit breaker: After N consecutive failures with a specific model, stop attempting that model for a cooldown period and route all requests to fallback providers.

  2. Validate auxiliary model selection: When provider: auto resolves a model, validate that it meets the minimum context window requirement. If not, skip it and try the next available model.

  3. Separate gateway health from model health: The gateway process should never crash due to a model API error. Failed model calls should return an error message to the user (e.g., "Model temporarily unavailable") while keeping the gateway running.

  4. Immediate fallback on non-retryable errors: Non-retryable HTTP 4xx errors such as 401 and 403 should skip retries entirely and immediately fall back. HTTP 429 is the exception and is covered by the backoff in the next item.

  5. Rate limit backoff: On HTTP 429, implement exponential backoff at the gateway level rather than retrying immediately and crashing.

  6. Official DeepSeek V4 Pro parser: The Hermes parser currently only supports DeepSeek v3/v3.1. Adding a dedicated v4 parser would improve tool calling and response formatting reliability (related to #14902).
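
The circuit breaker from fix 1 can be sketched in a few lines. The class below is illustrative only: `CircuitBreaker`, its method names, and the default threshold and cooldown are assumptions, not values taken from Hermes Agent.

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; probe again after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if the primary model may be tried right now."""
        if self.opened_at is None:
            return True
        # Half-open once the cooldown has elapsed: probes are allowed
        # until the next recorded success or failure.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # open, or restart the cooldown
```

A caller would check `allow()` before each primary-model request, send the request to a fallback provider when it returns False, and report the outcome through `record_success()` or `record_failure()`.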

Workaround (Current)

The following configuration changes mitigate the issues:

# Use a stable fallback model
fallback_providers:
  - provider: openrouter
    model: z-ai/glm-5.1

# Force auxiliary vision to use a high-context model
auxiliary:
  vision:
    provider: openrouter  # NOT "auto"
    model: google/gemini-2.5-flash

Additionally, adding a personal DeepSeek API key at https://openrouter.ai/settings/integrations provides individual rate limits instead of relying on the shared OpenRouter pool.

Environment

  • Hermes Agent version: latest (as of April 27, 2026)
  • OS: Linux (Ubuntu, systemd)
  • Provider: OpenRouter
  • Model: deepseek/deepseek-v4-pro (published 2026-04-24)
  • Integration: Telegram bot (polling mode)
  • Gateway: hermes-gateway.service (systemd user service)

Related

  • #14902 — Request for official DeepSeek V4 Pro parser support

extent analysis

TL;DR

Implement a circuit breaker pattern to handle model call failures and configure a stable fallback model to prevent gateway crashes.

Guidance

  1. Configure a fallback model: Set fallback_providers to use a stable model like z-ai/glm-5.1 to ensure the gateway remains responsive when the primary model fails.
  2. Validate auxiliary model selection: Manually set auxiliary.vision.provider and auxiliary.vision.model to a high-context model like google/gemini-2.5-flash to avoid context window errors.
  3. Implement rate limit backoff: Introduce exponential backoff on HTTP 429 errors to prevent immediate retries and gateway crashes.
  4. Handle non-retryable errors: Treat HTTP 401/403 errors as non-retryable and trigger fallback immediately.
  5. Add a personal DeepSeek API key: Configure a personal API key at OpenRouter to use individual rate limits instead of shared pool limits.

Example

fallback_providers:
  - provider: openrouter
    model: z-ai/glm-5.1

auxiliary:
  vision:
    provider: openrouter
    model: google/gemini-2.5-flash

Notes

The provided workaround configuration changes can mitigate the issues, but a more robust solution involves implementing a circuit breaker pattern and improving error handling.

Recommendation

Apply the workaround configuration changes and consider implementing a circuit breaker pattern to improve gateway stability and resilience.
