hermes - ✅(Solved) Fix [Bug] Eager fallback on HTTP 402 (FailoverReason.billing) does not activate; cron job loops on dead primary until output-length crash [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#23138 (fetched 2026-05-11 03:30:53)
Comments: 0 · Participants: 1 · Timeline events: 6 · Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×2

When the primary provider returns HTTP 402 (Insufficient Balance), the eager-fallback path at run_agent.py:13503-13527 is not activated. The retry loop also does not advance past attempt 1/3. Instead, downstream code paths produce truncated/empty tool-call arguments (Unrepairable tool_call arguments — replaced with empty object), the model hallucinates oversized completions, hits finish_reason: length, and the cron job dies with RuntimeError: Response truncated due to output length limit.

End user impact: a cron job (Email Mirror — Agent Briefings, schedule */15 * * * *) failed continuously from the moment the DeepSeek balance ran out until the user manually topped up — fallback_providers was configured the whole time and was never used.

Fix Action

PR fix notes

PR #23323: fix(agent): bypass credential-pool guard for billing errors in eager fallback

Description (problem / solution / changelog)

Summary

When the primary provider returns HTTP 402 (credit exhaustion), the eager-fallback path is blocked by _pool_may_recover_from_rate_limit() when a multi-entry credential pool exists. This causes the agent to burn through all retries against a depleted provider instead of immediately switching to the configured fallback.

Root cause

The eager-fallback guard at run_agent.py:10667 treats billing (402) and rate-limit (429) errors identically when checking whether the credential pool might recover. For rate limits, this is correct — credential rotation can help. For billing errors, it's wrong — credit exhaustion is an account-level issue that no amount of credential rotation can fix.

The credential pool rotation in _recover_with_credential_pool() (line 10416) already attempts all pool entries before the eager-fallback check runs. If we reach the eager-fallback block, every pool entry is exhausted. But _pool_may_recover_from_rate_limit() checks cooldown-based "availability" — entries whose 1-hour cooldown has expired become "available" again, making the function return True and blocking the fallback indefinitely.
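The cooldown interaction described above can be made concrete with a small model. This is a sketch, not hermes-agent code: `PoolEntry`, `CredentialPool`, `pool_may_recover`, and the timestamp-based cooldown are hypothetical stand-ins chosen to mirror the PR description (a 1-hour cooldown, and no recovery for a missing or single-entry pool). It shows that once every cooldown has expired, a purely time-based availability check reports the pool as recoverable even though the exhausted balance has not changed.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical cooldown length, matching the 1-hour value in the PR notes.
COOLDOWN_SECONDS = 3600.0


@dataclass
class PoolEntry:
    """One credential in a hypothetical credential pool."""

    key_id: str
    cooldown_until: float = 0.0  # epoch seconds; 0.0 means never cooled down

    def mark_exhausted(self, now: float) -> None:
        # Called after a 402/429: bench this key for COOLDOWN_SECONDS.
        self.cooldown_until = now + COOLDOWN_SECONDS

    def is_available(self, now: float) -> bool:
        # Purely time-based: nothing here knows the account balance is still 0.
        return now >= self.cooldown_until


@dataclass
class CredentialPool:
    entries: list[PoolEntry] = field(default_factory=list)

    def has_available(self, now: float) -> bool:
        return any(entry.is_available(now) for entry in self.entries)


def pool_may_recover(pool: CredentialPool | None, now: float) -> bool:
    """Rate-limit-style recovery check (hypothetical shape).

    No pool or a single-entry pool cannot rotate, so return False.
    A multi-entry pool "may recover" whenever any cooldown has expired.
    """
    if pool is None or len(pool.entries) <= 1:
        return False
    return pool.has_available(now)


if __name__ == "__main__":
    now = 1_000_000.0
    pool = CredentialPool([PoolEntry("key-a"), PoolEntry("key-b")])
    for entry in pool.entries:
        entry.mark_exhausted(now)
    print(pool_may_recover(pool, now))                     # False: both keys benched
    print(pool_may_recover(pool, now + COOLDOWN_SECONDS))  # True, balance unchanged
```

Because every key in the pool draws on the same account, the second check flipping to True is exactly the case where deferring the fallback is wrong for a 402.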

Fix

Bypass the pool-recovery check when classified.reason == FailoverReason.billing. For billing errors, the fallback provider should activate immediately.
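A minimal sketch of the guard before and after the change, with the surrounding retry loop removed. `FailoverReason` is redefined locally as a three-member enum, and both functions are hypothetical; in run_agent.py the change is inline, and this only models its described effect.

```python
from enum import Enum


class FailoverReason(Enum):
    # Local stand-in for the subset of reasons discussed in this issue.
    rate_limit = "rate_limit"
    billing = "billing"
    other = "other"


def should_eager_fallback_before(reason: FailoverReason, pool_may_recover: bool,
                                 fallback_index: int, chain_len: int) -> bool:
    """Pre-fix shape: billing and rate limits share the pool-recovery guard."""
    is_rate_limited = reason in (FailoverReason.rate_limit, FailoverReason.billing)
    has_fallback_left = fallback_index < chain_len
    return is_rate_limited and has_fallback_left and not pool_may_recover


def should_eager_fallback_after(reason: FailoverReason, pool_may_recover: bool,
                                fallback_index: int, chain_len: int) -> bool:
    """Post-fix shape: billing bypasses the pool-recovery guard.

    Rotating keys cannot refill an account-level balance, so a 402 switches
    providers even when the pool reports an available entry.
    """
    is_rate_limited = reason in (FailoverReason.rate_limit, FailoverReason.billing)
    has_fallback_left = fallback_index < chain_len
    if not (is_rate_limited and has_fallback_left):
        return False
    if reason is FailoverReason.billing:
        return True
    # Rate limits keep the old behavior: defer while the pool may recover.
    return not pool_may_recover
```

With pool_may_recover=True, the old guard blocks a billing fallback while the new one allows it; a 429 with the same pool still defers, which is the behavior the regression tests below are described as pinning.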

Changed files:

  • run_agent.py — 2-line logic change in the eager-fallback guard (+12 lines of comments)
  • tests/agent/test_credential_pool_routing.py — new test test_eager_fallback_fires_for_billing_despite_available_pool

Regression coverage

  • New test: billing error with available pool triggers eager fallback (previously blocked)
  • Existing tests: 429 with active pool still defers (rate-limit behavior unchanged)
  • All 11 tests in test_credential_pool_routing.py pass
  • All 19 tests in test_provider_fallback.py pass
  • All 118 tests in test_error_classifier.py pass

Testing

python -m pytest tests/agent/test_credential_pool_routing.py tests/run_agent/test_provider_fallback.py tests/agent/test_error_classifier.py -v

Fixes [Bug] Eager fallback on HTTP 402 (FailoverReason.billing) does not activate; cron job loops on dead primary until output-length crash #23138

Changed files

  • gateway/platforms/whatsapp.py (modified, +4/-1)
  • run_agent.py (modified, +14/-2)
  • tests/agent/test_credential_pool_routing.py (modified, +24/-0)
  • tests/tools/test_file_tools.py (modified, +37/-0)
  • tools/file_tools.py (modified, +6/-6)

[Bug] Eager fallback on HTTP 402 (FailoverReason.billing) does not activate; cron job loops on dead primary until output-length crash

Environment

  • Hermes Agent v0.13.0 (2026.5.7) · upstream commit 44cdf555a83c1d8d605d095442e11efd58089533
  • Python 3.11.15
  • OpenAI SDK 2.32.0
  • macOS 14 (Darwin 25.4.0)

Config (~/.hermes/config.yaml, relevant excerpt)

model:
  provider: deepseek
  model: deepseek-v4-pro
  base_url: ''
providers: {}
fallback_providers:
- provider: openrouter
  model: openai/gpt-oss-120b:free
- provider: openrouter
  model: z-ai/glm-4.5-air:free

Failing job

{
  "id": "b7fdbe31fc65",
  "name": "Email Mirror — Agent Briefings",
  "model": "deepseek-v4-pro",
  "provider": "deepseek",
  "schedule": {"kind": "cron", "expr": "*/15 * * * *"}
}

The job pins both provider and model explicitly and runs an agent that makes multiple tool calls (Gmail MCP, execute_code, memory).

Reproduction

  1. Configure DeepSeek as primary and at least one OpenRouter free model in fallback_providers (as above).
  2. Drain the DeepSeek balance to 0 USD (or block the API key).
  3. Run any cron job that uses provider: deepseek and performs multiple tool calls.
  4. Observe: every API call fails with HTTP 402, no fallback switch occurs, and the agent eventually crashes with Response truncated due to output length limit instead of Insufficient balance.
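To reproduce without actually draining a balance, one option is a local stub that answers every request with HTTP 402. This is a sketch under assumptions: the JSON error body only approximates DeepSeek's, and pointing the agent at the stub through the `base_url` field in the config above (currently `''`) assumes hermes-agent honors that override for the `deepseek` provider. The OpenAI Python SDK has no dedicated exception class for 402, so the client should see a plain `APIStatusError`, as in the log below.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class InsufficientBalanceHandler(BaseHTTPRequestHandler):
    """Answers every POST with HTTP 402, imitating an exhausted balance."""

    def do_POST(self) -> None:
        # Read the request body so the client is not cut off mid-send.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # Approximate error body; the real DeepSeek payload may differ.
        body = json.dumps({"error": {"message": "Insufficient Balance"}}).encode()
        self.send_response(402)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Silence the default per-request stderr logging.
        pass


def start_stub(port: int = 0) -> HTTPServer:
    """Start the stub on 127.0.0.1 in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), InsufficientBalanceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, `start_stub(8402)` in an interactive session, with `base_url` set to a hypothetical `http://127.0.0.1:8402/v1`, should produce the same `HTTP 402: Insufficient Balance` warnings without touching a real account.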

Observed behavior

~/.hermes/logs/gateway.error.log (timestamps trimmed for brevity):

WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=asyncio_2:… provider=deepseek base_url=https://api.deepseek.com/v1 model=deepseek-v4-pro summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=asyncio_2:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-1_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-3_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-5_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-7_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance
WARNING run_agent: API call failed (attempt 1/3) error_type=APIStatusError thread=ThreadPoolExecutor-9_0:… provider=deepseek …summary=HTTP 402: Insufficient Balance

WARNING run_agent: Tool execute_code returned error (3.41s): {"status": "error", "output": "…stderr: Traceback…"}
WARNING run_agent: Tool execute_code returned error (3.31s): …
WARNING run_agent: Tool execute_code returned error (3.27s): …
WARNING run_agent: Unrepairable tool_call arguments for execute_code — replaced with empty object (was: import json, os, pathlib, re, datetime, subprocess, sys, glob, textwrap, traceba)
WARNING run_agent: Unrepairable tool_call arguments for mcp_gmail_send_email — replaced with empty object (was: {"body":"Quelle: Intelligence-Briefing Daily\nJob-ID: d1824b7c405c\nOutput-Datei)

ERROR cron.scheduler: Job 'Email Mirror — Agent Briefings' failed: RuntimeError: Response truncated due to output length limit
Traceback (most recent call last):
  File "/…/cron/scheduler.py", line 1565, in run_job
    raise RuntimeError(_err_text)

Things that are notably absent from the log:

  • No attempt 2/3 or attempt 3/3 line — every attempt is 1/3
  • No ⚠️ Rate limited — switching to fallback provider... (_emit_status at run_agent.py:13522)
  • No "Credential … (billing) — rotated to pool entry …" log (_recover_with_credential_pool at run_agent.py:7075)
  • No "Fallback skip" warning from _try_activate_fallback

So the code never reached the eager-fallback branch in any of the 7 failed calls shown, across both the asyncio_2 and ThreadPoolExecutor-N_0 threads.
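The absence checks above are easy to automate when triaging similar reports. A minimal sketch, assuming the log line formats shown in this issue: the marker strings are substrings of the messages listed above, while the regex, the `summarize` helper, and the script usage are illustrative and not part of hermes-agent.

```python
import re
import sys
from collections import Counter

# Matches "API call failed (attempt 1/3) ... thread=<name>:" as seen in this issue.
ATTEMPT_RE = re.compile(r"API call failed \(attempt (\d+)/(\d+)\).*?thread=([^\s:]+)")

# Substrings of the messages the issue expected but did not find.
FALLBACK_MARKERS = (
    "switching to fallback provider",
    "rotated to pool entry",
    "Fallback skip",
)


def summarize(lines):
    """Count attempt numbers, thread names, and fallback markers in log lines."""
    attempts = Counter()
    threads = set()
    markers = Counter()
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            attempts[int(match.group(1))] += 1
            threads.add(match.group(3))
        for marker in FALLBACK_MARKERS:
            if marker in line:
                markers[marker] += 1
    return {
        "attempts": dict(attempts),
        "threads": sorted(threads),
        "fallback_markers": dict(markers),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_log.py ~/.hermes/logs/gateway.error.log
    with open(sys.argv[1], encoding="utf-8") as fh:
        report = summarize(fh)
    print(report)
    if set(report["attempts"]) == {1} and not report["fallback_markers"]:
        print("Only attempt 1/N seen and no fallback markers: matches this bug.")
```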

Expected behavior

Per error_classifier.py:712-738 (_classify_402):

return result_fn(
    FailoverReason.billing,
    retryable=False,
    should_rotate_credential=True,
    should_fallback=True,
)

…and run_agent.py:13503-13527:

is_rate_limited = classified.reason in (
    FailoverReason.rate_limit,
    FailoverReason.billing,
)
if is_rate_limited and self._fallback_index < len(self._fallback_chain):
    pool_may_recover = _pool_may_recover_from_rate_limit(
        self._credential_pool, provider=self.provider, base_url=...)  # value missing from the excerpt
    if not pool_may_recover:
        self._emit_status("⚠️ Rate limited — switching to fallback provider...")
        if self._try_activate_fallback(reason=classified.reason):
            retry_count = 0
            continue

The first 402 should trip FailoverReason.billing, _pool_may_recover_from_rate_limit should return False (no credential pool configured for DeepSeek; pool is None), and the agent should switch to openai/gpt-oss-120b:free on OpenRouter without further retries.
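The expected sequence can be simulated with stub providers. This is a simplified sketch, not the run_agent.py loop: `ProviderError`, `Classified`, `classify`, and `run_with_fallback` are hypothetical. Only the 402 flags are taken from the `_classify_402` excerpt above; the 429 flags are assumptions, and the credential-pool check is left out because the expectation here is that DeepSeek has no pool.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderError(Exception):
    """Stand-in for an HTTP error raised by a provider SDK."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


@dataclass
class Classified:
    reason: str
    retryable: bool
    should_fallback: bool


def classify(err: ProviderError) -> Classified:
    # 402 uses the flags quoted from _classify_402 (retryable=False, should_fallback=True).
    if err.status == 402:
        return Classified("billing", retryable=False, should_fallback=True)
    # 429: assumed retryable and fallback-eligible for this sketch.
    if err.status == 429:
        return Classified("rate_limit", retryable=True, should_fallback=True)
    return Classified("other", retryable=False, should_fallback=False)


def run_with_fallback(chain: list[str], call: Callable[[str], str], max_retries: int = 3):
    """Return (provider, result, failures), walking the chain on eager fallback."""
    index, retry_count, failures = 0, 0, []
    while True:
        provider = chain[index]
        retry_count += 1
        try:
            return provider, call(provider), failures
        except ProviderError as err:
            failures.append(f"{provider}: attempt {retry_count}/{max_retries} failed: {err}")
            classified = classify(err)
            # Eager fallback: switch providers now and reset the retry counter.
            if classified.should_fallback and index + 1 < len(chain):
                index += 1
                retry_count = 0
                continue
            # Otherwise retry the same provider while attempts remain.
            if classified.retryable and retry_count < max_retries:
                continue
            raise


if __name__ == "__main__":
    def fake_call(provider: str) -> str:
        if provider == "deepseek:deepseek-v4-pro":
            raise ProviderError(402, "Insufficient Balance")
        return f"ok from {provider}"

    chain = [
        "deepseek:deepseek-v4-pro",
        "openrouter:openai/gpt-oss-120b:free",
        "openrouter:z-ai/glm-4.5-air:free",
    ]
    print(run_with_fallback(chain, fake_call))
```

Run as a script, the first 402 moves straight to the first OpenRouter fallback with one recorded failure and a reset retry counter, which is the switch the log above never shows.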

Hypothesis

I have not pinpointed the exact branch that swallows the 402. Three plausible candidates:

  1. _pool_may_recover_from_rate_limit returns True: For DeepSeek there is no explicit credential pool. Even if load_pool("deepseek") returns a one-entry pool (the .env key auto-loaded as its only entry), pool.has_available() is True but the len(pool.entries()) > 1 check fails, so the function returns False and this path should be safe. Worth confirming whether the loaded pool has 0 or 1 entries in this scenario.

  2. Sub-agent / tool worker loses _fallback_chain: The ThreadPoolExecutor-N_0 threads in the log indicate parallel tool execution (run_agent.py:10584). If any of those workers spin up a separate AIAgent (e.g. via model_tools.py use_model), the new agent may not inherit _fallback_chain even though delegate_tool.py:1067 does. A grep across tool modules for AIAgent( not passing fallback_model= would catch this.

  3. asyncio_2 thread is the gateway/cron entrypoint and the 402 surfaces before the retry loop catches it: The 402 is raised once and propagates out of run_conversation before retry/fallback can run, e.g. during a streaming first-token call where the non-stream retry loop is not yet active. If the API call is in chat_completions non-stream mode, retries should be in run_agent.py:13360+; please confirm which code path the cron scheduler exercises.

Whichever branch is responsible, the user-visible failure mode is the same: a configured fallback chain stays unused, retry counter never increments past 1, and the job dies with a confusing Response truncated error that doesn't mention billing.

Suggested fix directions

  • Tighten _pool_may_recover_from_rate_limit so that for FailoverReason.billing it always returns False (rotating credentials cannot recover an exhausted account-level balance, even with a 2-entry pool — both keys hit the same account).
  • Audit every AIAgent( constructor call outside run_agent.py to ensure fallback_model= is always plumbed through (similar to the recent fix in delegate_tool.py:1102).
  • Surface a clearer terminal error: when an agent dies due to an upstream 402 with no successful API call, the cron last_error should say "billing exhausted on <provider>, fallback chain rejected with <reason>" instead of Response truncated due to output length limit. The current message looks like a model bug, not an account bug.
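A sketch of the third suggestion. The function name and parameters are hypothetical; only the message template comes from this issue. It rewrites the error only under the two conditions the suggestion names, an upstream 402 and no successful API call, and adds a variant for this issue's case, where the fallback chain was never tried at all.

```python
from typing import Optional


def describe_terminal_error(provider: str, last_status: Optional[int],
                            successful_calls: int, fallback_reason: Optional[str],
                            original_error: str) -> str:
    """Build a cron last_error that names the account problem, not the symptom.

    Hypothetical helper: rewrites the message only when the run ended after an
    upstream 402 without a single successful API call.
    """
    if last_status != 402 or successful_calls > 0:
        return original_error
    if fallback_reason is None:
        # This issue's case: a fallback chain was configured but never tried.
        return (f"billing exhausted on {provider}; fallback chain was never attempted "
                f"(original error: {original_error})")
    return f"billing exhausted on {provider}, fallback chain rejected with {fallback_reason}"
```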

Related issues

  • #21165 — "401 authentication errors do not trigger fallback provider in auxiliary_client.py" (same shape, different status code)
  • #19411 — "Gateway fallback provider keeps primary model instead of fallback model" (related)
  • #13887 (closed) — "Auxiliary auto fallback fails on OpenRouter 403 credit/key-limit errors" (closed but same pattern)
  • #5220 (closed) — "Provider-side HTTP 402 can kill entire gateway service" (closed but the failure-mode cousin)
  • #11737 — "Multi-provider credential pools for cross-provider failover and rotation" (related feature)

Workaround in use

Pre-run balance check via LaunchAgent that polls https://api.deepseek.com/user/balance every 6h and sends a Telegram warning when balance drops below a configured threshold. This sidesteps the bug but does not fix it.
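The balance check itself can be sketched as a small script. Several parts are assumptions: the response shape of the balance endpoint (verify against DeepSeek's API reference), the `DEEPSEEK_API_KEY` environment variable name, and the USD threshold. Scheduling (the LaunchAgent) and the Telegram notification are left out; the script only prints a warning.

```python
import json
import os
import urllib.request
from decimal import Decimal

BALANCE_URL = "https://api.deepseek.com/user/balance"


def is_balance_low(payload: dict, currency: str, threshold: Decimal) -> bool:
    """Return True if the account is unusable or below threshold in currency.

    Assumed payload shape (verify against DeepSeek's API reference):
    {"is_available": bool,
     "balance_infos": [{"currency": "USD", "total_balance": "12.34", ...}]}
    """
    if not payload.get("is_available", False):
        return True
    for info in payload.get("balance_infos", []):
        if info.get("currency") == currency:
            # Balances are decimal strings; Decimal avoids float rounding.
            return Decimal(info["total_balance"]) < threshold
    # No entry in the requested currency: report as low so a human checks.
    return True


def fetch_balance(api_key: str, timeout: float = 10.0) -> dict:
    request = urllib.request.Request(
        BALANCE_URL,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    api_key = os.environ.get("DEEPSEEK_API_KEY")  # hypothetical variable name
    if not api_key:
        print("Set DEEPSEEK_API_KEY to run the check.")
    elif is_balance_low(fetch_balance(api_key), "USD", Decimal("2.00")):
        # Placeholder for the Telegram warning used in the real workaround.
        print("WARNING: DeepSeek balance is low or the account is unavailable")
```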
