hermes - [Bug]: HTTP 402 (billing/spend limit) misclassified and misleading advice given

hermes · 2026-05-26 05:30:40

When a provider returns HTTP 402 (API key spend limit exceeded), Hermes shows two inconsistent and incorrect behaviours:

1. The first failure misclassifies 402 as a rate limit.
2. The second failure (fallback) correctly identifies it as non-retryable but gives the wrong advice, suggesting the API key may be invalid.

The messaging gateway (Telegram, etc.) then suppresses all detail and forwards only the first (wrong) classification to the user.

Error Message

⚠️  API call failed (attempt 1/3): APIStatusError [HTTP 402]
   🔌 Provider: custom  Model: kimi-k2-6
   📝 Error: HTTP 402: Error code: 402 - {'error': 'API key USD spend limit exceeded.
       Your account may still have USD balance, but this API key has reached its
       configured USD spending limit.'}
⚠️ Rate limited — switching to fallback provider...         ← WRONG: this is billing, not rate limiting
🔄 Primary model failed — switching to fallback: venice-uncensored-1-2 via custom

⚠️  API call failed (attempt 1/3): APIStatusError [HTTP 402]
   🔌 Provider: custom  Model: venice-uncensored-1-2
   📝 Error: HTTP 402: Error code: 402 - {'error': 'API key USD spend limit exceeded. ...'}
⚠️ Non-retryable error (HTTP 402) — trying fallback...      ← Correct classification this time
❌ Non-retryable client error (HTTP 402). Aborting.
   💡 Your API key was rejected by the provider. Check:
      • Is the key valid? Run: hermes setup                 ← WRONG: key is valid, spend limit is the issue
      • Does your account have access to venice-uncensored-1-2?

Root Cause

The internal classifier correctly identifies reason=billing and retryable=False for the 402 response, but that classification does not propagate to the user-facing message. The 402/billing case is likely falling through to the rate-limit message template rather than having its own, and the final failure hint assumes the API key was rejected.

Fix Action

The internal classifier already correctly identifies `reason=billing` for HTTP 402
responses (visible in debug logs: `Error classified: reason=billing status=402
retryable=False`). The fix is to branch on this reason when selecting status messages,
likely in `agent/conversation_loop.py` where fallback status callbacks are dispatched:

1. When `reason=billing`, the fallback status message should say something like
   "⚠️ Billing limit reached — switching to fallback..." instead of "⚠️ Rate limited
   — switching to fallback..."

2. Retry wait messages ("⏱️ Rate limited. Waiting Xs...") should not fire for
   `retryable=False` billing errors.

3. The final failure hint should suggest checking provider billing/spend limits
   rather than validating the API key.

The `reason` field is already plumbed through the classifier — the fix should only
require adding a `billing` branch alongside the existing `rate_limit` branch in the
status message lookup. No changes to error classification logic needed.
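
The three points above can be sketched as a small reason-keyed lookup. This is only an illustration under stated assumptions: the `Classification` dataclass, the table and function names, the `auth` reason, and the hint wording are hypothetical and not taken from the Hermes codebase. Only `reason=billing`, the `rate_limit` branch, and the quoted status strings come from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """Hypothetical container mirroring fields seen in the debug log line."""
    reason: str
    status: int
    retryable: bool


# Hypothetical lookup tables keyed by the classifier's reason.
FALLBACK_STATUS = {
    "rate_limit": "⚠️ Rate limited — switching to fallback provider...",
    "billing": "⚠️ Billing limit reached — switching to fallback...",
}

FAILURE_HINT = {
    # Wording below is illustrative, not the project's actual text.
    "billing": ("💡 The provider reported a billing or spend limit. "
                "Check the key's spend limit and the account balance "
                "in the provider dashboard."),
    # "auth" is an assumed reason name for rejected keys.
    "auth": ("💡 Your API key was rejected by the provider. "
             "Is the key valid? Run: hermes setup"),
}


def fallback_status(c: Classification) -> str:
    """Pick the fallback banner from the reason, not from the attempt number."""
    template = FALLBACK_STATUS.get(c.reason)
    if template is None:
        return f"⚠️ Non-retryable error (HTTP {c.status}) — trying fallback..."
    return template


def should_announce_retry_wait(c: Classification) -> bool:
    """Retry-wait messages only make sense for retryable rate limits."""
    return c.retryable and c.reason == "rate_limit"


def failure_hint(c: Classification) -> str:
    """Pick remediation advice from the reason, with a neutral default."""
    default = f"💡 Request failed with HTTP {c.status}. See the provider error above."
    return FAILURE_HINT.get(c.reason, default)
```

With a lookup of this shape, a `billing` classification never reaches the rate-limit banner or the key-validity hint, and the same 402 produces the same message on the first and the fallback attempt.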

Bug Description

Bug: HTTP 402 (billing/spend limit) misclassified and misleading advice given

Summary

When a provider returns HTTP 402 (API key spend limit exceeded), Hermes shows two inconsistent and incorrect behaviours:

1. The first failure misclassifies 402 as a rate limit.
2. The second failure (fallback) correctly identifies it as non-retryable but gives the wrong advice, suggesting the API key may be invalid.

The messaging gateway (Telegram, etc.) then suppresses all detail and forwards only the first (wrong) classification to the user.

Observed Behaviour

CLI output

⚠️  API call failed (attempt 1/3): APIStatusError [HTTP 402]
   🔌 Provider: custom  Model: kimi-k2-6
   📝 Error: HTTP 402: Error code: 402 - {'error': 'API key USD spend limit exceeded.
       Your account may still have USD balance, but this API key has reached its
       configured USD spending limit.'}
⚠️ Rate limited — switching to fallback provider...         ← WRONG: this is billing, not rate limiting
🔄 Primary model failed — switching to fallback: venice-uncensored-1-2 via custom

⚠️  API call failed (attempt 1/3): APIStatusError [HTTP 402]
   🔌 Provider: custom  Model: venice-uncensored-1-2
   📝 Error: HTTP 402: Error code: 402 - {'error': 'API key USD spend limit exceeded. ...'}
⚠️ Non-retryable error (HTTP 402) — trying fallback...      ← Correct classification this time
❌ Non-retryable client error (HTTP 402). Aborting.
   💡 Your API key was rejected by the provider. Check:
      • Is the key valid? Run: hermes setup                 ← WRONG: key is valid, spend limit is the issue
      • Does your account have access to venice-uncensored-1-2?

Messaging gateway (Telegram) — what the user sees

⚠️ Rate limited — switching to fallback provider...
⏱️ The model provider is rate-limiting requests. Please wait a moment and try again.

All provider detail is suppressed; user only receives the incorrect first classification.

Gateway log (DEBUG)

DEBUG agent.conversation_loop: Error classified: reason=billing status=402
      retryable=False compress=False rotate=True fallback=True

The internal classifier correctly identifies reason=billing and retryable=False, but this does not propagate to the user-facing message.
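
For anyone checking their own gateway log for the same mismatch, the classification record can be pulled apart with a few lines of Python. This is a rough sketch that assumes the `key=value` layout shown in the log excerpt above; the function name `parse_classification` is made up for illustration, and the real log format may differ between versions.

```python
import re


def parse_classification(record: str):
    """Parse one 'Error classified: ...' log record into a dict.

    Returns None when the record is not a classification line.
    """
    match = re.search(r"Error classified:\s*(.*)", record, re.DOTALL)
    if match is None:
        return None
    parsed = {}
    # Fields are whitespace-separated key=value pairs, possibly wrapped.
    for token in match.group(1).split():
        if "=" not in token:
            continue
        key, value = token.split("=", 1)
        if value in ("True", "False"):
            parsed[key] = value == "True"
        elif value.isdigit():
            parsed[key] = int(value)
        else:
            parsed[key] = value
    return parsed


SAMPLE = (
    "DEBUG agent.conversation_loop: Error classified: reason=billing status=402\n"
    "      retryable=False compress=False rotate=True fallback=True"
)

if __name__ == "__main__":
    print(parse_classification(SAMPLE))
```

A record with `reason` set to `billing` alongside a "Rate limited" message in the chat is the signature of this bug.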

Issues

Issue 1: First 402 classified as rate limit

After the first model returns 402, Hermes emits "⚠️ Rate limited — switching to fallback provider...". The provider's error message explicitly states 'API key USD spend limit exceeded', which is a billing block, not a rate limit. The internal classifier agrees (reason=billing), but the status message contradicts it.

Issue 2: Inconsistent 402 handling between first and fallback attempts

The first attempt says "rate limited"; the fallback attempt correctly says "non-retryable error". The same status code is classified differently depending on which attempt it is.

Issue 3: Wrong remediation advice for 402

The final error message suggests checking whether the API key is valid ("Is the key valid? Run: hermes setup"). For a spend limit error the key is valid; the correct advice is to raise the key's spend limit in the provider dashboard or top up the account balance.

Issue 4: Gateway suppresses detail, forwards wrong classification

Messaging gateway users receive only the first (incorrect) classification. A user with no CLI access has no way to discover the real cause without inspecting gateway logs.

Environment

Hermes gateway and CLI
Provider: Venice AI (custom OpenAI-compatible endpoint, https://api.venice.ai/api/v1)
Models: kimi-k2-6 (primary), venice-uncensored-1-2 (fallback)
Platform: Telegram gateway + direct CLI (hermes chat)

Additional Context

The internal error classification (reason=billing, retryable=False) is already correct, which suggests the fix is isolated to the user-facing message lookup — the 402/billing case is likely falling through to the rate-limit message template rather than having its own.

The Hermes troubleshooting skill (research-paper-writing/SKILL.md) groups 402 and 429 together under "API rate limit / credit exhaustion", which may indicate this conflation is intentional but incorrect across the codebase.

Steps to Reproduce

1. Configure a Venice AI custom provider with an API key that has a per-key USD spend limit set.
2. Exhaust, or nearly exhaust, the spend limit on that key (the account balance may still be positive).
3. Send any message, via the CLI (hermes chat -q "...") or a messaging gateway (Telegram in my case).
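
Reproducing this should not strictly require burning through a real spend limit. A local stand-in that answers every POST with the same 402 body may be enough, assuming the custom provider can be pointed at a local OpenAI-compatible base URL (not verified here). The sketch below uses only the Python standard library; the handler, `start_mock_provider`, and `probe` are illustrative names, and how to point Hermes at the server is left to the provider configuration.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Error body copied from the provider response quoted in this report.
ERROR_BODY = {
    "error": (
        "API key USD spend limit exceeded. Your account may still have USD "
        "balance, but this API key has reached its configured USD spending limit."
    )
}


class SpendLimitHandler(BaseHTTPRequestHandler):
    """Answers every POST, whatever the path, with HTTP 402."""

    def do_POST(self):
        # Drain the request body so the client sees a clean response.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        payload = json.dumps(ERROR_BODY).encode("utf-8")
        self.send_response(402)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        # Keep the console quiet.
        pass


def start_mock_provider(port=0):
    """Start the mock on 127.0.0.1; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), SpendLimitHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server):
    """Send one request to the mock and return (status, decoded body)."""
    url = f"http://127.0.0.1:{server.server_address[1]}/v1/chat/completions"
    request = urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        urllib.request.urlopen(request, timeout=5)
    except urllib.error.HTTPError as exc:
        return exc.code, json.loads(exc.read())
    raise AssertionError("expected the mock to return HTTP 402")
```

Calling `probe(start_mock_provider())` confirms the mock behaves like the failing provider before any Hermes configuration is touched.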

Expected Behavior

All 402 responses should be consistently classified as billing errors, not rate limits
User-facing message should identify the cause: spend/billing limit, not rate limiting
Remediation advice should point to provider billing settings, not key validity
The gateway should either surface the billing-specific message or at minimum not say "rate limiting" when it knows the reason is billing

Actual Behavior

The 402 response is presented to the user as if it were a 429 rate limit.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

No response

Debug Report

Report       https://paste.rs/E3r7y
agent.log    https://paste.rs/f9ATg
gateway.log  https://paste.rs/u8InS

Operating System

Ubuntu 24.04

Python Version

Python 3.12.3 / Python: 3.11.15 (reported by Hermes Agent)

Hermes Version

Hermes Agent v0.14.0 (2026.5.16)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

The internal classifier already correctly identifies `reason=billing` for HTTP 402
responses (visible in debug logs: `Error classified: reason=billing status=402
retryable=False`). The fix is to branch on this reason when selecting status messages,
likely in `agent/conversation_loop.py` where fallback status callbacks are dispatched:

1. When `reason=billing`, the fallback status message should say something like
   "⚠️ Billing limit reached — switching to fallback..." instead of "⚠️ Rate limited
   — switching to fallback..."

2. Retry wait messages ("⏱️ Rate limited. Waiting Xs...") should not fire for
   `retryable=False` billing errors.

3. The final failure hint should suggest checking provider billing/spend limits
   rather than validating the API key.

The `reason` field is already plumbed through the classifier — the fix should only
require adding a `billing` branch alongside the existing `rate_limit` branch in the
status message lookup. No changes to error classification logic needed.

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR
