Hermes mislabels upstream 429 quota exhaustion as missing Codex credentials


Summary

When the primary provider (openai-codex via ChatGPT Plus/Pro OAuth) returns a 429 "usage limit" response (e.g. "hit your usage limit. Try again at <reset_time>"), the Hermes gateway:

  1. Logs "WARNING gateway.run: Primary provider auth failed: No Codex credentials stored. Run `hermes auth` to authenticate." This is incorrect; the credentials are valid.
  2. Surfaces the same string to the user-facing chat response when no fallback_providers is configured, causing operator panic ("did our tokens leak / get revoked?").
  3. Reports the failed credential in hermes auth list as device_code rate-limited (429) (Xh Ym left) — accurate, but disagrees with the chat-surface message.
  4. When fallback_providers IS configured with an Ollama provider, logs Fallback provider resolved: openrouter model=<X> — labels the resolved provider as "openrouter" even though the config specifies provider: ollama and traffic correctly hits localhost:11434.

Reproduction

  1. Configure openai-codex as primary with a ChatGPT Plus/Pro OAuth credential.
  2. Exhaust the ChatGPT Codex usage quota (sustained agent workload + cron jobs).
  3. Send any user message that triggers an inference call.
  4. Observe gateway.log warning + Discord/Slack reply containing the misleading auth-error string.

Expected vs actual

| Surface | Expected | Actual |
| --- | --- | --- |
| Chat reply (no fallback) | Provider quota exhausted; retry after <reset> | No Codex credentials stored. Run hermes auth to authenticate. |
| Gateway log | Primary provider rate-limited (429); retry-after=<X>s — trying fallback | Primary provider auth failed: No Codex credentials stored ... |
| hermes auth list | (already correct) device_code rate-limited (429) (Xh Ym left) | (matches expected) |
| Fallback resolution log | Fallback provider resolved: ollama model=<X> | Fallback provider resolved: openrouter model=<X> |

Root cause hypothesis

The provider-error handler appears to flatten upstream 4xx/quota errors into a single "no credentials" branch, likely because the user-facing remediation has historically been "re-auth." For OAuth-backed plans, this remediation is wrong when the failure mode is a usage cap rather than missing/expired tokens.

The openrouter label on resolved fallback providers looks like an internal category label leaked into operator-facing logs.
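If the label really is an internal category, one way to avoid the leak is to carry the literal config key alongside it and format the log line from the key only. The sketch below is illustrative, not Hermes code: `ResolvedProvider`, its fields, and `fallback_log_line` are hypothetical names, and the model name is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolvedProvider:
    """Hypothetical result of resolving one fallback_providers entry."""
    config_key: str  # literal `provider:` value from config, e.g. "ollama"
    category: str    # assumed internal routing category, e.g. "openrouter"
    base_url: str
    model: str


def fallback_log_line(provider: ResolvedProvider) -> str:
    # Operators read this line, so use the key they wrote in config,
    # not the category used internally for routing.
    return f"Fallback provider resolved: {provider.config_key} model={provider.model}"


resolved = ResolvedProvider(
    config_key="ollama",
    category="openrouter",
    base_url="http://localhost:11434",
    model="example-model",  # placeholder, not a real model name
)
print(fallback_log_line(resolved))
```

Keeping both fields means routing can still branch on the category while the operator-facing log matches the config file.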

Suggested fix

  1. Separate provider-error classification from remediation copy. At minimum:
    • 401 / 403 → No Codex credentials stored. Run hermes auth.
    • 429 + body containing "usage limit" or retry-after header → Provider quota exhausted; reset at <time>. Surface to chat as a non-alarming degraded-mode notice when a fallback is configured.
    • 5xx → Provider error; retrying via fallback.
  2. In gateway.run's Fallback provider resolved: log line, print the literal provider key from config instead of the resolved category.
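The classification in item 1 could look roughly like the following. This is a minimal sketch under stated assumptions, not the Hermes implementation: `ProviderFailure`, `Classified`, `parse_retry_after`, and `classify` are hypothetical names, and the quota message uses a "retry in Ns" hint derived from the Retry-After header rather than an absolute reset time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from enum import Enum
from typing import Mapping, Optional


class ProviderFailure(Enum):
    AUTH = "auth"          # 401 / 403: credentials missing or rejected
    QUOTA = "quota"        # 429 with a usage-limit body or Retry-After header
    UPSTREAM = "upstream"  # 5xx: provider-side failure
    OTHER = "other"


@dataclass(frozen=True)
class Classified:
    kind: ProviderFailure
    message: str
    retry_after_s: Optional[float] = None


def parse_retry_after(value: Optional[str], now: datetime) -> Optional[float]:
    """Convert a Retry-After value (delta-seconds or HTTP-date) to seconds."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable header: treat as absent
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - now).total_seconds())


def classify(status: int, headers: Mapping[str, str], body: str,
             now: Optional[datetime] = None) -> Classified:
    now = now or datetime.now(timezone.utc)
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    if status in (401, 403):
        return Classified(ProviderFailure.AUTH,
                          "No Codex credentials stored. Run hermes auth.")
    if status == 429 and ("usage limit" in body.lower() or "retry-after" in lowered):
        wait = parse_retry_after(lowered.get("retry-after"), now)
        hint = f"retry in {int(wait)}s" if wait is not None else "retry later"
        return Classified(ProviderFailure.QUOTA,
                          f"Provider quota exhausted; {hint}.", wait)
    if 500 <= status <= 599:
        return Classified(ProviderFailure.UPSTREAM,
                          "Provider error; retrying via fallback.")
    return Classified(ProviderFailure.OTHER,
                      f"Provider request failed (HTTP {status}).")
```

Because the classification is returned as data, the gateway log, the chat reply, and `hermes auth list` could all render from the same `kind` instead of each guessing at the cause.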

Operator impact

Until suggested fix 1 above lands, every quota-exhaustion event triggers operator panic and hermes auth re-runs that cannot succeed (the token is fine; the quota is not). Mitigation in place locally: a 2-tier Ollama fallback chain absorbs the impact on user-facing replies, so for us this is now an observability-only issue.

Notes for upstream

Tested on Hermes 0.14.0 (2026.5.16) + Codex CLI 0.133.0 + ChatGPT Pro plan during a real quota outage window. Fleet of 13 profiles all reproduced the same error string verbatim. Full state captured locally if useful.
