hermes - 💡(How to fix) Fix Sonnet 4.6 / Opus 4.7 return generic 429 on Claude Max OAuth while Haiku 4.5 succeeds — appears to be new Anthropic-side enforcement beyond header inspection [2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#17169 · Fetched 2026-04-29 06:36:57
Comments: 2 · Participants: 2 · Timeline events: 8 · Reactions: 0
Timeline (top): labeled ×4, commented ×2, cross-referenced ×1, subscribed ×1

Summary

As of 2026-04-28, requests to Sonnet 4.5/4.6 and Opus 4.6/4.7 from Hermes via the Claude Max OAuth credential return HTTP 429 {"type":"error","error":{"type":"rate_limit_error","message":"Error"}} with no anthropic-ratelimit-* response headers. Haiku 4.5 succeeds normally on the same token, same code path, same session.

The real claude CLI (v2.1.122) reaches Sonnet successfully with the same OAuth token, from the same machine, in the same minute. That rules out quota and token problems; the remaining explanation is that Anthropic distinguishes real Claude Code from third-party clients on Sonnet/Opus.

Environment

  • Hermes Agent: latest (post hermes update 2026-04-28)
  • macOS arm64
  • Anthropic provider, OAuth subscription token (sk-ant-oat01-…)
  • Account: Claude Max 20x, organizationRateLimitTier default_claude_max_20x, hasExtraUsageEnabled: true
  • Anthropic status page: All Systems Operational at time of testing

Reproduction

Same OAuth token, two requests run within seconds of each other.

Hermes-shaped request (curl) — 429:

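# macOS only: read the Claude Code OAuth access token from the Keychain (requires jq)
TOKEN=$(security find-generic-password -s "Claude Code-credentials" -w | jq -r .claudeAiOauth.accessToken)
# -sD - runs silently and dumps the response headers to stdout ahead of the body
curl -sD - -X POST 'https://api.anthropic.com/v1/messages?beta=true' \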
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: claude-code-20250219,oauth-2025-04-20,interleaved-thinking-2025-05-14,context-management-2025-06-27,prompt-caching-scope-2026-01-05,advisor-tool-2026-03-01,advanced-tool-use-2025-11-20,context-1m-2025-08-07,effort-2025-11-24,cache-diagnosis-2026-04-07" \
  -H "anthropic-dangerous-direct-browser-access: true" \
  -H "user-agent: claude-cli/2.1.122 (external, sdk-cli)" \
  -H "x-app: cli" \
  -H "x-claude-code-session-id: $(uuidgen)" \
  -H "x-client-request-id: $(uuidgen)" \
  -H "x-stainless-arch: arm64" \
  -H "x-stainless-lang: js" \
  -H "x-stainless-os: MacOS" \
  -H "x-stainless-package-version: 0.81.0" \
  -H "x-stainless-runtime: node" \
  -H "x-stainless-runtime-version: v24.3.0" \
  -d '{"model":"claude-sonnet-4-6","max_tokens":10,"system":"x","messages":[{"role":"user","content":"hi"}]}'

Result: HTTP 429, body {"type":"error","error":{"type":"rate_limit_error","message":"Error"},"request_id":"req_011CaWyhtmFS2ob5td4b1mmj"}. Response has no anthropic-ratelimit-* headers — only generic Cloudflare headers and x-should-retry: true.
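The distinction drawn here, a 429 with no anthropic-ratelimit-* headers versus one that carries them, can be checked mechanically when triaging responses. The following is a minimal Python sketch, assuming the response headers are available as a plain dict. The function name `classify_429` and its three labels are hypothetical and are not part of Hermes or any Anthropic SDK; the `cf-ray` entry only stands in for the "generic Cloudflare headers" mentioned above.

```python
def classify_429(status: int, headers: dict[str, str]) -> str:
    """Label a response by whether a 429 carries quota telemetry.

    Hypothetical helper for triage only; the labels are made up here.
    """
    if status != 429:
        return "not_rate_limited"
    # HTTP header names are case-insensitive, so normalise before matching.
    names = {name.lower() for name in headers}
    if any(name.startswith("anthropic-ratelimit-") for name in names):
        return "quota_429"
    return "generic_429"


if __name__ == "__main__":
    # Shape of the failing response described above (header values are placeholders).
    observed = {"x-should-retry": "true", "cf-ray": "placeholder"}
    print(classify_429(429, observed))
```

Under these assumptions the failing Sonnet request falls into the `generic_429` bucket, which is the case this report is about.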

Real claude CLI — 200:

claude -p "hi" --model claude-sonnet-4-6
# → "<friendly assistant reply>"

Same token. Same model. Same machine. Within seconds.

The successful response shows the account has plenty of quota:

  • anthropic-ratelimit-unified-5h-utilization: 0.09
  • anthropic-ratelimit-unified-7d-utilization: 0.16
  • anthropic-ratelimit-unified-7d_sonnet-utilization: 0.20
  • anthropic-ratelimit-unified-overage-disabled-reason: org_level_disabled_until
  • anthropic-ratelimit-unified-overage-status: rejected

So overage is disabled at the org level, which is fine: at least 80% of the base quota remains in every window (the tightest is 7d_sonnet at 0.20 utilization). The 429 must come from a different gate.
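The headroom arithmetic can be reproduced from the headers listed above. This is a small Python sketch that assumes, as the issue does, that each `-utilization` value is the fraction of that window's quota already used; `remaining_headroom` is a hypothetical helper name.

```python
PREFIX = "anthropic-ratelimit-unified-"
SUFFIX = "-utilization"


def remaining_headroom(headers: dict[str, str]) -> dict[str, float]:
    """Map each utilization window to the fraction of quota still free.

    Assumes a utilization value is a 0..1 fraction of quota consumed.
    """
    headroom = {}
    for name, value in headers.items():
        key = name.lower()
        if key.startswith(PREFIX) and key.endswith(SUFFIX):
            window = key[len(PREFIX):-len(SUFFIX)]
            headroom[window] = round(1.0 - float(value), 2)
    return headroom


if __name__ == "__main__":
    # Values copied from the successful claude CLI response above.
    successful = {
        "anthropic-ratelimit-unified-5h-utilization": "0.09",
        "anthropic-ratelimit-unified-7d-utilization": "0.16",
        "anthropic-ratelimit-unified-7d_sonnet-utilization": "0.20",
        "anthropic-ratelimit-unified-overage-status": "rejected",
    }
    free = remaining_headroom(successful)
    tightest = min(free, key=free.get)
    print(free, tightest)
```

The overage headers are skipped because they do not end in `-utilization`; only the three quota windows contribute.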

What's identical between real-claude and Hermes-spoofed

  • Same OAuth token
  • Same ?beta=true URL
  • All anthropic-beta values match
  • anthropic-dangerous-direct-browser-access: true
  • user-agent: claude-cli/2.1.122 (external, sdk-cli)
  • All x-stainless-* values match (arch, lang, os, package-version 0.81.0, runtime, runtime-version v24.3.0)
  • x-app: cli
  • Synthetic x-client-request-id and x-claude-code-session-id UUIDs

What's different

Things real claude sends that Hermes doesn't:

  1. Body shape: real claude includes metadata, output_config, thinking, context_management, diagnostics top-level fields. Hermes sends only model, messages, system, tools, max_tokens.
  2. TLS fingerprint: curl/openssl vs Bun/Node — different JA3/JA4 likely.
  3. Streaming: real claude uses stream: true always.

Hypothesis

Anthropic added a new enforcement layer for Sonnet/Opus on subscription OAuth, separate from the existing prompt-text content filter. It probably keys on either TLS fingerprint or required body fields (most likely the structured metadata.user_id / output_config / context_management fields that real Claude Code adds).

Affected

  • Hermes Anthropic native provider (agent/anthropic_adapter.py's build_anthropic_client)
  • Any user on Claude Max / Claude Pro OAuth selecting Sonnet 4.5+ or Opus 4.6+ as primary or fallback
  • Telegram, Discord, webui, api_server — all platforms

Workaround

Switch the primary model to claude-haiku-4-5-20251001 (Haiku is unaffected). Sonnet/Opus can stay in fallback chain but they'll always 429 until this is fixed.
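The reordering this workaround describes can be sketched in Python as follows. It is only an illustration: the issue does not show how Hermes stores its model chain, so the list-of-IDs representation, the `prefer_unaffected` name, and the `claude-opus-` prefix are all assumptions.

```python
# Assumed ID prefixes for the affected families; only the Sonnet and Haiku
# IDs used below appear in the issue itself.
AFFECTED_FAMILIES = ("claude-sonnet-", "claude-opus-")


def prefer_unaffected(chain: list[str]) -> list[str]:
    """Move unaffected models to the front, preserving relative order."""
    unaffected = [m for m in chain if not m.startswith(AFFECTED_FAMILIES)]
    affected = [m for m in chain if m.startswith(AFFECTED_FAMILIES)]
    return unaffected + affected


if __name__ == "__main__":
    chain = ["claude-sonnet-4-6", "claude-haiku-4-5-20251001"]
    print(prefer_unaffected(chain))
```

Haiku becomes the primary model and Sonnet stays in the chain as a fallback, so nothing needs to be re-added once the 429s stop.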

What might fix it

  1. Have build_anthropic_client always emit the body fields that real Claude Code emits: metadata: {user_id: <hashed-account-uuid>}, output_config: {...}, thinking: {type: "adaptive"}, context_management: {...}, plus stream: true by default.
  2. If the gate is TLS-level, the SDK already uses Node's https stack — should match. But if Anthropic is fingerprinting handshake details specific to Bun, that's harder.

Happy to provide gateway logs, full request dumps, or run additional diagnostics. The request_id above can be cross-referenced server-side.

extent analysis

TL;DR

Modify the Hermes request to include the missing body fields that the real Claude CLI sends, such as metadata, output_config, thinking, and context_management, to potentially bypass the rate limit error.

Guidance

  1. Verify the hypothesis: Test the Hermes request with the additional body fields to see if it resolves the rate limit error.
  2. Compare request differences: Review the request headers and bodies of both the Hermes and real Claude CLI requests to identify any other potential differences that could be contributing to the issue.
  3. Check Anthropic documentation: Look for any updates or changes to the Anthropic API that may require additional fields or headers in requests.
  4. Test with stream: true: Try setting stream: true in the Hermes request to see if it makes a difference, as the real Claude CLI always uses this setting.
  5. Monitor response headers: Check the response headers for any clues about what might be causing the rate limit error, such as the anthropic-ratelimit-* headers.
