hermes - Make `max_tokens` configurable per-profile (currently hardcoded to model max, breaks OpenRouter)

Summary

Hermes hardcodes max_tokens to each model's maximum output (e.g. 64000 for Claude Sonnet/Haiku 4.5) in agent/anthropic_adapter.py and agent/model_metadata.py. When using OpenRouter as the provider, this triggers HTTP 402 errors even when the account has plenty of credit, because OpenRouter reserves the full requested max_tokens × output rate as collateral before allowing the call.

Reproduction

  1. Set up a profile with OpenRouter as the provider and any Claude 4.x model
  2. Have a moderate credit balance (e.g. $5–$25 worth)
  3. Run any prompt: hermes --profile <name> -z "hello"
  4. Result:
HTTP 402: This request requires more credits, or fewer max_tokens.
You requested up to 64000 tokens, but can only afford 17176.

The actual response would be ~50–500 tokens. The 64K figure is a pre-flight reservation, not real usage. Nothing executes and nothing is billed, but Hermes is unusable until the cap is lowered or the user adds enough credit to cover the worst-case reservation for every call.
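To make the reservation arithmetic concrete, here is a minimal sketch. It assumes the reservation is simply max_tokens × the per-token output price, as the summary describes. The price and balance below are made-up illustrative values, not real OpenRouter pricing, and the helper names are hypothetical.

```python
# Amounts are kept in integer micro-dollars so the arithmetic stays exact.
MICRO_USD = 1_000_000


def reservation_micro_usd(max_tokens: int, price_micro_usd_per_token: int) -> int:
    """Credit that must be available up front for a request (assumed model)."""
    return max_tokens * price_micro_usd_per_token


def affordable_tokens(balance_micro_usd: int, price_micro_usd_per_token: int) -> int:
    """Largest max_tokens the available credit can cover (assumed model)."""
    return balance_micro_usd // price_micro_usd_per_token


# Hypothetical numbers, chosen only to make the arithmetic visible.
PRICE = 5          # micro-USD per output token, i.e. $5 per million tokens
BALANCE = 100_000  # $0.10 of available credit

print(reservation_micro_usd(64_000, PRICE) / MICRO_USD)  # 0.32
print(affordable_tokens(BALANCE, PRICE))                 # 20000
print(reservation_micro_usd(8_192, PRICE) <= BALANCE)    # True
```

Under this model a 64000-token request is refused even though a reply of a few hundred tokens would cost far less than the available credit, while an 8192-token cap fits comfortably.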

Current workaround

Manually patch the source after every install/update:

sed -i 's/64000/8192/g' \
  /opt/hermes-agent/agent/anthropic_adapter.py \
  /opt/hermes-agent/agent/model_metadata.py

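This is fragile: hermes update, a reinstall, or any version bump silently reverts the patch and breaks the agent again. The substitution is also blunt, since it rewrites every occurrence of 64000 in both files, not only the max_tokens default.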

Proposed solution

Add max_tokens (or max_output_tokens) as a key under model in the profile's config.yaml, falling back to the current hardcoded value when it is unset:

model:
  provider: openrouter
  default: anthropic/claude-haiku-4.5
  max_tokens: 8192  # NEW: per-profile override

Then in agent/anthropic_adapter.py, prefer the profile config over the model registry default:

max_tokens = profile_config.get("model", {}).get("max_tokens") \
             or MODEL_MAX_OUTPUT_TOKENS.get(resolved_model, 8192)

This mirrors the existing context-window resolution chain (config override → custom provider per-model → cache → /models endpoint → registry fallback). Extending the same pattern to max_tokens would be architecturally consistent.
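A slightly more defensive version of the same fallback could look like the sketch below. It is not the Hermes implementation: resolve_max_tokens and DEFAULT_MAX_TOKENS are hypothetical names, the registry is a stand-in for the one named above, and clamping the override to the model maximum is an assumption about the desired behavior.

```python
from typing import Any, Mapping

# Stand-in for the registry referenced in the issue; contents are illustrative.
MODEL_MAX_OUTPUT_TOKENS = {"anthropic/claude-haiku-4.5": 64_000}
DEFAULT_MAX_TOKENS = 8_192  # fallback for models missing from the registry


def resolve_max_tokens(profile_config: Mapping[str, Any], resolved_model: str) -> int:
    """Prefer the profile's model.max_tokens, else the registry default."""
    model_max = MODEL_MAX_OUTPUT_TOKENS.get(resolved_model, DEFAULT_MAX_TOKENS)
    # `or {}` tolerates a profile whose `model:` key is present but empty.
    override = (profile_config.get("model") or {}).get("max_tokens")
    if override is None:
        return model_max
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(override, bool) or not isinstance(override, int) or override <= 0:
        raise ValueError(
            f"model.max_tokens must be a positive integer, got {override!r}"
        )
    # Never request more than the model can actually produce.
    return min(override, model_max)


profile = {"model": {"provider": "openrouter", "max_tokens": 8192}}
print(resolve_max_tokens(profile, "anthropic/claude-haiku-4.5"))  # 8192
print(resolve_max_tokens({}, "anthropic/claude-haiku-4.5"))       # 64000
```

Compared with the one-line `or` form, this rejects a misconfigured value such as 0 or a string with an explicit error instead of quietly falling back to the model maximum.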

Why this matters

  • Hermes works fine with Anthropic direct (their API doesn't pre-reserve credit)
  • It breaks for OpenRouter users with a moderate balance, and OpenRouter is one of the most prominently featured providers
  • Users hit this immediately on first run, with a confusing error pointing to "add more credits" rather than the real cause
  • The fix is one config key + one line of fallback logic
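Separately from the config key, the confusing error could be turned into a clearer hint. The sketch below assumes the 402 message keeps the wording shown in the reproduction; parse_token_budget and explain are hypothetical helpers, not part of Hermes or of any OpenRouter SDK.

```python
import re
from typing import Optional, Tuple

# Matches the wording quoted in the reproduction; other phrasings are ignored.
_BUDGET_PATTERN = re.compile(
    r"requested up to (\d+) tokens, but can only afford (\d+)"
)


def parse_token_budget(message: str) -> Optional[Tuple[int, int]]:
    """Return (requested, affordable) token counts, or None if not found."""
    match = _BUDGET_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def explain(message: str) -> str:
    """Rephrase the provider error so the real cause is visible."""
    budget = parse_token_budget(message)
    if budget is None:
        return message  # unknown shape: pass the provider's text through
    requested, affordable = budget
    return (
        f"The provider refused to reserve {requested} output tokens; "
        f"current credit covers {affordable}. "
        f"Set max_tokens to {affordable} or lower, or add credit."
    )


error = (
    "HTTP 402: This request requires more credits, or fewer max_tokens.\n"
    "You requested up to 64000 tokens, but can only afford 17176."
)
print(parse_token_budget(error))  # (64000, 17176)
print(explain(error))
```

Matching on message text is brittle if the provider rewords the error, so this is best treated as a diagnostic aid layered on top of the configurable cap, not a replacement for it.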

Happy to submit a PR if there's interest.

Environment

version:          0.13.0 (2026.5.7) [fef1a412]
os:               Linux 6.8.0-110-generic x86_64
python:           3.11.15
openai_sdk:       2.32.0
profile:          default
hermes_home:      ~/.hermes
model:            anthropic/claude-haiku-4.5
provider:         (auto)
terminal:         local
api_keys:
  openrouter           set
  anthropic            set
  google/gemini        set
  elevenlabs           set
  (others unset)
features:
  toolsets:           hermes-cli
  mcp_servers:        0
  memory_provider:    built-in
  gateway:            running (systemd (user))
  platforms:          telegram
  cron_jobs:          0
  skills:             106
