hermes - ✅(Solved) Fix [Bug]: Anthropic max_tokens fallback only fires for OpenRouter/Nous — silently skipped for Bedrock, NVIDIA, LiteLLM, and every other chat-completions proxy [1 pull requests, 1 comments, 2 participants]

hermes · 2026-04-20 03:02:37


NousResearch/hermes-agent#12790•Fetched 2026-04-20 12:16:57


Author

yuanqingz

Participants

vominh1919

yuanqingz

Timeline (top)

closed ×1, commented ×1, cross-referenced ×1, referenced ×1

Fix Action

Fix / Workaround

AWS Bedrock historically defaults to 4096 output tokens. With a Claude model, reasoning tokens plus a single large tool call (e.g. write_file with a full file, patch with a multi-hunk diff) easily exceed that. The request comes back with finish_reason='length', Hermes makes 3 continuation retries and then rolls back, and the user sees a "⚠️ Response truncated due to output length limit" warning.

To trigger it, ask the agent to do anything that requires a large tool call, e.g. “write a 500-line Python script to do X” or “patch this 200-line function”.
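The failure sequence can be illustrated with a small simulation. This is only a sketch under stated assumptions: `fake_proxy_call`, `run_with_continuations`, the 6000-token response size, and the 32000-token limit are hypothetical stand-ins, not Hermes code, and it assumes a truncated tool call has to be regenerated in full on every retry.

```python
from typing import Optional

PROXY_DEFAULT_CAP = 4096      # Bedrock's historical default, per the report
MAX_CONTINUATION_RETRIES = 3  # retry count described in the report


def fake_proxy_call(needed_tokens: int, max_tokens: Optional[int]) -> str:
    """Return a finish_reason the way a capping proxy might (hypothetical)."""
    cap = max_tokens if max_tokens is not None else PROXY_DEFAULT_CAP
    return "stop" if needed_tokens <= cap else "length"


def run_with_continuations(needed_tokens: int, max_tokens: Optional[int]) -> str:
    """One initial attempt plus up to three continuation retries."""
    for _ in range(1 + MAX_CONTINUATION_RETRIES):
        if fake_proxy_call(needed_tokens, max_tokens) == "stop":
            return "ok"
    return "rollback"  # every attempt was truncated


# Assumed figures: a 6000-token response, with and without an explicit limit.
print(run_with_continuations(6000, max_tokens=None))
print(run_with_continuations(6000, max_tokens=32000))
```

With no explicit max_tokens the simulated proxy truncates every attempt, which is the retry-then-rollback pattern the report describes; once a large enough limit is sent, the first attempt completes.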

PR fix notes

PR #12811: fix: Anthropic max_tokens + Feishu channel_prompt + Discord free_response

Repository: NousResearch/hermes-agent
Author: vominh1919
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/12811

Description (problem / solution / changelog)

Fixes #12790: Anthropic max_tokens fallback for all proxies

The max_tokens fallback in _build_api_kwargs() only fired when the base URL was OpenRouter or Nous Portal. Any other chat-completions proxy (AWS Bedrock, NVIDIA, LiteLLM, vLLM, corporate gateways) serving Claude models would silently bypass the fallback, shipping requests without max_tokens set. Proxies like AWS Bedrock default to 4096 output tokens, which easily exhausts on thinking tokens + large tool calls like write_file or patch.

Fix: Changed the condition from URL-gated to model-gated. If the model is in _ANTHROPIC_OUTPUT_LIMITS (Claude, MiniMax), max_tokens is always set regardless of which proxy serves it.

Fixes #12805: Feishu adapter missing channel_prompt resolution

The Feishu platform adapter did not implement per-channel prompt resolution (_resolve_channel_prompt), unlike Discord and Slack which both support this feature. This means channel_prompts config in config.yaml was silently ignored for Feishu.

Fix: Added _resolve_channel_prompt() method matching the Discord/Slack pattern. Passes channel_prompt to all three MessageEvent construction sites (main message, reaction routing, card action routing).
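A rough sketch of what per-channel prompt resolution might look like is given here. The `MessageEvent` fields, the flat `channel_prompts` mapping, and the `build_event` helper are assumptions made for illustration; the real adapter's types and config layout may differ.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class MessageEvent:
    """Hypothetical stand-in for the gateway's event type."""
    text: str
    channel_id: str
    channel_prompt: Optional[str] = None


def resolve_channel_prompt(channel_prompts: Mapping[str, str],
                           channel_id: str) -> Optional[str]:
    """Return the prompt configured for channel_id, or None if there is none."""
    prompt = channel_prompts.get(str(channel_id))
    return prompt.strip() if prompt and prompt.strip() else None


def build_event(text: str, channel_id: str,
                channel_prompts: Mapping[str, str]) -> MessageEvent:
    # Every construction site should pass the resolved prompt through.
    return MessageEvent(
        text=text,
        channel_id=channel_id,
        channel_prompt=resolve_channel_prompt(channel_prompts, channel_id),
    )


prompts = {"oc_example": "Answer in a formal tone."}  # made-up channel id
print(build_event("hi", "oc_example", prompts).channel_prompt)
print(build_event("hi", "oc_other", prompts).channel_prompt)
```

Routing all event construction through one helper is one way to keep the three sites mentioned in the PR (main message, reaction routing, card action routing) from drifting apart.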

Fixes #12750: Discord free_response_channels not working

The on_message handler checked DISCORD_IGNORE_NO_MENTION (default: true) before _handle_message was called. When a message had no @mention (human or bot), it returned early — never reaching the free_response_channels logic inside _handle_message. This meant free response channels could never receive unmentioned messages.

Fix: In the DISCORD_IGNORE_NO_MENTION gate at on_message, check the channel against DISCORD_FREE_RESPONSE_CHANNELS before returning early. Configured channels are exempted from the ignore-no-mention filter.
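The gate change can be sketched as a pure function. This is a simplification rather than the adapter's code: `should_handle` and `parse_channel_ids` are hypothetical names, the comma-separated format for DISCORD_FREE_RESPONSE_CHANNELS is an assumption, and the sketch assumes the gate only looks at whether the message carries a mention.

```python
from typing import AbstractSet, Set


def parse_channel_ids(raw: str) -> Set[str]:
    """Parse a comma-separated list of channel IDs (assumed format)."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def should_handle(channel_id: str, has_mention: bool,
                  ignore_no_mention: bool,
                  free_response_channels: AbstractSet[str]) -> bool:
    """Decide whether on_message should pass the message on."""
    if has_mention:
        return True
    if channel_id in free_response_channels:
        return True  # exempt from the ignore-no-mention filter
    return not ignore_no_mention


free = parse_channel_ids("111, 222")
print(should_handle("111", False, True, free))  # free-response channel
print(should_handle("333", False, True, free))  # ordinary channel, no mention
```

Checking the channel before the early return is what lets unmentioned messages in configured channels reach the downstream handler.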

Discussed in: #12790, #12805, #12750

Changed files

gateway/platforms/discord.py (modified, +8/-2)
gateway/platforms/feishu.py (modified, +8/-0)
run_agent.py (modified, +12/-11)


Bug Description

AIAgent._build_api_kwargs() in run_agent.py only injects an explicit max_tokens for Anthropic-compatible models (Claude, MiniMax) when the request is served through OpenRouter or Nous Portal. Every other chat-completions proxy silently bypasses the fallback, shipping requests without max_tokens set, and lets the upstream proxy pick its own default.

⚠️ Response truncated due to output length limit

…on what should have been a simple response. Same class of problem applies to any non-OpenRouter / non-Nous proxy that serves Claude or other Anthropic-compatible models: NVIDIA inference API, self-hosted LiteLLM, vLLM, corporate gateways, etc.

Steps to Reproduce

1. Configure Hermes with a chat-completions provider whose base_url is not openrouter.ai and not nousresearch.com, serving an Anthropic-compatible model. Concrete example (my setup):
```
# ~/.hermes/config.yaml
model:
  model: aws/anthropic/bedrock-claude-opus-4-7
  provider: custom  # base_url: NVIDIA inference API (Bedrock-backed)
```
2. Ask the agent to do anything that requires a large tool call, e.g. “write a 500-line Python script to do X” or “patch this 200-line function”.
3. Observe repeated ⚠️ Response truncated due to output length limit warnings on what should be straightforward tool calls, followed by retries and rollbacks.

Expected Behavior

If the model is Anthropic-compatible (lookup in _ANTHROPIC_OUTPUT_LIMITS), Hermes should send its documented output limit as max_tokens regardless of which proxy URL is serving it. The proxy choice is orthogonal to the upstream API contract — the Messages API requires max_tokens as a mandatory field, period.

Actual Behavior

Hermes only injects max_tokens when _is_openrouter_url() or "nousresearch" in _base_url_lower. For any other proxy, max_tokens is left unset → proxy picks a default (Bedrock: 4096) → truncation and retry storm.

Relevant code (current main):

```
# run_agent.py, inside _build_api_kwargs
elif (self._is_openrouter_url() or "nousresearch" in self._base_url_lower) \
        and "claude" in (self.model or "").lower():
    # inject max_tokens...
```

The base_url gate silently excludes every other Anthropic-compat proxy. The "claude" in model substring also misses non-Claude Anthropic-compatible models (e.g. MiniMax) even when they're served through OpenRouter.
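Both misses can be reproduced with a standalone copy of the condition. This is a sketch, not the Hermes code path: `old_gate` is a hypothetical name, the OpenRouter check is approximated by a substring test because the body of `_is_openrouter_url()` is not shown in the issue, and the proxy URL and the first and third model names are made up.

```python
def old_gate(base_url: str, model: str) -> bool:
    """Standalone approximation of the URL-gated condition quoted above."""
    url = (base_url or "").lower()
    is_known_proxy = "openrouter" in url or "nousresearch" in url
    return is_known_proxy and "claude" in (model or "").lower()


cases = [
    # Claude via OpenRouter: the only combination the old gate covers.
    ("https://openrouter.ai/api/v1", "anthropic/claude-example"),
    # Claude via a custom proxy: excluded by the base_url check.
    ("https://proxy.example.com/v1", "aws/anthropic/bedrock-claude-opus-4-7"),
    # MiniMax via OpenRouter: excluded by the "claude" substring check.
    ("https://openrouter.ai/api/v1", "minimax/minimax-example"),
]
for base_url, model in cases:
    print(f"{model}: fallback fires = {old_gate(base_url, model)}")
```

Only the first case passes the gate; the other two fall through and leave max_tokens unset.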

Affected Component

Agent Core (conversation loop, build_api_kwargs)

Scope of impact

Anyone serving Anthropic-compatible models through a non-OpenRouter / non-Nous proxy:

- AWS Bedrock (Claude family), the most common trigger
- NVIDIA inference API with Anthropic models
- Self-hosted LiteLLM / vLLM routing to Anthropic
- Corporate gateways / reverse proxies
- Any custom_providers entry hitting an Anthropic-compatible endpoint

Hermes' native Anthropic path (api_mode='anthropic_messages') is unaffected — it doesn't reach this code branch.

Proposed Fix

Gate on the model instead of the base_url. If the model matches a known entry in _ANTHROPIC_OUTPUT_LIMITS, inject max_tokens regardless of the proxy URL. This is table-driven, not substring-driven, so it correctly covers non-Claude Anthropic-compatible models without hardcoding family names.
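A minimal sketch of a model-gated, table-driven fallback follows. Everything in it is illustrative: `OUTPUT_LIMITS` and its values are invented and do not reflect the contents of `_ANTHROPIC_OUTPUT_LIMITS`, and matching table keys as name fragments (longest match wins) is an assumption about how prefixed names such as `aws/anthropic/bedrock-claude-opus-4-7` would be resolved.

```python
from typing import Dict, Optional

# Hypothetical table: fragments and limits are illustrative values only.
OUTPUT_LIMITS: Dict[str, int] = {
    "claude-opus-4": 32_000,
    "claude": 8_192,
    "minimax": 16_000,
}


def lookup_output_limit(model: Optional[str]) -> Optional[int]:
    """Return the limit for the longest table key found in the model name."""
    name = (model or "").lower()
    matches = [key for key in OUTPUT_LIMITS if key in name]
    if not matches:
        return None
    return OUTPUT_LIMITS[max(matches, key=len)]


def build_api_kwargs(model: str, base_url: str) -> dict:
    """base_url is accepted but deliberately ignored by the fallback."""
    kwargs = {"model": model}
    limit = lookup_output_limit(model)
    if limit is not None:
        kwargs["max_tokens"] = limit
    return kwargs


print(build_api_kwargs("aws/anthropic/bedrock-claude-opus-4-7",
                       "https://proxy.example.com/v1"))
print(build_api_kwargs("some/unknown-model", "https://proxy.example.com/v1"))
```

Because the decision depends only on the table, covering a new Anthropic-compatible family means adding a row rather than editing the condition.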

PR with the fix + regression tests: #12767 (fix: apply Anthropic max_tokens fallback to all chat_completions proxies).

The PR adds:

- _matches_anthropic_compatible_model() helper (proxy-agnostic lookup)
- 5 new TestBuildApiKwargs test cases guarding:
  - existing OpenRouter+Claude behaviour (regression guard)
  - custom proxy + Claude (NVIDIA Bedrock scenario)
  - custom proxy + MiniMax (non-Claude Anthropic-compat)
  - unknown models must NOT receive the fallback (negative test)
  - meta-test: the condition must not hardcode the "claude" substring
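The five guards might look roughly like the pytest-style functions below. They run against a small stand-in, not the real `AIAgent`, so the `LIMITS` table, the `build_api_kwargs` stub, the URLs, and most model names are hypothetical. The last test is a behavioural substitute for the meta-test, since the PR's actual check is not shown here.

```python
LIMITS = {"claude": 8_192, "minimax": 16_000}  # hypothetical table


def build_api_kwargs(model, base_url, limits=LIMITS):
    """Minimal stand-in for the model-gated fallback (not Hermes code)."""
    name = (model or "").lower()
    kwargs = {"model": model}
    for fragment, limit in limits.items():
        if fragment in name:
            kwargs["max_tokens"] = limit
            break
    return kwargs


CUSTOM = "https://proxy.example.com/v1"  # made-up proxy URL


def test_openrouter_claude_still_gets_fallback():
    assert "max_tokens" in build_api_kwargs("anthropic/claude-x", "https://openrouter.ai/api/v1")


def test_custom_proxy_claude_gets_fallback():
    assert "max_tokens" in build_api_kwargs("aws/anthropic/bedrock-claude-opus-4-7", CUSTOM)


def test_custom_proxy_minimax_gets_fallback():
    assert "max_tokens" in build_api_kwargs("minimax-x", CUSTOM)


def test_unknown_model_gets_no_fallback():
    assert "max_tokens" not in build_api_kwargs("unknown-model", CUSTOM)


def test_condition_is_table_driven():
    # A new table entry should work without touching the condition.
    kwargs = build_api_kwargs("acme-model", CUSTOM, limits={"acme": 1_000})
    assert kwargs["max_tokens"] == 1_000
```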

Environment

Hermes main model: aws/anthropic/bedrock-claude-opus-4-7 via custom provider (NVIDIA inference API / Bedrock-backed)
Observed in both CLI and Feishu gateway sessions
Python 3.11, Linux

extent analysis

TL;DR

The issue can be fixed by modifying the _build_api_kwargs method to inject max_tokens based on the model instead of the proxy URL.

Guidance

- Identify the models that are Anthropic-compatible by checking the _ANTHROPIC_OUTPUT_LIMITS dictionary.
- Create a helper function _matches_anthropic_compatible_model() to perform a proxy-agnostic lookup of the model.
- Modify the _build_api_kwargs method to inject max_tokens if the model matches a known entry in _ANTHROPIC_OUTPUT_LIMITS, regardless of the proxy URL.
- Add regression tests to ensure the fix works correctly for different scenarios, including custom proxies and non-Claude Anthropic-compatible models.

Example

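```
# Sketch only: assumes _ANTHROPIC_OUTPUT_LIMITS is a dict keyed by the exact
# model name. If its keys are name fragments, a substring match is needed.
def _matches_anthropic_compatible_model(self):
    # Guard against an unset model, as the original condition does.
    return (self.model or "") in _ANTHROPIC_OUTPUT_LIMITS

def _build_api_kwargs(self):
    # ...
    if self._matches_anthropic_compatible_model():
        # Inject max_tokens for any proxy, not only OpenRouter / Nous Portal.
        kwargs['max_tokens'] = _ANTHROPIC_OUTPUT_LIMITS[self.model]
    # ...
```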

Notes

The proposed fix is to gate on the model instead of the base URL, which should correctly cover non-Claude Anthropic-compatible models without hardcoding family names. The fix should be applied to the run_agent.py file, and regression tests should be added to ensure the fix works correctly.

Recommendation

Apply the proposed fix: modify the _build_api_kwargs method to inject max_tokens based on the model rather than the proxy URL. This should resolve the issue for users serving Anthropic-compatible models through non-OpenRouter/non-Nous proxies.
