hermes - ✅(Solved) Fix Bug: Fallback cascade wastes 20-60s on model-output errors (invalid tool call arguments) [1 pull requests, 1 participants]

mikronn2 · 2026-04-20T01:50:20Z

[hermes] When the API returns a 400 error with message "invalid tool call arguments" , Hermes cascades through ALL fallback providers before giving up. This wa… When the API returns a 400 error with message `"invalid tool call arguments"`, Hermes cascades through ALL fallback providers before giving up. This wastes 20-60 seconds per occurrence because the error is a model-output problem (malformed JSON in tool call arguments), not an infrastructure problem that a different provider could solve. # PR #12777: fix(agent): skip fallback cascade for model-output tool call errors - Repository: NousResearch/hermes-agent - Author: voidborne-d - State: open | merged: False - Link: https://github.com/NousResearch/hermes-agent/pull/12777 ## Description (problem / solution / changelog) Fixes #12770 ## Problem When the API returns a 400 error with messages like `"invalid tool call arguments"`, the error classifier falls through to the generic `format_error` path with `should_fallback=True`. This triggers a full cascade through ALL fallback providers (4-5 sequential API calls), wasting 20-60 seconds per occurrence. The root cause is a **model-output problem** — the model generated malformed JSON in tool call arguments — not an infrastructure issue. A different provider will produce different (not better) output. ## Fix Added `_MODEL_OUTPUT_ERROR_PATTERNS` — a pattern list matching common API error messages for malformed tool calls — checked before the generic `format_error` fallthrough in `_classify_400()`. When a model-output error is detected: - `should_fallback = False` — stop the cascade immediately - `retryable = False` — the same model will likely produce the same malformed output Non-tool-related 400 errors continue to allow fallback as before. ### Patterns matched | Pattern | Typical source | |---|---| | `invalid tool call arguments` | OpenAI / OpenRouter | | `tool_use block is not valid json` | Anthropic | | `unterminated string` | JSON parse errors | | `failed to parse tool` | Various providers | | `is not valid json` | Generic JSON validation | | `could not parse tool` / `tool input did not match` | Provider-specific | ### Impact From the issue report, this pattern was observed 7 times in a single session, each cascade burning 4-5 sequential API calls × ~5-15s = **20-60s wasted per occurrence**. With this fix, the error is classified immediately and the cascade is skipped. ## Tests Added 5 regression tests in `tests/agent/test_error_classifier.py`: - `test_400_invalid_tool_call_arguments_no_fallback` - `test_400_tool_use_not_valid_json_no_fallback` - `test_400_unterminated_string_no_fallback` - `test_400_failed_to_parse_tool_no_fallback` - `test_400_non_tool_error_still_fallback` — confirms non-tool 400s still allow fallback All 102 tests pass (97 existing + 5 new). ## Changed files - `agent/error_classifier.py` (modified, +27/-1) - `tests/agent/test_error_classifier.py` (modified, +67/-0) ## Fixed - Fixed by PR: fix(agent): skip fallback cascade for model-output tool call errors (https://github.com/NousResearch/hermes-agent/pull/12777) ## Summary When the API returns a 400 error with message `"invalid tool call arguments"`, Hermes cascades through ALL fallback providers before giving up. This wastes 20-60 seconds per occurrence because the error is a model-output problem (malformed JSON in tool call arguments), not an infrastructure problem that a different provider could solve. ## Observed Behavior From agent logs, 7 occurrences of this pattern: ``` glm-5.1:cloud → qwen3.5:397b-cloud → mistral-large-3:675b-cloud → glm-5:cloud → kimi-k2.5:cloud → ERROR ``` Each cascade = 4-5 sequential API calls, each adding ~5-15s of latency. The final model often generates its own malformed JSON (unterminated strings, invalid escapes in `execute_code`/`terminal` tool arguments), making the cascade end in an even worse state. **LLM Provider**: All models in this chain are served via **Ollama** (acting as an OpenAI-compatible proxy on `127.0.0.1:11434`). The proxy routes cloud-hosted models (`glm-5.1:cloud`, `qwen3.5:397b-cloud`, `mistral-large-3:675b-cloud`, `glm-5:cloud`, `kimi-k2.5:cloud`) through the Ollama API endpoint. The `"invalid tool call arguments"` 400 error is returned by Ollama's proxy layer when it rejects malformed tool call JSON before forwarding to the upstream cloud LLM. ## Root Cause ### 1. `agent/error_classifier.py` lines ~623-628 `_classify_400()` handles all 400 errors. When none match specific patterns (context overflow, model not found, rate limit, billing), it falls through to: ```python return result_fn( FailoverReason.format_error, retryable=False, should_fallback=True, # ← THIS IS THE PROBLEM ) ``` A 400 with `"invalid tool call arguments"` doesn't match any special case, so it gets `should_fallback=True`. ### 2. `run_agent.py` lines ~10395-10428 The `is_client_error` check does NOT exclude `FailoverReason.format_error` from triggering fallback: ```python classified.reason not in ( FailoverReason

hermes2026-04-20 01:50:20

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#12770•Fetched 2026-04-20 12:17:02

View on GitHub

Comments

Participants

Timeline

Reactions

Author

mikronn2

Participants

mikronn2

Timeline (top)

cross-referenced ×1referenced ×1

Error Message

When the API returns a 400 error with message "invalid tool call arguments", Hermes cascades through ALL fallback providers before giving up. This wastes 20-60 seconds per occurrence because the error is a model-output problem (malformed JSON in tool call arguments), not an infrastructure problem that a different provider could solve. glm-5.1:cloud → qwen3.5:397b-cloud → mistral-large-3:675b-cloud → glm-5:cloud → kimi-k2.5:cloud → ERROR LLM Provider: All models in this chain are served via Ollama (acting as an OpenAI-compatible proxy on 127.0.0.1:11434). The proxy routes cloud-hosted models (glm-5.1:cloud, qwen3.5:397b-cloud, mistral-large-3:675b-cloud, glm-5:cloud, kimi-k2.5:cloud) through the Ollama API endpoint. The "invalid tool call arguments" 400 error is returned by Ollama's proxy layer when it rejects malformed tool call JSON before forwarding to the upstream cloud LLM.

Root Cause

Fix Action

Fixed

Fixed by PR: fix(agent): skip fallback cascade for model-output tool call errors (https://github.com/NousResearch/hermes-agent/pull/12777)

PR fix notes

PR #12777: fix(agent): skip fallback cascade for model-output tool call errors

Repository: NousResearch/hermes-agent
Author: voidborne-d
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/12777

Description (problem / solution / changelog)

Fixes #12770

Problem

When the API returns a 400 error with messages like "invalid tool call arguments", the error classifier falls through to the generic format_error path with should_fallback=True. This triggers a full cascade through ALL fallback providers (4-5 sequential API calls), wasting 20-60 seconds per occurrence.

The root cause is a model-output problem — the model generated malformed JSON in tool call arguments — not an infrastructure issue. A different provider will produce different (not better) output.

Fix

Added _MODEL_OUTPUT_ERROR_PATTERNS — a pattern list matching common API error messages for malformed tool calls — checked before the generic format_error fallthrough in _classify_400().

When a model-output error is detected:

should_fallback = False — stop the cascade immediately
retryable = False — the same model will likely produce the same malformed output

Non-tool-related 400 errors continue to allow fallback as before.

Patterns matched

Pattern	Typical source
`invalid tool call arguments`	OpenAI / OpenRouter
`tool_use block is not valid json`	Anthropic
`unterminated string`	JSON parse errors
`failed to parse tool`	Various providers
`is not valid json`	Generic JSON validation
`could not parse tool` / `tool input did not match`	Provider-specific

Impact

From the issue report, this pattern was observed 7 times in a single session, each cascade burning 4-5 sequential API calls × ~5-15s = 20-60s wasted per occurrence. With this fix, the error is classified immediately and the cascade is skipped.

Tests

Added 5 regression tests in tests/agent/test_error_classifier.py:

test_400_invalid_tool_call_arguments_no_fallback
test_400_tool_use_not_valid_json_no_fallback
test_400_unterminated_string_no_fallback
test_400_failed_to_parse_tool_no_fallback
test_400_non_tool_error_still_fallback — confirms non-tool 400s still allow fallback

All 102 tests pass (97 existing + 5 new).

Changed files

agent/error_classifier.py (modified, +27/-1)
tests/agent/test_error_classifier.py (modified, +67/-0)

Code Example

glm-5.1:cloud → qwen3.5:397b-cloud → mistral-large-3:675b-cloud → glm-5:cloud → kimi-k2.5:cloud → ERROR

---

return result_fn(
    FailoverReason.format_error,
    retryable=False,
    should_fallback=True,   # ← THIS IS THE PROBLEM
)

---

classified.reason not in (
    FailoverReason.rate_limit,
    FailoverReason.billing,
    FailoverReason.overloaded,
    FailoverReason.context_overflow,
    ...
)

---

if any(p in error_msg for p in ["invalid tool call", "tool_call", "function_call"]):
    return result_fn(
        FailoverReason.model_output_error,
        retryable=False,
        should_fallback=False,
    )

RAW_BUFFERClick to expand / collapse

Summary

Observed Behavior

From agent logs, 7 occurrences of this pattern:

glm-5.1:cloud → qwen3.5:397b-cloud → mistral-large-3:675b-cloud → glm-5:cloud → kimi-k2.5:cloud → ERROR

Each cascade = 4-5 sequential API calls, each adding ~5-15s of latency. The final model often generates its own malformed JSON (unterminated strings, invalid escapes in execute_code/terminal tool arguments), making the cascade end in an even worse state.

LLM Provider: All models in this chain are served via Ollama (acting as an OpenAI-compatible proxy on 127.0.0.1:11434). The proxy routes cloud-hosted models (glm-5.1:cloud, qwen3.5:397b-cloud, mistral-large-3:675b-cloud, glm-5:cloud, kimi-k2.5:cloud) through the Ollama API endpoint. The "invalid tool call arguments" 400 error is returned by Ollama's proxy layer when it rejects malformed tool call JSON before forwarding to the upstream cloud LLM.

Root Cause

1. `agent/error_classifier.py` lines ~623-628

_classify_400() handles all 400 errors. When none match specific patterns (context overflow, model not found, rate limit, billing), it falls through to:

return result_fn(
    FailoverReason.format_error,
    retryable=False,
    should_fallback=True,   # ← THIS IS THE PROBLEM
)

A 400 with "invalid tool call arguments" doesn't match any special case, so it gets should_fallback=True.

2. `run_agent.py` lines ~10395-10428

The is_client_error check does NOT exclude FailoverReason.format_error from triggering fallback:

classified.reason not in (
    FailoverReason.rate_limit,
    FailoverReason.billing,
    FailoverReason.overloaded,
    FailoverReason.context_overflow,
    ...
)

Since format_error is not excluded, it triggers _try_activate_fallback().

Contrast with correct handling

When the API call succeeds (200 OK) but the response contains malformed JSON, run_agent.py ~L10865-10952 handles this correctly with a retry-with-feedback loop (up to 3 retries), NOT cascading fallbacks.

Proposed Fix

In _classify_400(), detect "invalid tool call" / "tool call arguments" patterns and return a new reason with should_fallback=False:

if any(p in error_msg for p in ["invalid tool call", "tool_call", "function_call"]):
    return result_fn(
        FailoverReason.model_output_error,
        retryable=False,
        should_fallback=False,
    )

Then add FailoverReason.model_output_error to the is_client_error exclusion list.

Environment

Hermes Agent v0.10.0, 170 commits behind latest
LLM Provider: Ollama (openai-compatible proxy) on 127.0.0.1:11434
Primary: glm-5.1:cloud via Ollama proxy
Fallback chain: 4 cloud models through same Ollama proxy, 1 local model on 127.0.0.1:8081
OS: Ubuntu 6.17.0-22-generic, kernel 6.17.0-22, i7-12700H (20 threads), RTX 3060 Laptop GPU 6GB (driver 590.48.01, CUDA 13.1), 30GB RAM
Node: v24.14.0

extent analysis

TL;DR

Modify the _classify_400() function to detect "invalid tool call" patterns and return a reason with should_fallback=False to prevent unnecessary cascading through fallback providers.

Guidance

Identify the specific error message patterns that indicate a model-output problem, such as "invalid tool call" or "tool call arguments", and update the _classify_400() function to handle these cases without triggering a fallback.
Add a new reason, e.g., FailoverReason.model_output_error, to the exclusion list in the is_client_error check to prevent fallbacks for this specific error type.
Verify that the updated _classify_400() function correctly handles the "invalid tool call" error pattern and returns a reason with should_fallback=False.
Test the changes with the provided fallback chain to ensure that the cascade is prevented and the error is handled correctly.

Example

if any(p in error_msg for p in ["invalid tool call", "tool_call", "function_call"]):
    return result_fn(
        FailoverReason.model_output_error,
        retryable=False,
        should_fallback=False,
    )

Notes

The proposed fix assumes that the error message patterns are consistent and can be reliably detected. Additional logging or monitoring may be necessary to ensure that the updated function handles all possible error cases correctly.

Recommendation

Apply the proposed workaround by modifying the _classify_400() function and adding the new reason to the exclusion list, as this should prevent unnecessary cascading through fallback providers and improve the overall performance of the system.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #model compatibility #GPU setup #container setup #orchestration issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix Bug: Fallback cascade wastes 20-60s on model-output errors (invalid tool call arguments) [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

PR fix notes

PR #12777: fix(agent): skip fallback cascade for model-output tool call errors

Description (problem / solution / changelog)

Problem

Fix

Patterns matched

Impact

Tests

Changed files

Code Example

Summary

Observed Behavior

Root Cause

1. agent/error_classifier.py lines ~623-628

2. run_agent.py lines ~10395-10428

Contrast with correct handling

Proposed Fix

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

1. `agent/error_classifier.py` lines ~623-628

2. `run_agent.py` lines ~10395-10428