hermes - ✅(Solved) Fix Bug: Fallback cascade wastes 20-60s on model-output errors (invalid tool call arguments) [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#12770Fetched 2026-04-20 12:17:02
View on GitHub
Comments
0
Participants
1
Timeline
2
Reactions
0
Author
Participants
Timeline (top)
cross-referenced ×1referenced ×1

When the API returns a 400 error with message "invalid tool call arguments", Hermes cascades through ALL fallback providers before giving up. This wastes 20-60 seconds per occurrence because the error is a model-output problem (malformed JSON in tool call arguments), not an infrastructure problem that a different provider could solve.

Error Message

When the API returns a 400 error with message "invalid tool call arguments", Hermes cascades through ALL fallback providers before giving up. This wastes 20-60 seconds per occurrence because the error is a model-output problem (malformed JSON in tool call arguments), not an infrastructure problem that a different provider could solve. glm-5.1:cloud → qwen3.5:397b-cloud → mistral-large-3:675b-cloud → glm-5:cloud → kimi-k2.5:cloud → ERROR LLM Provider: All models in this chain are served via Ollama (acting as an OpenAI-compatible proxy on 127.0.0.1:11434). The proxy routes cloud-hosted models (glm-5.1:cloud, qwen3.5:397b-cloud, mistral-large-3:675b-cloud, glm-5:cloud, kimi-k2.5:cloud) through the Ollama API endpoint. The "invalid tool call arguments" 400 error is returned by Ollama's proxy layer when it rejects malformed tool call JSON before forwarding to the upstream cloud LLM.

Root Cause

When the API returns a 400 error with message "invalid tool call arguments", Hermes cascades through ALL fallback providers before giving up. This wastes 20-60 seconds per occurrence because the error is a model-output problem (malformed JSON in tool call arguments), not an infrastructure problem that a different provider could solve.

Fix Action

Fixed

PR fix notes

PR #12777: fix(agent): skip fallback cascade for model-output tool call errors

Description (problem / solution / changelog)

Fixes #12770

Problem

When the API returns a 400 error with messages like "invalid tool call arguments", the error classifier falls through to the generic format_error path with should_fallback=True. This triggers a full cascade through ALL fallback providers (4-5 sequential API calls), wasting 20-60 seconds per occurrence.

The root cause is a model-output problem — the model generated malformed JSON in tool call arguments — not an infrastructure issue. A different provider will produce different (not better) output.

Fix

Added _MODEL_OUTPUT_ERROR_PATTERNS — a pattern list matching common API error messages for malformed tool calls — checked before the generic format_error fallthrough in _classify_400().

When a model-output error is detected:

  • should_fallback = False — stop the cascade immediately
  • retryable = False — the same model will likely produce the same malformed output

Non-tool-related 400 errors continue to allow fallback as before.

Patterns matched

PatternTypical source
invalid tool call argumentsOpenAI / OpenRouter
tool_use block is not valid jsonAnthropic
unterminated stringJSON parse errors
failed to parse toolVarious providers
is not valid jsonGeneric JSON validation
could not parse tool / tool input did not matchProvider-specific

Impact

From the issue report, this pattern was observed 7 times in a single session, each cascade burning 4-5 sequential API calls × ~5-15s = 20-60s wasted per occurrence. With this fix, the error is classified immediately and the cascade is skipped.

Tests

Added 5 regression tests in tests/agent/test_error_classifier.py:

  • test_400_invalid_tool_call_arguments_no_fallback
  • test_400_tool_use_not_valid_json_no_fallback
  • test_400_unterminated_string_no_fallback
  • test_400_failed_to_parse_tool_no_fallback
  • test_400_non_tool_error_still_fallback — confirms non-tool 400s still allow fallback

All 102 tests pass (97 existing + 5 new).

Changed files

  • agent/error_classifier.py (modified, +27/-1)
  • tests/agent/test_error_classifier.py (modified, +67/-0)

Code Example

glm-5.1:cloud → qwen3.5:397b-cloud → mistral-large-3:675b-cloud → glm-5:cloud → kimi-k2.5:cloud → ERROR

---

return result_fn(
    FailoverReason.format_error,
    retryable=False,
    should_fallback=True,   # ← THIS IS THE PROBLEM
)

---

classified.reason not in (
    FailoverReason.rate_limit,
    FailoverReason.billing,
    FailoverReason.overloaded,
    FailoverReason.context_overflow,
    ...
)

---

if any(p in error_msg for p in ["invalid tool call", "tool_call", "function_call"]):
    return result_fn(
        FailoverReason.model_output_error,
        retryable=False,
        should_fallback=False,
    )
RAW_BUFFERClick to expand / collapse

Summary

When the API returns a 400 error with message "invalid tool call arguments", Hermes cascades through ALL fallback providers before giving up. This wastes 20-60 seconds per occurrence because the error is a model-output problem (malformed JSON in tool call arguments), not an infrastructure problem that a different provider could solve.

Observed Behavior

From agent logs, 7 occurrences of this pattern:

glm-5.1:cloud → qwen3.5:397b-cloud → mistral-large-3:675b-cloud → glm-5:cloud → kimi-k2.5:cloud → ERROR

Each cascade = 4-5 sequential API calls, each adding ~5-15s of latency. The final model often generates its own malformed JSON (unterminated strings, invalid escapes in execute_code/terminal tool arguments), making the cascade end in an even worse state.

LLM Provider: All models in this chain are served via Ollama (acting as an OpenAI-compatible proxy on 127.0.0.1:11434). The proxy routes cloud-hosted models (glm-5.1:cloud, qwen3.5:397b-cloud, mistral-large-3:675b-cloud, glm-5:cloud, kimi-k2.5:cloud) through the Ollama API endpoint. The "invalid tool call arguments" 400 error is returned by Ollama's proxy layer when it rejects malformed tool call JSON before forwarding to the upstream cloud LLM.

Root Cause

1. agent/error_classifier.py lines ~623-628

_classify_400() handles all 400 errors. When none match specific patterns (context overflow, model not found, rate limit, billing), it falls through to:

return result_fn(
    FailoverReason.format_error,
    retryable=False,
    should_fallback=True,   # ← THIS IS THE PROBLEM
)

A 400 with "invalid tool call arguments" doesn't match any special case, so it gets should_fallback=True.

2. run_agent.py lines ~10395-10428

The is_client_error check does NOT exclude FailoverReason.format_error from triggering fallback:

classified.reason not in (
    FailoverReason.rate_limit,
    FailoverReason.billing,
    FailoverReason.overloaded,
    FailoverReason.context_overflow,
    ...
)

Since format_error is not excluded, it triggers _try_activate_fallback().

Contrast with correct handling

When the API call succeeds (200 OK) but the response contains malformed JSON, run_agent.py ~L10865-10952 handles this correctly with a retry-with-feedback loop (up to 3 retries), NOT cascading fallbacks.

Proposed Fix

In _classify_400(), detect "invalid tool call" / "tool call arguments" patterns and return a new reason with should_fallback=False:

if any(p in error_msg for p in ["invalid tool call", "tool_call", "function_call"]):
    return result_fn(
        FailoverReason.model_output_error,
        retryable=False,
        should_fallback=False,
    )

Then add FailoverReason.model_output_error to the is_client_error exclusion list.

Environment

  • Hermes Agent v0.10.0, 170 commits behind latest
  • LLM Provider: Ollama (openai-compatible proxy) on 127.0.0.1:11434
  • Primary: glm-5.1:cloud via Ollama proxy
  • Fallback chain: 4 cloud models through same Ollama proxy, 1 local model on 127.0.0.1:8081
  • OS: Ubuntu 6.17.0-22-generic, kernel 6.17.0-22, i7-12700H (20 threads), RTX 3060 Laptop GPU 6GB (driver 590.48.01, CUDA 13.1), 30GB RAM
  • Node: v24.14.0

extent analysis

TL;DR

Modify the _classify_400() function to detect "invalid tool call" patterns and return a reason with should_fallback=False to prevent unnecessary cascading through fallback providers.

Guidance

  • Identify the specific error message patterns that indicate a model-output problem, such as "invalid tool call" or "tool call arguments", and update the _classify_400() function to handle these cases without triggering a fallback.
  • Add a new reason, e.g., FailoverReason.model_output_error, to the exclusion list in the is_client_error check to prevent fallbacks for this specific error type.
  • Verify that the updated _classify_400() function correctly handles the "invalid tool call" error pattern and returns a reason with should_fallback=False.
  • Test the changes with the provided fallback chain to ensure that the cascade is prevented and the error is handled correctly.

Example

if any(p in error_msg for p in ["invalid tool call", "tool_call", "function_call"]):
    return result_fn(
        FailoverReason.model_output_error,
        retryable=False,
        should_fallback=False,
    )

Notes

The proposed fix assumes that the error message patterns are consistent and can be reliably detected. Additional logging or monitoring may be necessary to ensure that the updated function handles all possible error cases correctly.

Recommendation

Apply the proposed workaround by modifying the _classify_400() function and adding the new reason to the exclusion list, as this should prevent unnecessary cascading through fallback providers and improve the overall performance of the system.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING