hermes - ✅(Solved) Fix Bug: Tool call ID not invalidated after 400 error — causes persistent failures across sessions [2 pull requests, 1 participants]

hermes2026-04-27 09:23:27

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#16472•Fetched 2026-04-28 06:53:07

View on GitHub

Comments

Participants

Timeline

Reactions

Author

yunxiujun

Participants

yunxiujun

Timeline (top)

labeled ×4cross-referenced ×2

Error code: 400 - {'type': 'error', 'error': {'type': 'bad_request_error', 'message': 'invalid params, invalid function arguments json string, tool_call_id: call_function_XXXXX_1 (2013)', 'http_code': '400'}

Error Message

When a tool call fails with an HTTP 400 error from MiniMax API (minimax-cn provider), the returned tool_call_id is not properly invalidated. Subsequent new sessions continue to reuse the same invalidated tool_call_id, causing every request to fail with: Error code: 400 - {'type': 'error', 'error': {'type': 'bad_request_error', 'message': 'invalid params, invalid function arguments json string, tool_call_id: call_function_XXXXX_1 (2013)', 'http_code': '400'} 2. Trigger any tool call that produces a 400 error (e.g., malformed parameters) When a tool call returns a 400 error, Hermes should:

Root Cause

Error code: 400 - {'type': 'error', 'error': {'type': 'bad_request_error', 'message': 'invalid params, invalid function arguments json string, tool_call_id: call_function_XXXXX_1 (2013)', 'http_code': '400'}

Fix Action

Fixed

Fixed by PR: fix(agent): invalidate failed tool_call_id on MiniMax 400 errors (https://github.com/NousResearch/hermes-agent/pull/16602)
Fixed by PR: fix(error_classifier): purge stale tool_call_id on 400 to prevent cross-session reuse (#16472) (https://github.com/NousResearch/hermes-agent/pull/16604)

PR fix notes

PR #16602: fix(agent): invalidate failed tool_call_id on MiniMax 400 errors

Repository: NousResearch/hermes-agent
Author: wucm667
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16602

Description (problem / solution / changelog)

Closes #16472

Problem

When MiniMax API (minimax-cn) returns HTTP 400 for invalid tool call arguments, the failed tool_call_id (format: call_function_XXXXX_1) is persisted to the session database. Subsequent sessions inherit these messages with stale tool_call_ids, causing persistent failures until Hermes is restarted.

Changes

File: run_agent.py
Added _invalidate_failed_tool_calls() helper method (lines 3401-3418)
- Only activates for minimax-cn provider
- Detects 400 errors containing tool_call_id or invalid function arguments
- Removes tool_call_id fields from all tool-role messages before persistence
Modified 400 error handling path (line 11721-11722)
- Calls helper before _persist_session() when status code is 400
- Prevents failed tool_call_ids from being saved to session database

Impact

Fixes cross-session tool_call_id pollution for MiniMax provider
No impact on other providers or normal operation
Minimal, focused change — only affects 400 error persistence path

Changed files

run_agent.py (modified, +20/-0)

PR #16604: fix(error_classifier): purge stale tool_call_id on 400 to prevent cross-session reuse (#16472)

Repository: NousResearch/hermes-agent
Author: draix
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16604

Description (problem / solution / changelog)

Summary

Fixes #16472 — when a provider rejects a tool_call_id with HTTP 400 (most often MiniMax invalid params, invalid function arguments json string, tool_call_id: call_function_X_1 (2013)), Hermes was persisting the broken assistant turn into state.db anyway. Every subsequent session that inherited the same conversation_history replayed the same id and hit the same 400 indefinitely — until the gateway was restarted.

Root cause

run_agent.py (around if status_code == 400 and (approx_tokens > 50000 or len(api_messages) > 80)) only skips persistence when the 400 is a context-overflow shaped failure. A small-session 400 (which is exactly what an invalid tool_call_id produces — short payload, structured error) falls through to _persist_session(messages, conversation_history) and writes the broken assistant turn to state.db.

That turn — assistant message with tool_calls=[{ id: "call_function_X_1", ... }] — is then loaded by every new session that inherits the same conversation history, so the provider keeps seeing the same id and keeps 400-ing. Restarting the gateway is the only way out today.

Reporter evidence:

2026-04-26 → call_function_18464z1gskgy_1 reused across ~5 sessions
2026-04-26 → call_function_bgfd4km3qnvm_1 reused across ~3 sessions
2026-04-27 → call_function_41ojma24zxdp_1 reused across 4+ sessions

Fix

1. Classify the error

agent/error_classifier.py:

New FailoverReason.invalid_tool_call_id.
New ClassifiedError.purge_tool_call_id: Optional[str] carrying the offending id.
New _INVALID_TOOL_CALL_ID_PATTERNS (covers MiniMax invalid function arguments, OpenAI no tool call with id, etc.) plus a tolerant regex extractor extract_invalid_tool_call_id() that handles the various formats providers use (tool_call_id: X, tool_call_id X, 'tool_call_id': 'X', no tool call with id X found).
The branch runs before the generic format_error fallback inside _classify_400, so the cheaper recovery (purge + retry) wins over fallback to a different provider.
A case-preserved companion to error_msg is built once at the top of classify_api_error and threaded through, so case-sensitive ids (e.g. call_function_BgFD4kM3qNvm_1) survive the lowercasing the rest of the classifier relies on.

2. Purge the stale call

run_agent.py:

New static AIAgent._purge_invalid_tool_call_id(messages, invalid_id) walks the messages list and:
- Drops every role=='tool' message whose tool_call_id matches.
- Strips the matching entry from each assistant tool_calls list (handles both dict-style and SDK-object-style entries).
- Drops an assistant turn outright when the bad call was its only call and it had no textual content; otherwise preserves the surviving content and surviving calls.

3. Retry without consuming retry_count

When classified.reason == invalid_tool_call_id and classified.purge_tool_call_id is set, the retry loop:

Purges the id from both messages and api_messages.
Persists the cleaned history immediately so a crash-restart cannot resurrect the bad id (this is the key step the original code missed).
continues without consuming retry_count.

Behavior

Scenario	Before	After
400 `invalid function arguments... tool_call_id: X`	Persist broken turn → next session 400s on same id	Strip id, persist clean state, retry succeeds
400 context overflow	Compress + retry (unchanged)	Compress + retry (unchanged)
400 generic format error	Abort / fallback (unchanged)	Abort / fallback (unchanged)
400 mentions `tool_call_id field missing` (no id)	Generic format_error	Generic format_error (extractor returns None, falls through)

Tests

13 new tests, all green. Existing 2049 tests still pass; the 9 failures in test_bedrock_adapter.py and test_anthropic_adapter.py are pre-existing on main (unrelated to this PR — verified by git stash).

tests/agent/test_error_classifier.py — 6 new:

MiniMax body shape is classified as invalid_tool_call_id
Mixed-case id (call_function_BgFD4kM3qNvm_1) is extracted with original casing
Unrelated 400 (e.g. parameter 'temperature' out of range) keeps format_error
Pattern match without an extractable id falls through to format_error (so a vague mention doesn't cause an empty-purge spin)
OpenAI no tool call with id call_xyz_42 found is classified the same way
invalid_tool_call_id branch wins over context_overflow when the message could match both

tests/run_agent/test_agent_guardrails.py — 7 new:

Drops matching tool_result message
Strips only the bad call when assistant has multiple
Drops assistant turn when the bad call was its only call and content is empty
Keeps assistant turn when the bad call was its only call but content is present
No-op when id is missing from history
Handles SDK-object-style tool_calls (SimpleNamespace(id=..., function=...))
Empty / None inputs are safe

$ pytest tests/agent/test_error_classifier.py tests/run_agent/test_agent_guardrails.py -q
160 passed in 5.03s

Changed files

agent/error_classifier.py (modified, +117/-0)
run_agent.py (modified, +112/-0)
tests/agent/test_error_classifier.py (modified, +119/-0)
tests/run_agent/test_agent_guardrails.py (modified, +107/-0)

Code Example

Error code: 400 - {'type': 'error', 'error': {'type': 'bad_request_error', 'message': 'invalid params, invalid function arguments json string, tool_call_id: call_function_XXXXX_1 (2013)', 'http_code': '400'}

RAW_BUFFERClick to expand / collapse

Description

Error code: 400 - {'type': 'error', 'error': {'type': 'bad_request_error', 'message': 'invalid params, invalid function arguments json string, tool_call_id: call_function_XXXXX_1 (2013)', 'http_code': '400'}

Evidence

The same pattern has reproduced multiple times across different dates:

2026-04-26: call_function_18464z1gskgy_1 → 400, reused across ~5 sessions
2026-04-26: call_function_bgfd4km3qnvm_1 → 400, reused across ~3 sessions
2026-04-27: call_function_41ojma24zxdp_1 → 400, reused across 4+ sessions

Each time a new tool_call_id appears on first failure, it then gets "stuck" and reused in all subsequent sessions until the Hermes service is restarted.

Steps to Reproduce

Use MiniMax provider (minimax-cn) with tools enabled
Trigger any tool call that produces a 400 error (e.g., malformed parameters)
Observe that all subsequent sessions also fail with the same tool_call_id

Expected Behavior

When a tool call returns a 400 error, Hermes should:

Invalidate that tool_call_id
Generate a fresh tool_call_id for subsequent requests
Not persist or cache the failed ID across sessions

Environment

Provider: MiniMax (minimax-cn)
Model: MiniMax-M2.7
Base URL: https://api.minimaxi.com/v1
Platform: Docker

extent analysis

TL;DR

The issue can be mitigated by ensuring that the tool_call_id is properly invalidated and a new one is generated after a 400 error from the MiniMax API.

Guidance

Verify that the tool_call_id invalidation logic is correctly implemented and triggered upon receiving a 400 error from the MiniMax API.
Check the session management code to ensure that the tool_call_id is not being cached or persisted across sessions.
Review the error handling mechanism to guarantee that a fresh tool_call_id is generated for subsequent requests after an error.
Consider adding logging to track the tool_call_id usage and invalidation to better understand the issue.

Example

No code snippet can be provided without more context, but ensuring proper tool_call_id management might involve something like:

if response.status_code == 400:
    invalidate_tool_call_id(current_tool_call_id)
    current_tool_call_id = generate_new_tool_call_id()

Notes

The root cause seems to be related to the tool_call_id management, but without more information about the implementation, it's difficult to provide a definitive fix. The issue might be specific to the MiniMax provider or the Hermes service configuration.

Recommendation

Apply a workaround by manually invalidating the tool_call_id and generating a new one after a 400 error, as this seems to be the most direct way to address the issue given the provided information.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #indexing error #inference speed #output truncation #response parsing

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix Bug: Tool call ID not invalidated after 400 error — causes persistent failures across sessions [2 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

PR fix notes

PR #16602: fix(agent): invalidate failed tool_call_id on MiniMax 400 errors

Description (problem / solution / changelog)

Problem

Changes

Impact

Changed files

PR #16604: fix(error_classifier): purge stale tool_call_id on 400 to prevent cross-session reuse (#16472)

Description (problem / solution / changelog)

Summary

Root cause

Fix

1. Classify the error

2. Purge the stale call

3. Retry without consuming retry_count

Behavior

Tests

Changed files

Code Example

Description

Evidence

Steps to Reproduce

Expected Behavior

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING