hermes: [bug] same_tool_failure_warning does not auto-escalate to block; allows unbounded retry-loop cost amplification


Error Message

1. MCP server transient failure → 42 wasted spawns when a tool returned a credit/billing error.
2. MCP server returning a deterministic 422 (validation error) → 57-turn loop, ~$6 in Sonnet 4.6 input credits burned, only halted because the Anthropic account ran out of credits and returned 400.

Root Cause

The same_tool_failure_warning mechanism (visible in tool_result content as [Tool loop warning: same_tool_failure_warning; count=N; <tool> has failed N times this turn. ...]) increments a counter but does not auto-block the task when N exceeds a reasonable threshold. The warning text actively encourages continued tool use ("Do not switch to text-only replies; keep using tools, but diagnose before retrying"), and frontier models often interpret this as license to retry the same failing call with minor argument tweaks.

We observed two production loops driven by this:

1. MCP server transient failure → 42 wasted spawns when a tool returned a credit/billing error.
2. MCP server returning a deterministic 422 (validation error) → 57-turn loop, ~$6 in Sonnet 4.6 input credits burned, only halted because the Anthropic account ran out of credits and returned 400.

In both cases the loop continued well past the point where a human would have escalated. The same_tool_failure_warning counter reached values like 3, 4, 5+ without triggering an auto-block.

Proposed: auto-kanban_block (or equivalent escalation) when the same_tool_failure_warning count >= a configurable threshold (default 3-5) on a deterministic-looking failure (4xx, validation error, schema mismatch). For transient failures (5xx, timeout, network), preserve current retry behavior but add exponential backoff. Cost-amplification incidents like the above are silent until the bill arrives.
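
The proposed policy can be sketched roughly as follows. This is a minimal illustration, not hermes code: the names ToolFailureTracker, classify_failure, Decision, and Action are hypothetical, the keyword matching is a placeholder heuristic, and treating 408 and 429 as transient is an added assumption that the report does not state. The default threshold of 3 is the low end of the 3-5 range suggested above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class FailureKind(Enum):
    DETERMINISTIC = "deterministic"  # 4xx, validation error, schema mismatch
    TRANSIENT = "transient"          # 5xx, timeout, network


class Action(Enum):
    RETRY = "retry"      # let the model diagnose and try again
    BACKOFF = "backoff"  # retry only after waiting delay_seconds
    BLOCK = "block"      # escalate, e.g. the proposed auto-kanban_block


def classify_failure(status_code: Optional[int], message: str = "") -> FailureKind:
    """Hypothetical heuristic for 'deterministic-looking' failures."""
    text = message.lower()
    if status_code is not None and 400 <= status_code < 500:
        # Assumption (not from the report): 408 and 429 are worth retrying.
        if status_code in (408, 429):
            return FailureKind.TRANSIENT
        return FailureKind.DETERMINISTIC
    if "validation" in text or "schema" in text:
        return FailureKind.DETERMINISTIC
    return FailureKind.TRANSIENT


@dataclass
class Decision:
    action: Action
    delay_seconds: float = 0.0


@dataclass
class ToolFailureTracker:
    block_threshold: int = 3   # configurable; the report suggests 3-5
    base_delay: float = 1.0    # seconds before the first transient retry
    max_delay: float = 60.0    # cap on the exponential backoff
    counts: Dict[str, int] = field(default_factory=dict)

    def record_failure(
        self, tool: str, status_code: Optional[int], message: str = ""
    ) -> Decision:
        # Per-tool failure count, analogous to count=N in the warning text.
        count = self.counts.get(tool, 0) + 1
        self.counts[tool] = count

        if classify_failure(status_code, message) is FailureKind.DETERMINISTIC:
            if count >= self.block_threshold:
                return Decision(Action.BLOCK)
            return Decision(Action.RETRY)

        # Transient: keep retrying, but double the wait on each failure.
        delay = min(self.max_delay, self.base_delay * 2 ** (count - 1))
        return Decision(Action.BACKOFF, delay)

    def record_success(self, tool: str) -> None:
        # A successful call resets the counter for that tool.
        self.counts.pop(tool, None)


if __name__ == "__main__":
    tracker = ToolFailureTracker(block_threshold=3)
    for attempt in range(1, 4):
        decision = tracker.record_failure("mcp_tool", 422, "validation error")
        print(attempt, decision.action.value)
```

Under these assumptions, a loop like the 422 incident would be escalated on the third identical failure instead of running for 57 turns. Transient failures are never blocked in this sketch, which matches the proposal's "preserve current retry behavior"; whether they should also have a hard cap is left open.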

