claude-code - 💡(How to fix) Fix opus-4-8 re-fetches identical tool results after normal (non-error) returns — ~2-3× vs opus-4-7

Summary

On Claude Code, opus-4-8 re-runs the same tool call (same Read.file_path or same Bash.command) within a short window roughly 2–3× as often as opus-4-7, even when the prior call returned a normal success (not an error, timeout, or empty result). Each repeat re-bills the growing context (cache_read) and can feed sibling-tool-call cascades (#22264). This looks like a 4.8-era shift in how the model treats tool results, and it may share a root cause with the already-filed confabulation report #63538 (see Related).

Environment

  • Claude Code 2.1.159
  • Model claude-opus-4-8 (compared against opus-4-7, sonnet-4-6)
  • macOS; single user; transcripts from ~/.claude/projects/<proj>/*.jsonl

Method

I scanned my local transcripts (~175 sessions) and defined a "burst" as the same (tool, key) pair appearing ≥3× within any 12-call window, where key is Read.file_path or Bash.command. To control for task type, I split sessions into normal and meta-debug. The 4.8 window happened to include heavy harness-debugging work, which is itself Bash/Read-dense, so I separated the two groups to rule that out as a confound.
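The burst definition above can be sketched in a few lines of Python. This is a minimal sketch, not the script used for this report. It assumes each transcript has already been reduced to an ordered list of (tool, key) pairs; the `Call` record and both function names are hypothetical. A call is flagged when it is at least the third occurrence of its pair within the trailing 12-call window, and reading "burst %" as the share of flagged calls is likewise an assumption.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical record: one tool call reduced to its identity."""
    tool: str  # e.g. "Read" or "Bash"
    key: str   # Read.file_path or Bash.command


def burst_flags(calls, window=12, min_repeats=3):
    """Return one bool per call: True when the call is at least the
    `min_repeats`-th occurrence of its (tool, key) in the last `window` calls."""
    recent = defaultdict(deque)  # (tool, key) -> indices still inside the window
    flags = []
    for i, call in enumerate(calls):
        seen = recent[call]
        seen.append(i)
        # The window ending at call i covers indices i - window + 1 .. i.
        while seen[0] <= i - window:
            seen.popleft()
        flags.append(len(seen) >= min_repeats)
    return flags


def burst_rate(calls, window=12, min_repeats=3):
    """Share of calls flagged as part of a burst (assumed meaning of 'burst %')."""
    flags = burst_flags(calls, window, min_repeats)
    return sum(flags) / len(flags) if flags else 0.0


example = [
    Call("Read", "a.py"),
    Call("Bash", "ls"),
    Call("Read", "a.py"),
    Call("Read", "b.py"),
    Call("Read", "a.py"),  # third Read of a.py within 12 calls
]
print(burst_flags(example))  # [False, False, False, False, True]
```

Keeping a per-pair deque of recent indices makes the scan a single pass over each session, which matters little at ~175 sessions but keeps the window logic easy to audit.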

Data — burst rate by model, controlled for task type

model     task        calls   burst %
opus-4-7  normal      5,119   1.3%
opus-4-7  meta-debug  2,297   2.8%
opus-4-8  normal      1,879   3.9%
opus-4-8  meta-debug    720   5.8%

Within the same task type, opus-4-8 bursts 2–3× as often as opus-4-7 (normal: 1.3% → 3.9%, 3×; meta-debug: 2.8% → 5.8%, about 2×). Task type adds a further ~1.5–2×, but model is the dominant axis.
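The ratios quoted here follow directly from the four burst rates in the table. As a quick check, the arithmetic can be written out as below; the dictionary layout and function names are illustrative only.

```python
# Burst rates (%) copied from the table above.
burst_pct = {
    ("opus-4-7", "normal"): 1.3,
    ("opus-4-7", "meta-debug"): 2.8,
    ("opus-4-8", "normal"): 3.9,
    ("opus-4-8", "meta-debug"): 5.8,
}


def model_ratio(task):
    """opus-4-8 burst rate divided by opus-4-7 burst rate, same task type."""
    return burst_pct[("opus-4-8", task)] / burst_pct[("opus-4-7", task)]


def task_ratio(model):
    """meta-debug burst rate divided by normal burst rate, same model."""
    return burst_pct[(model, "meta-debug")] / burst_pct[(model, "normal")]


for task in ("normal", "meta-debug"):
    print(f"{task}: opus-4-8 / opus-4-7 = {model_ratio(task):.2f}x")
# normal: opus-4-8 / opus-4-7 = 3.00x
# meta-debug: opus-4-8 / opus-4-7 = 2.07x

for model in ("opus-4-7", "opus-4-8"):
    print(f"{model}: meta-debug / normal = {task_ratio(model):.2f}x")
# opus-4-7: meta-debug / normal = 2.15x
# opus-4-8: meta-debug / normal = 1.49x
```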

Key finding — it's not a reaction to a flaky channel

Of 130 burst onsets, 88% were immediately preceded by a normal-ok tool result (not error / empty / cancelled). So this is not defensive re-fetching after a bad return — the model re-fetches results that already came back fine.
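One way to compute a figure like this is sketched below. It is a sketch under stated assumptions: every call is assumed to already carry an outcome label, the label names ("ok", "error", "empty", "cancelled") are hypothetical, and "immediately preceded" is read as the call directly before the onset in transcript order.

```python
OK = "ok"  # hypothetical label for a normal, non-empty, non-error result


def share_preceded_by_ok(statuses, onsets):
    """Fraction of burst onsets whose directly preceding call returned OK.

    statuses[i] is the outcome label of call i (assumed labels: "ok",
    "error", "empty", "cancelled"); onsets holds burst-onset call indices.
    """
    usable = [i for i in onsets if i > 0]  # an onset at index 0 has no predecessor
    if not usable:
        return 0.0
    return sum(statuses[i - 1] == OK for i in usable) / len(usable)
```

A value near 1.0 means the repeats mostly follow clean results; a value near 0.0 would instead point to defensive re-fetching after bad returns.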

Spiral

In heavy sessions, repeats concentrate in the latter half (median burst-onset position 0.64 as a fraction of the way through the session): once it starts, it compounds within the session.
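The onset-position statistic can be illustrated as follows. This is a sketch only: it assumes position is measured as the onset's call index scaled to the range 0 (first call) to 1 (last call), which may differ from the normalization used for the 0.64 figure, and the function names are hypothetical.

```python
from statistics import median


def relative_onset(onset_index, session_length):
    """Scale a call index to [0, 1]: 0 is the first call, 1 the last (assumed convention)."""
    if session_length < 2:
        return 0.0
    return onset_index / (session_length - 1)


def median_onset_position(onsets):
    """onsets: iterable of (onset_index, session_length) pairs, one per burst onset.

    Raises statistics.StatisticsError if there are no onsets.
    """
    return median(relative_onset(i, n) for i, n in onsets)
```

A median above 0.5 means more than half of the onsets fall in the second half of their sessions.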

Related — possibly the same 4.8 tool-result-handling shift

  • #22264 — parallel tool-call sibling cascade: one non-zero exit in a parallel batch cancels in-flight siblings. Labeled area:core/area:tools — a harness-level bug, not model-specific.
  • #63538 — opus-4-8 fabricates tool output (and even a verbatim user instruction) when a parallel batch is partially cancelled / returns signal-less Cancelled: results. Labeled area:model — a model-behavior bug, triggered by the #22264 condition.

Hypothesis: these may share a root in how opus-4-8 handles tool results — it re-fetches results that returned normally (this report), and fabricates results that are absent (#63538), with the #22264 cascade supplying the absent-result condition. Worth checking whether they correlate in cross-user telemetry.

Caveats (honest)

  • Single user, one ~2-week window.
  • opus-4-8 sample is smaller: 21 sessions / 2,581 calls vs 4.7's 99 / 7,416.
  • "burst" is my own heuristic, not an official metric.
  • I cannot separate model weights from 4.8-era harness changes (system prompt / deferred tools / CC version) from my side.

Ask

Please check cross-user telemetry for an opus-4-8 vs 4.7 difference in same-tool-call repetition rate after non-error results, and whether it correlates with the #63538 confabulation reports. If confirmed, this is a token-cost and reliability regression (the repeats re-bill cache_read and can trigger #22264 cascades).
