claude-code - 💡(How to fix) Fix opus-4-8 re-fetches identical tool results after normal (non-error) returns — ~2-3× vs opus-4-7

claude-code2026-06-01 01:28:07

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Error Message

On Claude Code, opus-4-8 re-runs the same tool call (same Read.file_path / same Bash.command) within a short window at a markedly higher rate than opus-4-7 — even when the prior call returned a normal success (not an error, timeout, or empty result). Each repeat re-bills the growing context (cache_read) and can feed sibling-tool-call cascades. This looks like a 4.8-era shift in how the model treats tool results, and it may share a root with the already-filed confabulation report #63538 (see Related). Of 130 burst onsets, 88% were immediately preceded by a normal-ok tool result (not error / empty / cancelled). So this is not defensive re-fetching after a bad return — the model re-fetches results that already came back fine. Please check cross-user telemetry for an opus-4-8 vs 4.7 difference in same-tool-call repetition rate after non-error results, and whether it correlates with the #63538 confabulation reports. If confirmed, this is a token-cost and reliability regression (the repeats re-bill cache_read and can trigger #22264 cascades).

Root Cause

RAW_BUFFERClick to expand / collapse

Summary

Environment

Claude Code 2.1.159
Model claude-opus-4-8 (compared against opus-4-7, sonnet-4-6)
macOS; single user; transcripts from ~/.claude/projects/<proj>/*.jsonl

Method

Scanned my local transcripts (~175 sessions). Defined a "burst" = the same (tool, key) appearing ≥3× within any 12-call window, where key = Read.file_path or Bash.command. Controlled for task type by splitting sessions into normal vs meta-debug (the 4.8 window happened to include heavy harness-debugging work, which is itself bash/read-dense — so I de-mixed to rule that out as a confound).

Data — burst rate by model, controlled for task type

model	task	calls	burst%
opus-4-7	normal	5,119	1.3%
opus-4-7	meta-debug	2,297	2.8%
opus-4-8	normal	1,879	3.9%
opus-4-8	meta-debug	720	5.8%

Same task type, opus-4-8 is 2–3× opus-4-7 (normal: 1.3%→3.9% = 3×; meta: 2.8%→5.8% = 2×). Task type adds a further ~1.5–2×, but model is the dominant axis.

Key finding — it's not a reaction to a flaky channel

Of 130 burst onsets, 88% were immediately preceded by a normal-ok tool result (not error / empty / cancelled). So this is not defensive re-fetching after a bad return — the model re-fetches results that already came back fine.

Spiral

In heavy sessions, repeats concentrate in the latter half (median onset position 0.64) — once it starts, it compounds within the session.

Related — possibly the same 4.8 tool-result-handling shift

#22264 — parallel tool-call sibling cascade: one non-zero exit in a parallel batch cancels in-flight siblings. Labeled area:core/area:tools — a harness-level bug, not model-specific.
#63538 — opus-4-8 fabricates tool output (and even a verbatim user instruction) when a parallel batch is partially cancelled / returns signal-less Cancelled: results. Labeled area:model — a model-behavior bug, triggered by the #22264 condition.

Hypothesis: these may share a root in how opus-4-8 handles tool results — it re-fetches results that returned normally (this report), and fabricates results that are absent (#63538), with the #22264 cascade supplying the absent-result condition. Worth checking whether they correlate in cross-user telemetry.

Caveats (honest)

Single user, one ~2-week window.
opus-4-8 sample is smaller: 21 sessions / 2,581 calls vs 4.7's 99 / 7,416.
"burst" is my own heuristic, not an official metric.
I cannot separate model weights from 4.8-era harness changes (system prompt / deferred tools / CC version) from my side.

Ask

Please check cross-user telemetry for an opus-4-8 vs 4.7 difference in same-tool-call repetition rate after non-error results, and whether it correlates with the #63538 confabulation reports. If confirmed, this is a token-cost and reliability regression (the repeats re-bill cache_read and can trigger #22264 cascades).

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering