claude-code: Model re-narrates stale image content from memory; confabulates descriptions matching conversation context after Read tool returns [1 comment, 2 participants]

GitHub stats
anthropics/claude-code#55063 · Fetched 2026-05-01 05:47:11
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 1
Timeline (top): labeled ×5, commented ×1

Over a 5+ turn stretch, the model repeatedly narrated a stale image (the first one it had read earlier in the conversation — a Duo OIDC "Profile Attribute" editor) while the user was actively attaching new, different screenshots of a different form section ("Login Redirect URLs"). The model narrated content that matched the conversational context ("redirect URIs we're about to edit") rather than what the Read tool actually returned. When the user explicitly challenged the model ("you are reading the wrong image"), the model initially continued confabulating before admitting the mismatch.

This happened across both image transports — the macOS NSIRD_* temp path (where ls returns "Operation not permitted" so Read can silently misbehave) and the inline image-cache path (where the tool works fine and returns real image bytes). The transport was not the root cause; the root cause was the model preferring context-consistent narration over literal tool output.

Environment

  • Claude Code: 2.1.123
  • Platform: macOS 15.4 (Darwin 25.4.0), zsh
  • Model: Claude Opus 4.7 (us.anthropic.claude-opus-4-7) via Amazon Bedrock
  • Image attachment methods tried: (a) drag from macOS screenshot preview (→ /var/folders/.../NSIRD_screencaptureui_*/*.png), (b) inline paste (→ /Users/<user>/.claude/image-cache/<uuid>/1.png)

Repro pattern

  1. User drags screenshot A (Profile Attribute editor). Model Reads it and describes it correctly.
  2. User adjusts the form, takes screenshot B (Login Redirect URLs), drags it. Path is NSIRD_*/Screenshot ...png.
  3. Model calls Read. Tool output appears to be the same Profile Attribute editor (whether due to caching, permissions, or Read returning the wrong content).
  4. Bug: instead of reporting "I see the Profile Attribute editor, not Login Redirect URLs as you described," the model invents a description of Login Redirect URLs with plausible values (https://example.com/auth/callback, http://localhost:8000/...) that match what the user has been discussing.
  5. User challenges: "you should see Login Redirect URLs." Model apologizes and claims it will look again.
  6. Next turn: same confabulation. Repeat 5+ times.
  7. Even after user switches from drag-attach to inline paste (a different, more reliable transport), confabulation continues on the next image.
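
To narrow down step 3, it may help to check on disk whether the two attachments really differ before attributing the stale description to the transport. The following is a minimal sketch, assuming both screenshot paths are readable from the process that runs it (which the report suggests may not hold for NSIRD_* paths). The function name `attachments_differ` is hypothetical and not part of Claude Code.

```python
import filecmp
from pathlib import Path


def attachments_differ(first: Path, second: Path) -> bool:
    """Return True if the two files have different contents.

    shallow=False forces a byte-by-byte comparison instead of trusting
    os.stat() signatures (file type, size, modification time).
    Raises OSError (for example PermissionError) if a file is unreadable.
    """
    return not filecmp.cmp(first, second, shallow=False)
```

If this returns True for screenshots A and B and the description still matches A, the files on disk are not the source of the stale content, which is consistent with the report's conclusion that the transport was not the root cause.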

Expected

When Read returns image content that doesn't match the user's described intent, the model should:

  • Describe only what the tool literally returned.
  • Flag the mismatch and ask the user to reconcile ("the file I read shows X, but you described Y — can you confirm the attached file is the screenshot you meant?").
  • Not fall back on earlier successfully-read images from memory.
  • Not generate plausible descriptions from conversation context when tool output is ambiguous or mismatched.

Actual

  • Model produced descriptions consistent with conversation context but inconsistent with actual tool output.
  • Model claimed "now I see it" and then listed URLs that were never present in the image.
  • Only admitted the mismatch when the user issued a direct, unambiguous challenge ("you are reading the wrong image").

Possible contributing factors

  • macOS NSIRD_* temp paths are sandboxed; ls returns "Operation not permitted." Read may or may not succeed. When it fails silently or returns cached content, there's no explicit signal to the model to distinguish that from a successful read.
  • Pattern-matching on conversation context is rewarded when tool output is ambiguous; this incentivizes confabulation rather than pushing back to the user.
  • No prompt-level safeguard against re-narrating previously-read images, so earlier images can bleed into later descriptions.
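
The first factor above describes a read that can fail without any explicit signal. One way to picture the alternative is a wrapper that always returns a status alongside the bytes. This is only an illustrative sketch: `ImageReadResult` and `read_image_explicit` are hypothetical names, and nothing here reflects how the actual Read tool is implemented.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ImageReadResult:
    ok: bool
    data: Optional[bytes] = None
    error: Optional[str] = None


def read_image_explicit(path: str) -> ImageReadResult:
    """Read an image file and report failure explicitly instead of silently."""
    try:
        data = Path(path).read_bytes()
    except OSError as exc:  # includes PermissionError and FileNotFoundError
        return ImageReadResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    if not data:
        return ImageReadResult(ok=False, error="file is empty")
    return ImageReadResult(ok=True, data=data)
```

With a result shaped like this, a failed or empty read carries an error string that can be surfaced to the model, rather than leaving it to fill the gap from conversation context.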

Impact

High for image-heavy workflows (screenshot-driven debugging, UI review, form filling). The user wasted ~30 minutes of a live OIDC-client-configuration session because each form-state screenshot was described incorrectly, and corrections didn't stick.

Requested fix

  • Model-side: teach Claude to literally report Read tool output for images, and to flag mismatches between tool output and user intent rather than reconciling silently.
  • Tool-side (secondary): if Read on an image path returns content that's byte-identical to a previously-read image in the same session, log a warning so the model can distinguish "new image" from "same as before."
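
The tool-side request could be sketched roughly as follows: keep a per-session record of content hashes and warn when newly read bytes match an earlier read. This is an assumption-laden illustration, not the actual Read implementation; `SessionImageLog`, its `record` method, and the logger name `image_read` are all hypothetical.

```python
import hashlib
import logging
from typing import Dict, Optional

logger = logging.getLogger("image_read")  # hypothetical logger name


class SessionImageLog:
    """Tracks SHA-256 digests of image bytes read during one session."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}  # digest -> path of first read

    def record(self, path: str, data: bytes) -> Optional[str]:
        """Record a read; return the earlier path if the bytes were seen before."""
        digest = hashlib.sha256(data).hexdigest()
        previous = self._seen.get(digest)
        if previous is not None:
            logger.warning(
                "Image at %s is byte-identical to earlier read %s", path, previous
            )
            return previous
        self._seen[digest] = path
        return None
```

A caller would invoke `record(path, data)` after each successful image read. A non-None return value means the content duplicates an earlier read, which is the "same as before" signal the request asks for.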

Extent analysis

TL;DR

The model should be updated to literally report Read tool output for images and flag mismatches between tool output and user intent.

Guidance

  • Review the model's image processing pipeline to ensure it prioritizes literal Read tool output over conversation context.
  • Implement a safeguard to prevent the model from re-narrating previously-read images without explicit user confirmation.
  • Consider adding a logging mechanism to detect when Read returns byte-identical content to a previously-read image, allowing the model to distinguish between new and duplicate images.
  • Evaluate the model's reward structure to discourage confabulation and encourage pushing back to the user when tool output is ambiguous.

Example

No code example is provided as the issue description does not include specific code snippets or APIs.

Notes

The fix may require updates to the model's architecture, training data, or reward structure. Additionally, the logging mechanism for detecting duplicate images may need to be implemented on the tool-side.

Recommendation

Prioritize the model-side fix: have the model report literal Read tool output and flag mismatches with the user's description, since this addresses the primary issue of confabulated and incorrect image descriptions. The tool-side duplicate warning is a secondary safeguard.
