claude-code - 💡(How to fix) Fix Claude fails to analyze user-supplied reference images before writing image-generation prompts

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

When a user attaches a reference image and asks Claude to produce an image-generation prompt (Nano Banana Pro, Higgsfield, Seedance, Veo, gpt_image_2, etc.), Claude routinely writes the prompt without first analyzing the supplied image in detail. Instead it leans on its prior assumptions about the subject matter, then produces a prompt that contradicts what is actually shown in the reference. The user is forced to repeatedly tell Claude "look at the image" before Claude does what should be the very first step of the task.

This has occurred in at least seven sessions over the last three weeks and at least three sessions in the last three days alone. The failure mode is consistent enough that the user has had to add a custom memory rule (feedback_recreate_all_diagram_elements.md) just to compensate.

Root Cause

Reference images are the most explicit signal a user can provide about what they want. They exist precisely because the user could not describe the result in words. When Claude ignores them in favor of its priors, the entire value of supplying the reference is lost, and the user ends up doing the visual analysis Claude should have done.

This is not a model capability problem — Claude Opus 4.7 can clearly analyze images correctly when forced to. It is a default behavior / prompting problem: Claude is not being steered to treat reference images as load-bearing inputs to the prompt-writing task.

RAW_BUFFERClick to expand / collapse

(Paste full writeup from local file: ~/Documents/Claude/Issues/claude-image-reference-prompt-failures.md)

Claude fails to analyze user-supplied reference images before writing image-generation prompts

Filed by: Jason Masters (Gaiergy Corp / Ground Floor Energy) Date: 2026-05-24 Product: Claude Code (desktop), Opus 4.7 Severity: High — recurring across many sessions, costs the user significant time on every image task

Summary

When a user attaches a reference image and asks Claude to produce an image-generation prompt (Nano Banana Pro, Higgsfield, Seedance, Veo, gpt_image_2, etc.), Claude routinely writes the prompt without first analyzing the supplied image in detail. Instead it leans on its prior assumptions about the subject matter, then produces a prompt that contradicts what is actually shown in the reference. The user is forced to repeatedly tell Claude "look at the image" before Claude does what should be the very first step of the task.

This has occurred in at least seven sessions over the last three weeks and at least three sessions in the last three days alone. The failure mode is consistent enough that the user has had to add a custom memory rule (feedback_recreate_all_diagram_elements.md) just to compensate.

Concrete examples from the last three days

1. "Geothermal ambient temperature loop renderings" — 2026-05-25

  • User supplied a reference image showing a residential street with non-opaque/translucent asphalt revealing buried geothermal piping.
  • Claude wrote a prompt that called for opaque asphalt.
  • User reply: "This is one of the areas that you are continuing to fail on. Look at the image that I gave you. Notice that it is not opaque. Again, the asphalt is not opaque…"
  • This is a property visible in the very first pixels of the reference. Claude defaulted to the prior "asphalt is opaque" instead of looking.

2. "Geothermal image without underground pipes" — 2026-05-24

  • Recurrence of a previously documented failure mode (Claude's notes: "the same Nano Banana Pro failure mode we hit before — it has a strong prior that 'underground pipes connecting to buildings' must visually meet the surface").
  • Even with the prior failure on record, Claude produced a prompt that overrode the reference image's geometry with its trained assumption.

3. "Geothermal ambient loop renderings" — 2026-05-25

  • Claude only produced an accurate description of the reference image after the user explicitly forced a planning step. The default behavior was to skip image analysis and jump straight to writing a prompt.

Earlier sessions showing the same pattern

  • 2026-05-14, "Higgsfield CLI + Nano Banana" — User: "I want you to take a look at the image and tell me if the street is parallel or perpendicular to the cutaway." Claude had written a prompt without checking the basic geometry of the supplied reference.
  • 2026-05-10, "Create artistic homes with subsurface visualization" — Claude's own notes acknowledge "the same Nano Banana Pro failure mode we hit before" and "pipe count drift," both downstream of not analyzing the reference image.
  • 2026-05-06, "Generate image with Higgsfield" — User had to point out that Claude included a vendor name ("water furnace") that did not even appear correctly spelled in the reference image.

Expected behavior

When a user supplies a reference image alongside an image-generation prompt request, Claude should — by default, without being asked — do the following before drafting a single line of prompt text:

  1. Open and analyze the image at pixel level.
  2. Enumerate the visible elements: subjects, materials, geometry, lighting, viewpoint, atmosphere, anything text-rendered in the image.
  3. Note any properties that contradict its prior assumptions about the subject (e.g., "asphalt is translucent here," "pipes terminate underground, not at the surface," "street runs perpendicular to cutaway").
  4. Use the enumerated observations as the source of truth when writing the prompt — and explicitly call out any place where the prompt diverges from the reference.

Actual behavior

Claude treats the reference image as decorative context, writes a prompt from its prior about the topic, and only inspects the image when the user explicitly demands it. Each round-trip costs the user 5–15 minutes of corrective conversation. Over the last three weeks this has happened on essentially every image task I've run.

Why this matters

Reference images are the most explicit signal a user can provide about what they want. They exist precisely because the user could not describe the result in words. When Claude ignores them in favor of its priors, the entire value of supplying the reference is lost, and the user ends up doing the visual analysis Claude should have done.

This is not a model capability problem — Claude Opus 4.7 can clearly analyze images correctly when forced to. It is a default behavior / prompting problem: Claude is not being steered to treat reference images as load-bearing inputs to the prompt-writing task.

Suggested fix

A system-level instruction (or training adjustment) along the lines of:

When the user supplies a reference image with a request to generate an image-generation prompt, your first action must be to analyze the image and produce an explicit enumeration of its visible properties. Treat that enumeration as the source of truth and the user's request as constraints layered on top. Do not let your priors about the subject override anything you can see in the image. If you would write a prompt token that contradicts an observable property of the reference image, stop and flag the contradiction.

Reproduction

Easy to reproduce in any Claude Code session:

  1. Attach any photograph or rendering with a non-obvious visual property (e.g., translucent material, unusual viewpoint, specific count of objects).
  2. Ask Claude to write an image-generation prompt that mimics it.
  3. Observe that the prompt reflects Claude's prior about the subject, not the specific properties of the reference.

Reported from Claude Code on macOS, Opus 4.7 (1M context). Memory system contains corroborating user feedback rules: feedback_recreate_all_diagram_elements.md, feedback_evaluate_from_human_perspective.md, feedback_architectural_style_workflow.md.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

When a user supplies a reference image alongside an image-generation prompt request, Claude should — by default, without being asked — do the following before drafting a single line of prompt text:

  1. Open and analyze the image at pixel level.
  2. Enumerate the visible elements: subjects, materials, geometry, lighting, viewpoint, atmosphere, anything text-rendered in the image.
  3. Note any properties that contradict its prior assumptions about the subject (e.g., "asphalt is translucent here," "pipes terminate underground, not at the surface," "street runs perpendicular to cutaway").
  4. Use the enumerated observations as the source of truth when writing the prompt — and explicitly call out any place where the prompt diverges from the reference.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING