claude-code - 💡(How to fix) Fix [MODEL] [Model regression] Opus 4.5 → Opus 4.6 → 4.7: Car wash failure persists, with new multi-turn recovery failure mode [3 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#49604Fetched 2026-04-17 08:36:28
View on GitHub
Comments
3
Participants
3
Timeline
8
Reactions
0
Author
Timeline (top)
commented ×3labeled ×3mentioned ×1subscribed ×1

Code Example

N/A — reasoning test, no file operations

---

User: I want to wash my car. The car wash is 50 m away. Should I walk or drive?
Opus 4.7: Walk. [reasoning about road spray, fuel use, engine warm-up]

User: How would I wash the car
Opus 4.7: [6-step home hand-wash procedure: shade, rinse, two buckets, microfiber mitt, rinse, towel dry]

User: But the car wash is 50 m away
Opus 4.7: Ha, fair point — I overcomplicated that. [pivots to self-serve bay vs. automatic tunnel instructions, still assumes user is on foot]

User: But if I walked to the car wash how would I wash my car
Opus 4.7: Ha, you got me — walking there without the car defeats the purpose. Drive the 50 m. I was wrong on the first answer.
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

Same minimal prompt as #46366, with natural multi-turn follow-ups: “I want to wash my car. The car wash is 50 m away. Should I walk or drive?”

What Claude Actually Did

Opus 4.7 reproduces the #46366 failure on turn 1 and adds a new failure mode: the wrong commitment is sticky across turns and resists multi-turn correction. • Turn 1 — “Should I walk or drive?” → “Walk.” (Same surface pattern-match as 4.6.) • Turn 2 — “How would I wash the car” → Generated a full 6-step home hand-wash procedure (two buckets, microfiber mitt, towel dry). Internally inconsistent with turn-1’s “walk to the car wash” frame. No self-correction triggered. • Turn 3 — “But the car wash is 50 m away” → Pivoted to instructions for using the car wash facility (self-serve vs. tunnel) but still assumed the user walked. Corrected the wrong sub-problem. • Turn 4 — “But if I walked to the car wash how would I wash my car” → Finally answered “Drive the 50 m. I was wrong on the first answer.”

Expected Behavior

Per #46366: “Drive” on turn 1. Failing that, recover on turn 2 when generated content (full home hand-wash procedure) is internally inconsistent with the turn-1 commitment.

Files Affected

N/A — reasoning test, no file operations

Permission Mode

I don't know / Not sure

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

Single-turn portion: yes, per #46366’s protocol. Multi-turn portion: n=1 this session, needs replication. Steps to Reproduce 1. Open claude.ai (fresh conversation, no system prompt or context) 2. Send: “I want to wash my car. The car wash is 50 m away. Should I walk or drive?” 3. Send: “How would I wash the car” 4. Send: “But the car wash is 50 m away” 5. Send: “But if I walked to the car wash how would I wash my car” 6. Observe that recovery requires near-explicit pointing at the original constraint

Claude Model

Opus

Relevant Conversation

User: I want to wash my car. The car wash is 50 m away. Should I walk or drive?
Opus 4.7: Walk. [reasoning about road spray, fuel use, engine warm-up]

User: How would I wash the car
Opus 4.7: [6-step home hand-wash procedure: shade, rinse, two buckets, microfiber mitt, rinse, towel dry]

User: But the car wash is 50 m away
Opus 4.7: Ha, fair point — I overcomplicated that. [pivots to self-serve bay vs. automatic tunnel instructions, still assumes user is on foot]

User: But if I walked to the car wash how would I wash my car
Opus 4.7: Ha, you got me — walking there without the car defeats the purpose. Drive the 50 m. I was wrong on the first answer.

Impact

Low - Minor inconvenience

Claude Code Version

N/A — reproduced on claude.ai mobile, no CLI involved (similar result in code)

Platform

Other

Additional Context

Extension of #46366. That issue established the failure at single-turn commitment, with 100% vs 0% on Opus 4.5 vs 4.6 across 29+ trials at every effort level, plus the n=20 punctuation finding (period → 0%, comma → 100%) and the cross-language finding (English → wrong, Arabic → right). All point to commitment to output frame before constraint processing completes. This report adds: the commitment is sticky across turns.

1.	Generated content that contradicts the commitment doesn’t trigger re-evaluation. Turn 2 produced a full home-hand-wash procedure incompatible with turn-1’s “walk to the car wash” frame. Not noticed.
2.	User pushback is processed against the commitment, not the original constraint. Turn 3 was processed as “wrong sub-procedure” rather than “wrong framing.”
3.	Recovery requires near-explicit pointing at the original constraint. Standard pushback insufficient.

Matches @marlvinvu’s analysis (pattern-matching related items, then processing subsequent turns through the committed frame) and the “execute instead of provide” pattern in MegaSlick’s most recent comment (fast/easy interpretation wins, explicit signals don’t reliably override it).

Meta-failure during this report: When asked to “read this and all the comments” before drafting, the assistant fetched the issue, received only the issue body (no comment thread), and proceeded to draft a full report without flagging the missing data. The correct response was “I only got the issue body — can you paste the comments?” Instead, a confident draft was produced from incomplete information, missing the n=84 dataset, the punctuation finding, the language finding, and the architectural-defect framing. This is the same pattern: commit to producing the requested output (a draft report) before processing the constraint (the user asked for the comments, and the fetch didn’t return them). The user had to point this out before it was acknowledged.

Related: #46366, #46002, #46239, #45738, #45731, #46902, #46900, #47087, #46188, #46765.

extent analysis

TL;DR

The issue can be addressed by improving the model's ability to re-evaluate its commitments based on subsequent user input and generated content that contradicts the initial commitment.

Guidance

  • The model's commitment to its initial response seems to be sticky across turns, preventing it from correcting itself even when the generated content is internally inconsistent.
  • To mitigate this, the model could be modified to trigger re-evaluation when generated content contradicts the commitment, allowing it to recover from incorrect initial responses.
  • The model's processing of user pushback should be adjusted to consider the original constraint, rather than just the committed frame, to improve its ability to correct itself.
  • Near-explicit pointing at the original constraint may be necessary for recovery, but the model should be able to handle standard pushback more effectively.

Example

No code snippet is provided as the issue is related to the model's behavior and reasoning, rather than a specific code implementation.

Notes

The issue is specific to the Opus 4.7 model and its behavior in multi-turn conversations. The provided analysis and related issues suggest that this is a deeper architectural issue that requires careful consideration of the model's commitment and re-evaluation mechanisms.

Recommendation

Apply a workaround to improve the model's ability to re-evaluate its commitments and correct itself based on subsequent user input, as a full fix may require significant changes to the model's architecture.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix [MODEL] [Model regression] Opus 4.5 → Opus 4.6 → 4.7: Car wash failure persists, with new multi-turn recovery failure mode [3 comments, 3 participants]