claude-code - 💡(How to fix) Fix [MODEL] [Model regression] Opus 4.5 → Opus 4.6 → 4.7: Car wash failure persists, with new multi-turn recovery failure mode [3 comments, 3 participants]

Code Example

N/A — reasoning test, no file operations

---

User: I want to wash my car. The car wash is 50 m away. Should I walk or drive?
Opus 4.7: Walk. [reasoning about road spray, fuel use, engine warm-up]

User: How would I wash the car
Opus 4.7: [6-step home hand-wash procedure: shade, rinse, two buckets, microfiber mitt, rinse, towel dry]

User: But the car wash is 50 m away
Opus 4.7: Ha, fair point — I overcomplicated that. [pivots to self-serve bay vs. automatic tunnel instructions, still assumes user is on foot]

User: But if I walked to the car wash how would I wash my car
Opus 4.7: Ha, you got me — walking there without the car defeats the purpose. Drive the 50 m. I was wrong on the first answer.

Preflight Checklist

I have searched existing issues for similar behavior reports
This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

Same minimal prompt as #46366, with natural multi-turn follow-ups: “I want to wash my car. The car wash is 50 m away. Should I walk or drive?”

What Claude Actually Did

Opus 4.7 reproduces the #46366 failure on turn 1 and adds a new failure mode: the wrong commitment is sticky across turns and resists multi-turn correction. • Turn 1 — “Should I walk or drive?” → “Walk.” (Same surface pattern-match as 4.6.) • Turn 2 — “How would I wash the car” → Generated a full 6-step home hand-wash procedure (two buckets, microfiber mitt, towel dry). Internally inconsistent with turn-1’s “walk to the car wash” frame. No self-correction triggered. • Turn 3 — “But the car wash is 50 m away” → Pivoted to instructions for using the car wash facility (self-serve vs. tunnel) but still assumed the user walked. Corrected the wrong sub-problem. • Turn 4 — “But if I walked to the car wash how would I wash my car” → Finally answered “Drive the 50 m. I was wrong on the first answer.”

Expected Behavior

Per #46366: “Drive” on turn 1. Failing that, recover on turn 2 when generated content (full home hand-wash procedure) is internally inconsistent with the turn-1 commitment.

Files Affected

N/A — reasoning test, no file operations

Permission Mode

I don't know / Not sure

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

Single-turn portion: yes, per #46366’s protocol. Multi-turn portion: n=1 this session, needs replication. Steps to Reproduce 1. Open claude.ai (fresh conversation, no system prompt or context) 2. Send: “I want to wash my car. The car wash is 50 m away. Should I walk or drive?” 3. Send: “How would I wash the car” 4. Send: “But the car wash is 50 m away” 5. Send: “But if I walked to the car wash how would I wash my car” 6. Observe that recovery requires near-explicit pointing at the original constraint

Claude Model

Opus

Relevant Conversation

User: I want to wash my car. The car wash is 50 m away. Should I walk or drive?
Opus 4.7: Walk. [reasoning about road spray, fuel use, engine warm-up]

User: How would I wash the car
Opus 4.7: [6-step home hand-wash procedure: shade, rinse, two buckets, microfiber mitt, rinse, towel dry]

User: But the car wash is 50 m away
Opus 4.7: Ha, fair point — I overcomplicated that. [pivots to self-serve bay vs. automatic tunnel instructions, still assumes user is on foot]

User: But if I walked to the car wash how would I wash my car
Opus 4.7: Ha, you got me — walking there without the car defeats the purpose. Drive the 50 m. I was wrong on the first answer.

Impact

Low - Minor inconvenience

Claude Code Version

N/A — reproduced on claude.ai mobile, no CLI involved (similar result in code)

Platform

Other

Additional Context

Extension of #46366. That issue established the failure at single-turn commitment, with 100% vs 0% on Opus 4.5 vs 4.6 across 29+ trials at every effort level, plus the n=20 punctuation finding (period → 0%, comma → 100%) and the cross-language finding (English → wrong, Arabic → right). All point to commitment to output frame before constraint processing completes. This report adds: the commitment is sticky across turns.

1.	Generated content that contradicts the commitment doesn’t trigger re-evaluation. Turn 2 produced a full home-hand-wash procedure incompatible with turn-1’s “walk to the car wash” frame. Not noticed.
2.	User pushback is processed against the commitment, not the original constraint. Turn 3 was processed as “wrong sub-procedure” rather than “wrong framing.”
3.	Recovery requires near-explicit pointing at the original constraint. Standard pushback insufficient.

Matches @marlvinvu’s analysis (pattern-matching related items, then processing subsequent turns through the committed frame) and the “execute instead of provide” pattern in MegaSlick’s most recent comment (fast/easy interpretation wins, explicit signals don’t reliably override it).

Meta-failure during this report: When asked to “read this and all the comments” before drafting, the assistant fetched the issue, received only the issue body (no comment thread), and proceeded to draft a full report without flagging the missing data. The correct response was “I only got the issue body — can you paste the comments?” Instead, a confident draft was produced from incomplete information, missing the n=84 dataset, the punctuation finding, the language finding, and the architectural-defect framing. This is the same pattern: commit to producing the requested output (a draft report) before processing the constraint (the user asked for the comments, and the fetch didn’t return them). The user had to point this out before it was acknowledged.

Related: #46366, #46002, #46239, #45738, #45731, #46902, #46900, #47087, #46188, #46765.

extent analysis

TL;DR

The issue can be addressed by improving the model's ability to re-evaluate its commitments based on subsequent user input and generated content that contradicts the initial commitment.

Guidance

The model's commitment to its initial response seems to be sticky across turns, preventing it from correcting itself even when the generated content is internally inconsistent.
To mitigate this, the model could be modified to trigger re-evaluation when generated content contradicts the commitment, allowing it to recover from incorrect initial responses.
The model's processing of user pushback should be adjusted to consider the original constraint, rather than just the committed frame, to improve its ability to correct itself.
Near-explicit pointing at the original constraint may be necessary for recovery, but the model should be able to handle standard pushback more effectively.

Example

No code snippet is provided as the issue is related to the model's behavior and reasoning, rather than a specific code implementation.

Notes

The issue is specific to the Opus 4.7 model and its behavior in multi-turn conversations. The provided analysis and related issues suggest that this is a deeper architectural issue that requires careful consideration of the model's commitment and re-evaluation mechanisms.

Recommendation

Apply a workaround to improve the model's ability to re-evaluate its commitments and correct itself based on subsequent user input, as a full fix may require significant changes to the model's architecture.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [MODEL] [Model regression] Opus 4.5 → Opus 4.6 → 4.7: Car wash failure persists, with new multi-turn recovery failure mode [3 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Code Example

Preflight Checklist

Type of Behavior Issue

What You Asked Claude to Do

What Claude Actually Did

Expected Behavior

Files Affected

Permission Mode

Can You Reproduce This?

Steps to Reproduce

Claude Model

Relevant Conversation

Impact

Claude Code Version

Platform

Additional Context

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [MODEL] [Model regression] Opus 4.5 → Opus 4.6 → 4.7: Car wash failure persists, with new multi-turn recovery failure mode [3 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Code Example

Preflight Checklist

Type of Behavior Issue

What You Asked Claude to Do

What Claude Actually Did

Expected Behavior

Files Affected

Permission Mode

Can You Reproduce This?

Steps to Reproduce

Claude Model

Relevant Conversation

Impact

Claude Code Version

Platform

Additional Context

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING