claude-code - 💡(How to fix) Fix [BUG] Long-running session degradation since v2.1.139, compounded by silent fast-mode model swap in v2.1.142 — no changelog, breaks pinned-model workflows, quantified waste

StepCodex · 2026-05-17T00:59:46Z


Preflight Checklist

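- [x] I have searched [existing issues](https://github.com/anthropics/claude-code/issues?q=is%3Aissue%20state%3Aopen%20label%3Abug) and this hasn't been reported yet
- [x] This is a single bug report (please file separate reports for different bugs)
- [x] I am using the latest version of Claude Code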

What's Wrong?

Title: Long-running session degradation since v2.1.139, compounded by silent fast-mode model swap in v2.1.142

Summary

Starting with Claude Code v2.1.139, long-running sessions exhibit progressive behavioral degradation that was not present in v2.1.138. This was then compounded by v2.1.142, which silently changed the fast-mode default model from Opus 4.6 to Opus 4.7 without any changelog entry or user notification.

The combination produces sessions that start well but progressively lose tool-calling discipline, begin hallucinating capabilities, and enter unproductive loops — consuming tokens and user time without producing useful output.

Symptoms

Degradation patterns that emerge in long-running sessions (not short ones):

- **Synthesis-over-verification drift:** Model confidently asserts facts without calling available tools to check them; this worsens as session length increases
- **Phantom capability claims:** Claims to have invoked tools/features it never actually called
- **Failure loop blindness:** Retries the same broken action 3+ times without changing approach
- **Reduced tool-calling discipline:** Reasons about what a tool would return rather than calling it
- **Bulk-assertion without evidence:** Marks multiple verification criteria as "complete" in a single text block with no interleaved tool calls
- **Context compaction artifacts:** After compaction events, behavioral rules from system prompts are partially forgotten or deprioritized
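The "failure loop blindness" symptom can be checked mechanically against a transcript. The sketch below assumes a hypothetical, simplified record format (dicts with `tool`, `args`, and `ok` keys); this is not Claude Code's actual session schema, and `find_failure_loops` is an illustrative name. It flags runs of three or more consecutive identical failing calls, matching the "3+ times" threshold described above.

```python
def find_failure_loops(calls, threshold=3):
    """Return (tool, run_length) for each run of identical failing calls.

    Each record is a hypothetical simplified tool call:
    {"tool": str, "args": str, "ok": bool}
    """
    loops = []
    prev_key, run = None, 0
    # A trailing None sentinel flushes the final run.
    for call in list(calls) + [None]:
        key = None
        if call is not None and not call["ok"]:
            key = (call["tool"], call["args"])
        if key is not None and key == prev_key:
            run += 1
            continue
        # The run ended: record it if it reached the threshold.
        if prev_key is not None and run >= threshold:
            loops.append((prev_key[0], run))
        prev_key, run = key, (1 if key is not None else 0)
    return loops
```

A successful call, or a failing call with different arguments, resets the run, so only unchanged retries are counted.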

Version Timeline

| Version | Released | Status |
|---------|----------|--------|
| 2.1.138 | 2026-05-09 | Last known-good for long-running session stability |
| 2.1.139 | 2026-05-11 | First version exhibiting long-session degradation |
| 2.1.140 | 2026-05-12 | Degradation continues |
| 2.1.141 | 2026-05-13 | Degradation continues |
| 2.1.142 | 2026-05-14 | Silent fast-mode model swap (Opus 4.6 → 4.7), which compounds the issue |
| 2.1.143 | 2026-05-15 | Both issues present |
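For scripting around the timeline, the version boundaries can be encoded as a small lookup. This is a sketch only: the boundaries come from the table above, while `parse_version` and `classify` are hypothetical helper names, and versions after 2.1.143 are treated as unknown because the report does not cover them.

```python
def parse_version(v: str) -> tuple:
    """Turn '2.1.139' or 'v2.1.139' into a comparable tuple of ints."""
    return tuple(int(part) for part in v.lstrip("v").split("."))

FIRST_DEGRADED = parse_version("2.1.139")    # first version with long-session degradation
FIRST_MODEL_SWAP = parse_version("2.1.142")  # fast-mode default changed to Opus 4.7
LAST_REPORTED = parse_version("2.1.143")     # newest version covered by this report

def classify(version: str) -> str:
    v = parse_version(version)
    if v > LAST_REPORTED:
        return "outside reported range"
    if v < FIRST_DEGRADED:
        return "not reported as affected"
    if v < FIRST_MODEL_SWAP:
        return "long-session degradation"
    return "degradation + fast-mode model swap"
```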

Quantified Impact (Max Subscriber, since v2.1.139)

| Metric | Value |
|--------|-------|
| Duration of degradation | 6 days (2026-05-11 to 2026-05-16) |
| Sessions in affected window | ~850 |
| Total session data | 46.9 MB |
| Estimated tokens consumed | ~4.1M |
| Estimated tokens wasted (degraded behavior) | ~1.4M |
| Per-token cost equivalent | ~$73 |
| User hours spent on corrections/debugging | ~7 hours |
| Quality score before v2.1.139 | 7–8/10 (consistent, monitored) |
| Quality score during affected period | 4.5/10 (sustained, monitored) |
| Sessions requiring full restart (context exhaustion) | 4+ |

The overall waste rate of roughly 35% (~1.4M of ~4.1M tokens) is a conservative estimate; large sessions showed 50–65% waste (hallucinations, repeated identical failures, user corrections, forced restarts). Against the $200/month subscription cost, the ~$73 per-token cost equivalent represents over a third of a billing cycle's value lost to degraded output.
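The two headline ratios can be reproduced from the reported figures. This is a worked check only; all four inputs are the approximate values from the report, and the report does not state which per-token pricing was used to arrive at ~$73.

```python
# Approximate figures as reported above.
tokens_consumed = 4.1e6
tokens_wasted = 1.4e6
cost_equivalent_usd = 73.0
subscription_usd = 200.0

# Fraction of consumed tokens attributed to degraded behavior.
waste_rate = tokens_wasted / tokens_consumed
# Cost equivalent as a share of one monthly billing cycle.
share_of_cycle = cost_equivalent_usd / subscription_usd

print(f"waste rate: {waste_rate:.1%}")          # waste rate: 34.1%
print(f"share of cycle: {share_of_cycle:.1%}")  # share of cycle: 36.5%
```

The waste rate comes out just under the quoted 35%, and the cost share is just over one third.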

What Should Happen?

1. Long-running sessions should maintain consistent tool-calling discipline and behavioral rule adherence throughout the entire session, including after context compaction events. A rule present in the system prompt at turn 1 should still be followed at turn 50.
2. Model version changes in any mode (fast, standard) should be documented in release notes and visible to the user, not silently swapped server-side.
3. Context compaction should preserve the full fidelity of system prompts and CLAUDE.md operational rules. If compaction necessarily drops context, system-level instructions should be the last thing dropped, not the first.
4. Paid subscribers ($200/month Max) should have a stable, first-class mechanism to pin model versions, not undocumented environment variables discovered by reverse-engineering regressions.
5. Behavioral regressions of this magnitude (quality drop from 7–8/10 to 4.5/10) should be caught by Anthropic's internal testing before release, not discovered by subscribers debugging wasted sessions.

Error Messages/Logs

Steps to Reproduce

1. Run Claude Code v2.1.139+ with fast mode enabled
2. Start a session with structured operational rules (system prompts, CLAUDE.md with explicit verification requirements)
3. Work through a multi-phase task requiring 10+ tool calls
4. Observe progressive degradation: early turns follow rules, later turns drift
5. After context compaction, observe partial loss of behavioral rules
6. Compare the identical workflow on v2.1.138: no degradation
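One way to put a number on the drift observed during reproduction is to compare tool-call density between the first and second halves of a session. The sketch below assumes the per-turn tool-call counts have already been extracted from a transcript by some other means; `tool_call_drift` is a hypothetical helper, not part of Claude Code.

```python
from statistics import mean

def tool_call_drift(tool_calls_per_turn):
    """Ratio of late-session to early-session mean tool calls per turn.

    A value well below 1.0 means later turns call tools less often.
    """
    if len(tool_calls_per_turn) < 2:
        raise ValueError("need at least two turns")
    mid = len(tool_calls_per_turn) // 2
    early = mean(tool_calls_per_turn[:mid])
    late = mean(tool_calls_per_turn[mid:])
    if early == 0:
        raise ValueError("no early tool calls to compare against")
    return late / early
```

A low ratio is not proof of degradation on its own, since later phases of a task may legitimately need fewer tool calls. It is most meaningful when the same workflow is run on v2.1.138 and on an affected version and the two ratios are compared.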

Workaround

export CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE=1

This addresses the compounding Opus 4.7 swap but does NOT fix the underlying long-session degradation introduced in v2.1.139.
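For workflows that launch Claude Code from a script rather than an interactive shell, the same override can be applied per process. This is a minimal sketch: the variable name is taken from this report and is undocumented, so its behavior may change; `env_with_override` and `launch_claude` are hypothetical helpers, and the sketch assumes the `claude` CLI is on `PATH`.

```python
import os
import subprocess
from typing import Optional

# Variable name as given in this report; it is undocumented.
OVERRIDE_VAR = "CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE"

def env_with_override(base: Optional[dict] = None) -> dict:
    """Return a copy of the environment with the override set to "1"."""
    env = dict(os.environ if base is None else base)
    env[OVERRIDE_VAR] = "1"
    return env

def launch_claude(args: list) -> int:
    """Run the `claude` CLI with the override applied; return its exit code."""
    return subprocess.run(["claude", *args], env=env_with_override()).returncode
```

Copying the environment keeps the override scoped to the launched process, so other tools started from the same shell are unaffected.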

Environment

- Claude Code versions affected: 2.1.139 through 2.1.143
- Last known-good version: 2.1.138
- Subscription: Max ($200/month)
- Platform: macOS (Apple Silicon)
- Usage pattern: Long-running structured agentic workflows with custom hooks,
quality monitoring, and explicit model-version requirements

Consumer Terms Context

Anthropic's Consumer Terms §12 permits service modifications "at any time without notice."
However, §6(5) requires 30 days notice for fee increases. A silent quality degradation
that wastes ~1.4M tokens and ~7 hours of user time over 6 days is economically equivalent
to a fee increase for subscribers whose output quality depends on session stability and
model version consistency.

Request

1. Investigate long-session regression in v2.1.139 — what changed in context handling,
compaction, or session management that causes progressive behavioral drift?
2. Add changelog entries for model version changes in any mode
3. Document CLAUDE_CODE_OPUS_4_6_FAST_MODE_OVERRIDE in official docs
4. Provide stable model version pinning as a first-class setting
5. Advance notification for behavior-affecting changes — especially for Max subscribers
at $200/month building production workflows on Claude Code
6. Review the affected sessions listed under Additional Information to verify degradation patterns server-side

### Claude Model

Opus

### Is this a regression?

Yes, this worked in a previous version

### Last Working Version

2.1.138

### Claude Code Version

2.1.139 through 2.1.143

### Platform

Anthropic API

### Operating System

macOS

### Terminal/Shell

Terminal.app (macOS)

### Additional Information

## Affected Session IDs (available for Anthropic review)

All sessions >100KB since v2.1.139, sorted by size descending:

| Session ID | Size | Timestamp (ET) |
|-----------|------|----------------|
| `ec1c8982-e331-4828-9f6c-a580fdb61d6f` | 6.4 MB | 2026-05-11 21:26 |
| `7f18bd12-8be8-4f63-a681-54b3d6d92e94` | 5.1 MB | 2026-05-14 14:52 |
| `d0001021-2d9f-4d02-9898-b394c68314e1` | 4.0 MB | 2026-05-15 23:20 |
| `ad803cc4-e42d-43ad-9ed1-fdf1c49dcf77` | 3.0 MB | 2026-05-16 02:19 |
| `85d62600-100e-4837-b00c-6fc21de1b360` | 2.6 MB | 2026-05-14 01:13 |
| `21be96d8-e609-41d6-b9be-0c3eaf982c85` | 2.3 MB | 2026-05-12 23:57 |
| `c9e86ed7-c735-4b0a-841f-e9adc1590fa9` | 1.8 MB | 2026-05-14 22:39 |
| `17cba326-6301-4845-aefe-6ec1458523f7` | 1.8 MB | 2026-05-12 18:20 |
| `575dcc2a-2be8-4495-a3f2-c85e3eee9506` | 1.8 MB | 2026-05-16 01:51 |
| `629d0e4f-5319-4cd3-b012-7662b913d9e8` | 1.7 MB | 2026-05-16 20:31 |
| `165be4d2-a6b3-4ac4-80b9-adae81f51170` | 1.5 MB | 2026-05-13 00:28 |
| `7b7f4c91-e085-47d2-a895-6ea006918e55` | 1.5 MB | 2026-05-16 20:50 |
| `fd394b6f-92a3-4b57-8bc0-1ccb7818f416` | 984 KB | 2026-05-15 23:47 |
| `2b9aa28a-85b2-4d0f-b5a0-fda81586ed12` | 882 KB | 2026-05-14 21:02 |
| `dcaf1755-29a2-429d-9c0a-25906b01b5f7` | 749 KB | 2026-05-11 23:37 |
| `3790aa25-c6c8-412c-a076-77a8bbf4147c` | 714 KB | 2026-05-14 15:42 |
| `892a0bb8-f8f9-48f8-859f-d683a88450c1` | 646 KB | 2026-05-12 09:53 |
| `dc1fe9d2-bf94-44e7-bc73-5d44a4ef7fbe` | 450 KB | 2026-05-16 17:12 |
| `b1737c6c-4e96-481f-a436-fb20e3950f97` | 421 KB | 2026-05-15 21:29 |
| `7ccf7aa2-4a67-4355-a73f-cf0580596eaf` | 297 KB | 2026-05-14 15:57 |
| `4bc122e4-e651-4098-9051-6eef6ca323e1` | 226 KB | 2026-05-14 22:41 |

Session `7b7f4c91` is the primary evidence session where degradation was identified,
debugged, and root causes traced. Session `ec1c8982` (6.4 MB, May 11) is the earliest
large session after v2.1.139 — the first one showing the pattern.
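A list like the one above can be generated with a short script. The sketch below rests on assumptions not confirmed by this report: that each session is stored as a `<session-id>.jsonl` file somewhere under a local directory (for example `~/.claude/projects`), and that "100KB" means 100 × 1024 bytes. `large_sessions` is a hypothetical helper name.

```python
from pathlib import Path

def large_sessions(root: Path, min_bytes: int = 100 * 1024):
    """Return (session_id, size_in_bytes) pairs, largest first.

    Assumes one `<session-id>.jsonl` file per session under `root`.
    """
    sizes = [
        (path.stem, path.stat().st_size)
        for path in root.rglob("*.jsonl")
        if path.is_file()
    ]
    big = [(sid, size) for sid, size in sizes if size > min_bytes]
    return sorted(big, key=lambda item: item[1], reverse=True)
```

Filtering on file modification time would narrow the result to the affected window; that step is omitted here to keep the sketch short.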
