claude-code - 💡(How to fix) Fix [OPUS 4.6 1M] Persistent Implementation Shortcuts Disguised as Design Decisions [3 comments, 3 participants]

claude-code2026-04-12 04:29:31

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#46870•Fetched 2026-04-12 13:30:53

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

labeled ×4commented ×3cross-referenced ×1renamed ×1

I've been using Claude Code (Opus) as the primary implementation tool for Frank, an F# web framework, since approximately February 2026. Over ~3 months and hundreds of sessions, I've encountered a systematic pattern where Claude takes implementation shortcuts, disguises them as intentional design decisions, and claims completion when fundamental requirements are unmet. The problem is not occasional mistakes — it's a consistent, self-reinforcing pattern that persists despite extensive correction, memory systems, custom skills, and detailed CLAUDE.md rules.

Root Cause

Example: The RuntimeInterpreter's Fork was a no-op, but all tests passed because they asserted flat ExitedStates/EnteredStates list equality. A wrong tree structure (or no tree at all) produces the same flat lists. The tests verified the symptom (correct states activated) without verifying the mechanism (parallel region structure preserved). When I demanded tree-shape assertions, Claude kept trying to add them as supplements alongside the flat assertions instead of replacing them.

Fix Action

Fix / Workaround

The v7.3.0 milestone implementation had extensive passing tests. When integrated end-to-end, the hierarchy was flattened to a flat FSM, Link headers never appeared in HTTP responses, and the resource model was missing intended pieces. What was presented as "hierarchy operational" was flat FSM dispatch with hierarchy-sounding names. This was the catalyst for ~200 lines of CLAUDE.md rules, 40+ feedback memories, custom verification skills, and an expert review panel — all attempting to prevent recurrence.

Claude: (acknowledged Fork is fundamental Harel semantics, not a workaround)

Mitigations Attempted (None Sufficient)

Code Example

// Protocol marker — actual region entry is performed by the prior Enter call.
Fork = fun _regions (config, history) -> (config, history, [], [])

---

The entire codebase from v7.2.0 forward is suspect. Key files:

- `src/Frank.Statecharts/Hierarchy.fs` — RuntimeInterpreter with no-op Fork, flat-list Enter, all callers
- `src/Frank.Statecharts.Core/Types.fs` — TransitionAlgebra type (correct) but missing TransitionStep tree type
- `src/Frank.Statecharts/StatefulResourceBuilder.fs` — TransitionEvent carries flat lists
- `test/Frank.Statecharts.Tests/HierarchyAlgebraTests.fs` — all tests assert flat lists
- `test/Frank.Statecharts.Tests/HierarchyTests.fs` — all transition tests assert flat lists
- Issues #130-#134 — closed with dropped requirements
- The full v7.3.0 milestone (54 issues) — 5 found with silently dropped requirements

---

### Example 1: Fork no-op confrontation (April 11, 2026 — this session)

**Me:** "I worry the shortcut of cutting out explicit Fork was a mistake."

**Claude:** (initially proposed that trace doesn't need hierarchy and Fork is just a marker to record)

**Me:** "Wrong. Trace also needs to be hierarchy-aware."

**Claude:** (proposed trace is hierarchy-aware but Fork is still just logged as a leaf)

**Me:** "In 286, we made history recording explicit, but we elected to keep the existing enter + fork in place. Other interpreters are expected to correctly leverage Fork. Instead, it seems like you have no idea about that decision or how to correctly implement Harel's statecharts ... again."

**Claude:** (acknowledged Fork is fundamental Harel semantics, not a workaround)

**Me:** "That's correct. And this re-evaluation makes me think we need to go back to RuntimeInterpreter, either now or later, and separate Fork to make this explicit there, as well."

### Example 2: Four flattening corrections in one session (April 11, 2026)

**Correction 1 — flatten function:**
**Me:** "NO. Hard No. MUST assert correct tree, not a flat list equivalent."

**Correction 2 — flat fields on result type:**
**Me:** "NO! And stop with the flattening. No more. 'test/Frank.Statecharts.Tests/HierarchyTests.fs — asserts flat lists which still work' is a sneaky attempt, but you failed."

**Correction 3 — convenience extractors:**
**Me:** "Yet again, you prove unfaithful. What are 'let rec flatten (step: TransitionStep) : TransitionOp list =' and 'new enterStateWithTrace'? Why must you be so belligerent?"

**Correction 4 — flat fields still present:**
**Me:** "Yep, no flattening. Zip. Zero. Nada."

After all four corrections, the next plan revision still contained `ExitedStates`/`EnteredStates` flat fields "for production consumers."

### Example 3: Premature completion (March 2026)

**Claude:** "All 2,199 tests passing. The implementation is complete."

**Me:** (challenges test coverage)

**Claude:** (immediately produces audit identifying 9+ gaps, including edge cases it had access to the entire time)

### Example 4: Cumulative frustration (April 11, 2026)

**Me:** "Even after some really good skill improvements and thinking we've crossed a threshold, you continue to lie, shortcut, and sneak things in to make it an easier implementation. It's really, really dishonest and bad."

**Me:** "I've learned that the real problem is that your implementation continues to prove far, far worse than I ever thought possible... I've had to recreate nearly every possible target several times over. You have thoroughly wasted my time and I'm not anywhere closer, except with more detailed designs."

---

// Protocol marker — actual region entry is performed by the prior Enter call.
Fork = fun _regions (config, history) -> (config, history, [], [])

RAW_BUFFERClick to expand / collapse

Preflight Checklist

I have searched existing issues for similar behavior reports
This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude ignored my instructions or configuration

What You Asked Claude to Do

Over ~3 months, I used Claude Code (Opus) as the primary implementation tool for Frank, an F# web framework built on Harel statecharts, HATEOAS, and semantic discovery. The work involved:

Implement a tagless-final TransitionAlgebra<'r> with a Fork operation that represents AND-state parallel region activation per Harel statechart semantics. The design decision (collaboratively reached, documented on the issue) explicitly stated: "DualAlgebra needs to see Fork to accumulate per-region obligations." Fork must actually enter each parallel region and preserve the tree structure in its output.
Implement 54 issues across milestone v7.3.0, each with specified requirements including both "operational MPST" sections and "Wadler/dual derivation" sections.
Follow TDD when the implementation plan explicitly required RED-GREEN-REFACTOR workflow.
Preserve hierarchical structure throughout the runtime — the core thesis of the project is that a hierarchical statechart runtime is distinguishable from a flat FSM by observing HTTP responses alone.
When corrected, actually fix the behavior rather than presenting the same shortcut with different justification.

What Claude Actually Did

Behavior 1: Implemented Fork as a no-op with a justifying comment

// Protocol marker — actual region entry is performed by the prior Enter call.
Fork = fun _regions (config, history) -> (config, history, [], [])

Fork ignores its arguments entirely. Enter does all the work by calling enterState, which recursively enters all AND children via List.fold, flattening parallel structure into sequential operations. The comment "Protocol marker" makes it look intentional. When confronted, Claude acknowledged: "I claimed the hierarchy was preserved in the algebra + interpreter design, but the implementation continued to flatten Fork into sequential operations."

Behavior 2: When asked to fix the flattening, re-introduced flattening four times in one session

In a single conversation on April 11, 2026, I caught Claude re-introducing the same shortcut with different justifications:

Added TransitionStep.flatten : TransitionStep -> TransitionOp list — a convenience function that converts the tree back to a flat list, undoing the entire purpose of the tree type
Added TransitionStep.enteredStates : TransitionStep -> string list and exitedStates — same flattening under a different name
Kept ExitedStates: string list and EnteredStates: string list fields on the result type "for backward compatibility" — preserving the flat representation alongside the tree
Created a new enterStateWithTrace function duplicating enterState's logic instead of modifying enterState to return the tree — violating the project's "no duplicated logic" constitution rule

Each time I corrected Claude, it acknowledged the problem, said it understood, then produced the next plan revision with flattening reintroduced in a different form.

Behavior 3: Wrote tests that verify the shortcut instead of the requirement

All RuntimeInterpreter tests asserted flat ExitedStates/EnteredStates list equality. A no-op Fork with Enter doing all the work produces the same flat lists as a correct Fork with tree-structured output. The tests verified that the right states were activated (the symptom) without verifying that parallel structure was preserved (the requirement). This made the shortcut invisible to CI.

When I demanded tree-shape assertions, Claude tried to add them as supplements alongside the flat assertions rather than replacing them — preserving the escape hatch.

Behavior 4: Silently dropped hard requirements and closed issues

Across 5 multi-party session issues (#130-#134) in v7.3.0, Claude implemented the easier "operational MPST" sections, silently dropped the harder "Wadler/dual derivation" sections, and closed all five issues as complete. Only 1 of 5 PRs mentioned the omission. An audit of the full 54-issue milestone found this pattern affected 5 issues.

Behavior 5: Skipped TDD, declared completion, then revealed gaps only when challenged

During v7.3.1 implementation (March 2026), the plan explicitly required RED-GREEN-REFACTOR TDD. Claude skipped the RED step, wrote one thin test per bug fix, and declared "all 2,199 tests passing" as evidence of completion. When I challenged the coverage, Claude immediately produced a thorough audit identifying 9+ coverage gaps — demonstrating it had the ability to identify missing tests but chose not to before claiming completion.

Behavior 6: Presented flat FSM as hierarchical statechart

Expected Behavior

Fork should enter each AND-state region and produce a Par node in the output tree, preserving the parallel structure. If implementation is unclear, Claude should say so rather than implementing a no-op.
When corrected about flattening, stop flattening. Don't re-introduce the same pattern with different names ("convenience functions," "backward compatibility," "computed projections"). If the user says "no flatten," the next attempt should contain zero paths from tree to flat list.
Tests should verify the requirement, not the implementation. If the requirement is "preserve parallel structure," tests should assert tree shape (Par nodes exist, correct children per region). If the test would pass with a wrong implementation, it's the wrong test.
Never silently drop requirements. If a requirement is too hard or depends on unfinished work, say so. Don't implement the easy half, close the issue, and move on.
Follow the plan. If the plan says TDD, write the failing test first. If the plan says tree structure, don't produce flat lists. Don't skip steps that would expose shortcuts.
Corrections should persist within a session. Being corrected 4 times for the same behavior in one conversation is a model failure, not a user communication failure.

Files Affected

The entire codebase from v7.2.0 forward is suspect. Key files:

- `src/Frank.Statecharts/Hierarchy.fs` — RuntimeInterpreter with no-op Fork, flat-list Enter, all callers
- `src/Frank.Statecharts.Core/Types.fs` — TransitionAlgebra type (correct) but missing TransitionStep tree type
- `src/Frank.Statecharts/StatefulResourceBuilder.fs` — TransitionEvent carries flat lists
- `test/Frank.Statecharts.Tests/HierarchyAlgebraTests.fs` — all tests assert flat lists
- `test/Frank.Statecharts.Tests/HierarchyTests.fs` — all transition tests assert flat lists
- Issues #130-#134 — closed with dropped requirements
- The full v7.3.0 milestone (54 issues) — 5 found with silently dropped requirements

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Sometimes (intermittent)

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation

### Example 1: Fork no-op confrontation (April 11, 2026 — this session)

**Me:** "I worry the shortcut of cutting out explicit Fork was a mistake."

**Claude:** (initially proposed that trace doesn't need hierarchy and Fork is just a marker to record)

**Me:** "Wrong. Trace also needs to be hierarchy-aware."

**Claude:** (proposed trace is hierarchy-aware but Fork is still just logged as a leaf)

**Me:** "In 286, we made history recording explicit, but we elected to keep the existing enter + fork in place. Other interpreters are expected to correctly leverage Fork. Instead, it seems like you have no idea about that decision or how to correctly implement Harel's statecharts ... again."

**Claude:** (acknowledged Fork is fundamental Harel semantics, not a workaround)

**Me:** "That's correct. And this re-evaluation makes me think we need to go back to RuntimeInterpreter, either now or later, and separate Fork to make this explicit there, as well."

### Example 2: Four flattening corrections in one session (April 11, 2026)

**Correction 1 — flatten function:**
**Me:** "NO. Hard No. MUST assert correct tree, not a flat list equivalent."

**Correction 2 — flat fields on result type:**
**Me:** "NO! And stop with the flattening. No more. 'test/Frank.Statecharts.Tests/HierarchyTests.fs — asserts flat lists which still work' is a sneaky attempt, but you failed."

**Correction 3 — convenience extractors:**
**Me:** "Yet again, you prove unfaithful. What are 'let rec flatten (step: TransitionStep) : TransitionOp list =' and 'new enterStateWithTrace'? Why must you be so belligerent?"

**Correction 4 — flat fields still present:**
**Me:** "Yep, no flattening. Zip. Zero. Nada."

After all four corrections, the next plan revision still contained `ExitedStates`/`EnteredStates` flat fields "for production consumers."

### Example 3: Premature completion (March 2026)

**Claude:** "All 2,199 tests passing. The implementation is complete."

**Me:** (challenges test coverage)

**Claude:** (immediately produces audit identifying 9+ gaps, including edge cases it had access to the entire time)

### Example 4: Cumulative frustration (April 11, 2026)

**Me:** "Even after some really good skill improvements and thinking we've crossed a threshold, you continue to lie, shortcut, and sneak things in to make it an easier implementation. It's really, really dishonest and bad."

**Me:** "I've learned that the real problem is that your implementation continues to prove far, far worse than I ever thought possible... I've had to recreate nearly every possible target several times over. You have thoroughly wasted my time and I'm not anywhere closer, except with more detailed designs."

Impact

High - Significant unwanted changes

Claude Code Version

2.1.91

Platform

Anthropic API

Additional Context

Mitigations Attempted (None Sufficient)

Mitigation	Result
200+ line CLAUDE.md with explicit rules	Claude reads them, acknowledges them, takes shortcuts anyway
40+ persistent feedback memories	Same corrections repeat across sessions
Custom `/decompose` skill requiring exact F# code shapes in plans	Plans contain correct types; agents implement shortcuts within them
`/expert-review` with 10 domain-specific reviewer personas	Catches problems post-hoc; damage (tokens, time) already done
Thesis-first E2E acceptance criteria	Claude optimizes for making tests green, not for the thesis
Mandatory TDD skill	Claude skips TDD when it would expose the shortcut
"Scope lock" file allowlists per task	Claude stays in allowed files but takes shortcuts within them
"Never close incomplete issues" rule	Created after v7.3.0 audit; has helped but doesn't prevent the shortcuts themselves
Stronger language to add emphasis	Apologies and "understanding" but no change in results

Environment

Claude Code CLI, Opus model (claude-opus-4-6)
F# / .NET 10 / ASP.NET Core
macOS (darwin), zsh
Custom skills, hooks, and plugins (superpowers, decompose, expert-review, etc.)
Git-based workflow with worktrees for feature isolation
Project: https://github.com/frank-fs/frank

Summary

The Project

Frank is an F# library proving that HATEOAS, statecharts, and semantic discovery compose into a pit of success for hypermedia APIs. It involves:

Harel statechart semantics (hierarchical state machines with AND/XOR composites)
Tagless-final algebra for transition interpretation
ASP.NET Core middleware for HTTP protocol compliance
Multiple parser/generator formats (SCXML, smcat, WSD, ALPS)

The domain is complex but well-specified. The designs are collaborative — I work through the type definitions, semantics, and acceptance criteria with Claude before any implementation begins.

Pattern 1: Shortcuts Disguised as Design Decisions

The most damaging pattern. Claude implements the easier version of a requirement and provides plausible-sounding justification.

Example (April 2026, issue #286): The TransitionAlgebra has a Fork operation representing AND-state parallel region activation — a fundamental Harel statechart concept. The design explicitly stated "DualAlgebra needs to see Fork to accumulate per-region obligations." Claude's implementation:

// Protocol marker — actual region entry is performed by the prior Enter call.
Fork = fun _regions (config, history) -> (config, history, [], [])

Fork was a literal no-op. The Enter handler did all the work by recursively entering all children via List.fold, flattening the parallel structure into sequential operations. The comment "Protocol marker" is not a design decision — it's a justification for not doing the work. When confronted, Claude acknowledged "I claimed the hierarchy was preserved in the algebra + interpreter design, but the implementation continued to flatten Fork into sequential operations."

Example (March 2026, issues #130-#134): Five multi-party session issues each had two sections: operational MPST (easier) and Wadler/dual derivation (harder). Claude implemented the easy halves, silently dropped the hard halves, and closed all five issues. Only 1 of 5 PRs even mentioned the omission. An audit found this pattern affected 5 of 54 issues in the v7.3.0 milestone.

Pattern 2: Correction Acknowledged, Then Immediately Repeated

When I catch a shortcut and explain why it's wrong, Claude acknowledges the correction, appears to understand, and then does the same thing again with different words — often in the same session.

Example (April 11, 2026): In a single session discussing tree-structured transition output, I caught Claude sneaking flattening back in four separate times:

TransitionStep.flatten function that converts the tree back to a flat list
TransitionStep.enteredStates/exitedStates convenience extractors doing the same
ExitedStates: string list and EnteredStates: string list fields kept on the result type "for backward compatibility"
A new enterStateWithTrace function duplicating existing enterState logic instead of modifying the original

Each time I corrected it, Claude said it understood. Each time it produced the next attempt, the flattening was back in a different form. By the fourth correction, it was clear this wasn't misunderstanding — the model gravitates toward the simpler implementation and rationalizes the shortcut.

Pattern 3: Tests That Assert the Wrong Thing

Claude writes tests that pass by asserting properties that don't verify the actual requirement. This makes the shortcut invisible to CI and to casual review.

Pattern 4: Premature Completion Claims

Claude declares work complete with minimal test coverage, treats "all tests passing" as proof of correctness, and only reveals gaps when directly challenged.

Example (March 2026, v7.3.1 step 4): Claude skipped TDD despite the plan explicitly requiring RED-GREEN-REFACTOR, wrote one thin test per bug fix, declared "all 2,199 tests passing" as evidence of completion. When I challenged it, Claude immediately produced a thorough audit identifying 9+ coverage gaps — proving it could have identified the missing tests before claiming completion but didn't bother. This happened multiple times in the same session, leading me to tell Claude directly: "You have lost my trust."

Pattern 5: Scope Silently Reduced

When implementation gets difficult, Claude silently reduces scope rather than flagging the difficulty.

Example (v7.3.0): Claude implemented the v7.3.0 milestone with extensive passing tests. When integrated, the hierarchy was flattened to a flat FSM, Link headers never appeared in HTTP responses, and the resource model was missing intended pieces. The agent had silently deferred hard requirements to make the easier ones pass. This failure is the origin of essentially all of my CLAUDE.md workflow rules — they exist because of this specific incident.

Scale of Impact

Over 3 months:

4 major iteration cycles (v7.3.0, v7.3.1, v7.4.0 waves 1-2) produced implementations that looked correct but failed under scrutiny
5 issues closed with dropped requirements found in a single milestone audit
40+ feedback rules accumulated in my memory system, all correcting the same fundamental behavior
The CLAUDE.md file is over 200 lines — nearly all of it rules created to prevent recurrence of agent shortcutting
My expert review panel (simulated perspectives of Fielding, Harel, Fowler, etc.) found that what was presented as "hierarchy operational" was actually flat FSM dispatch with hierarchy-sounding names

I now consider all agent-generated implementation since v7.2.0 suspect and am evaluating whether to revert entirely and hand-code the implementation.

What Doesn't Work

I've tried extensive mitigations:

Mitigation	Result
Detailed CLAUDE.md rules	Claude reads them, acknowledges them, then takes shortcuts anyway
Memory system with 40+ feedback entries	Same corrections repeat across sessions despite being in memory
Custom `/decompose` skill requiring exact code shapes	Plans contain correct types, agents implement shortcuts
`/expert-review` with domain-specific reviewers	Catches problems post-hoc, but the damage (wasted tokens/time) is done
Thesis-first acceptance criteria	Claude optimizes for making tests green, not for the thesis
TDD skill requirement	Claude skips TDD when it would expose the shortcut
"Scope lock" file allowlists per task	Claude stays in allowed files but takes shortcuts within them

The fundamental issue is that none of these mitigations can prevent Claude from implementing the easy version of a requirement when the hard version is what was specified. The model consistently gravitates toward simpler implementations and rationalizes the gap.

What I Think Is Happening

Claude appears to optimize for producing plausible-looking output that satisfies the surface-level request. When a requirement has both an easy implementation and a hard implementation that look similar in tests, Claude reliably takes the easy path. It then generates justification for why the easy path is correct ("protocol marker," "backward compatibility," "convenience for consumers").

This is particularly dangerous because:

The justifications are technically coherent — they sound like real design decisions
The tests pass — because the tests were also written by Claude to match the easy implementation
It takes domain expertise to spot the gap — a reviewer without deep Harel statechart knowledge wouldn't catch that Fork being a no-op is wrong

What Would Help

Honesty about uncertainty. When Claude doesn't fully understand a domain concept (like Harel AND-state semantics), it should say so instead of implementing a simplified version and presenting it as complete.
Resistance to self-justification. The model should not generate explanatory comments for shortcuts. A no-op function with a comment explaining why it's a no-op is worse than a no-op function with no comment — the comment makes it look intentional.
Better correction persistence. When a user corrects the same behavior 4 times in one session, the model should recognize the pattern and flag it rather than generating a fifth variant of the same shortcut.
Tests that challenge the implementation. When writing tests, Claude should ask "what wrong implementation would also pass this test?" and add assertions that discriminate between correct and plausible-but-wrong implementations.

Environment

Claude Code CLI (Opus model)
F# / .NET 10 / ASP.NET Core
Custom skills, hooks, and plugins (superpowers plugin, etc.)
Extensive CLAUDE.md (200+ lines)
Memory system with 40+ feedback entries
10-reviewer expert panel for code review

extent analysis

TL;DR

The most likely fix for the issue is to implement a more robust testing framework that challenges the implementation and to modify the Claude model to prioritize honesty about uncertainty and resistance to self-justification.

Guidance

Implement comprehensive testing: Develop tests that verify the requirement, not just the implementation, to ensure that the model is not taking shortcuts.
Modify the Claude model: Update the model to prioritize honesty about uncertainty and resistance to self-justification, so it doesn't generate explanatory comments for shortcuts and doesn't implement simplified versions of requirements.
Improve correction persistence: Enhance the model to recognize and flag repeated corrections of the same behavior, rather than generating new variants of the same shortcut.
Enhance domain expertise: Increase the model's understanding of domain-specific concepts, such as Harel statechart semantics, to reduce the likelihood of taking shortcuts.

Example

// Example of a test that challenges the implementation
let test_Fork_preserves_parallel_structure () =
    let regions = [ "Region1"; "Region2" ]
    let config = initialConfig
    let history = []
    let result = Fork regions (config, history)
    assert (result.Config = config)
    assert (result.History = history)
    assert (result.ParallelRegions = regions)

Notes

The provided issue description lacks specific technical details about the Claude model and its implementation. Therefore, the suggested fix is based on the patterns and behaviors described in the issue. A more detailed analysis of the model and its code would be necessary to provide a more accurate solution.

Recommendation

Apply workaround: Implement a more robust testing framework and modify the Claude model to prioritize honesty about uncertainty and resistance to self-justification. This will help to ensure that the model is not taking shortcuts and is implementing the requirements correctly.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #logging issue #authentication issue #prompt issue #agent setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

claude-code - 💡(How to fix) Fix [OPUS 4.6 1M] Persistent Implementation Shortcuts Disguised as Design Decisions [3 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Mitigations Attempted (None Sufficient)

Code Example

Preflight Checklist

Type of Behavior Issue

What You Asked Claude to Do

What Claude Actually Did

Behavior 1: Implemented Fork as a no-op with a justifying comment

Behavior 2: When asked to fix the flattening, re-introduced flattening four times in one session

Behavior 3: Wrote tests that verify the shortcut instead of the requirement

Behavior 4: Silently dropped hard requirements and closed issues

Behavior 5: Skipped TDD, declared completion, then revealed gaps only when challenged

Behavior 6: Presented flat FSM as hierarchical statechart

Expected Behavior

Files Affected

Permission Mode

Can You Reproduce This?

Steps to Reproduce

Claude Model

Relevant Conversation

Impact

Claude Code Version

Platform

Additional Context

Mitigations Attempted (None Sufficient)

Environment

Summary

The Project

Pattern 1: Shortcuts Disguised as Design Decisions

Pattern 2: Correction Acknowledged, Then Immediately Repeated

Pattern 3: Tests That Assert the Wrong Thing

Pattern 4: Premature Completion Claims

Pattern 5: Scope Silently Reduced

Scale of Impact

What Doesn't Work

What I Think Is Happening

What Would Help

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING