claude-code - 💡(How to fix) Fix Opus 4.7 ignores prior work in workspace, spends many hours reinventing solutions that already exist in the same repo [1 comments, 2 participants]

GitHub stats
anthropics/claude-code#52893 (fetched 2026-04-25 06:18:04)
Comments: 1 · Participants: 2 · Timeline events: 3 (labeled ×2, commented ×1) · Reactions: 0

Across two consecutive product surfaces (Claude.ai web chat and Claude Code CLI), Opus 4.7 repeatedly failed to discover and use existing work already present in the user's workspace. The agent defers excessively to handoff notes from prior sessions (including notes that say "don't read this folder") instead of independently verifying what exists. The result is 15+ hour sessions producing outputs that duplicate or regress prior work.


Product: Claude Code CLI
Model: Claude Opus 4.7
Severity: High (multiple billed sessions consumed on work already completed by prior sessions)

Expected behavior

When starting a session in a repository:

  1. Enumerate what's in the workspace — including folders the handoff doc says to skip — and assess whether prior complete solutions exist before beginning new work.
  2. When stuck on a sub-problem, search authoritative primary sources (vendor documentation, public specs) BEFORE attempting trial-and-error against test harness diffs.
  3. Treat handoff notes as context, not as gates on investigation. A note saying "this folder is not useful" should trigger at least a minimal verification pass, not a complete skip.
  4. Recognize when a problem is "reverse-engineering against a sparse reference set" and stop guessing after 2-3 failed hypothesis iterations, switching to either documentation lookup or explicit user clarification.
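
Item 4 can be pictured as a small guess budget. The sketch below is only an illustration, not anything Claude Code is known to implement: the class name, the action strings, and the default limit of 3 (taken from the "2-3 failed hypothesis iterations" above) are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class HypothesisBudget:
    """Track failed guesses and signal when to stop guessing (hypothetical helper)."""

    max_failures: int = 3  # assumed limit, from the "2-3 iterations" in item 4
    failures: list[str] = field(default_factory=list)

    def record_failure(self, hypothesis: str) -> None:
        # Remember each failed hypothesis so it is not retried blindly.
        self.failures.append(hypothesis)

    def next_action(self) -> str:
        # Once the budget is spent, escalate instead of guessing again.
        if len(self.failures) >= self.max_failures:
            return "consult_docs_or_ask_user"
        return "try_next_hypothesis"


budget = HypothesisBudget()
for guess in ("threshold=22", "threshold=30", "threshold=43"):
    budget.record_failure(guess)
print(budget.next_action())
```

After the three threshold guesses described in this report, such a budget would point to documentation lookup or a question to the user rather than a fourth guess.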

Actual behavior

In a session repairing the output-format conformance of a tool against an external reference:

  1. The agent read a handoff doc that said "prior package adds features, not conformance fixes — don't land it." It trusted this verbatim without opening the folder to check.

  2. Inside that folder was a 6000-line self-contained solution from a prior Claude session with its own verification harness claiming 25/25 match at 6-decimal precision against the same external reference the current harness uses — along with 17 additional reference files the current session had never extracted or used.

  3. The agent spent ~15 hours pattern-matching on JSON diffs from the existing 16-case harness, guessing threshold values (22 chars? 30? 43?) from 4 sample points, committing, hitting regressions, reverting, and iterating. This cycle included three Rule-1-driven reverts and re-applies.

  4. When stuck on display-formatting rules, the agent did not search the external vendor's public documentation pages until the user explicitly instructed "start searching for information from the [vendor] website" at hour 15+.

  5. When the user shared a reverse-engineering methodology document with the agent mid-session — including the explicit instruction "Read primary-source documentation FIRST, before writing code" — the agent acknowledged the document, produced three long meta-analysis markdown files about how to apply the methodology, and still did not actually execute the first step (documentation lookup) until told a second time.

  6. When the user finally pointed the agent at the existing solution folder, the agent drop-in tested it and correctly identified that the package was NOT a clean replacement (it lacked three session-specific fixes) — but this was discovered AFTER 15 hours that could have been 1 hour if the folder had been inspected upfront.

User-visible cost

  • 15+ hours of billed session time across multiple days
  • 9 commits produced (some valuable, some duplicating what existed elsewhere in the repo)

Reproducible pattern

The agent's behavior in this session exhibits three compounding failure modes:

F1 — Over-trusting handoff notes. When SESSION_HANDOFF.md says "don't use the Extracted/ folder — it's feature-add not conformance-fix," the agent treats this as a closed matter. A more robust agent would note the claim but spend 5 minutes verifying. This is cheap to check and catastrophic to miss.
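
To show how cheap the F1 check can be, here is a minimal sketch of a folder survey that could be run on a folder such as Extracted/ before accepting a handoff note's verdict. The function name and the shape of the returned summary are hypothetical; only standard-library calls are used.

```python
from pathlib import Path


def summarize_folder(folder: Path, top_n: int = 5) -> dict:
    """Return file count, total line count, and the largest files by line count."""
    sizes = []
    for path in folder.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the survey
        sizes.append((len(text.splitlines()), str(path.relative_to(folder))))
    sizes.sort(reverse=True)  # largest files first
    return {
        "file_count": len(sizes),
        "total_lines": sum(lines for lines, _ in sizes),
        "largest": sizes[:top_n],
    }
```

A summary showing a multi-thousand-line file next to a verification harness would be a strong hint that the folder deserves a closer look, whatever the handoff note says.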

F2 — Diff-guessing instead of source-reading. When the task is "match an external reference's output format," the agent defaulted to pattern-matching on JSON diffs with sample sizes of 2-4 positive and 2-4 negative examples. No attempt was made to look up the external reference's published documentation, which (as the user eventually demonstrated) contains direct answers to most of the stuck questions.

F3 — Producing meta-artifacts instead of executing. When given a methodology document, the agent produced three structured markdown files describing how to apply the methodology (input-space catalog, mapping table, experiment spec) — well-structured, but all describing work to be done rather than doing it. The first methodology step ("read primary-source docs") was not executed until the user gave a second direct instruction.

Suggested fixes

  1. At session start, enumerate top-level repo contents. For every non-trivial folder (especially ones with README/HANDOFF/TODO docs), at least open and summarize the contents. Do not skip folders based on prior handoff notes without independent verification.

  2. When stuck on an external-format matching task, default to a "search vendor docs" action before the third guess-and-check iteration. Especially when the external tool is commercial software with published documentation.

  3. Treat extensive Markdown output as a failure signal. When the agent is producing large amounts of "plan" / "analysis" / "methodology" documents without corresponding code/config changes, something is wrong. The agent should either (a) execute the next concrete step, or (b) stop and ask for explicit direction.

  4. Detect cross-session duplication. If a prior Claude session produced a large artifact in the workspace (e.g., files over ~3000 lines with a HANDOFF.md claiming completeness), the current session should treat verifying that artifact as a top-priority first action, not a footnote.

  5. Provide a "verify prior work" directive in default system prompt. First principle before making changes: "What has already been done in this workspace? Is there existing work I should verify before starting new work?"
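
Fixes 1 and 4 could be combined into a session-start scan along the following lines. This is a sketch under stated assumptions: the marker file names, the 3000-line threshold (borrowed from fix 4), and the function names are illustrative and do not describe any existing Claude Code behavior.

```python
from pathlib import Path

# Assumed marker names; a real repo may use different ones.
MARKER_NAMES = {"README.md", "HANDOFF.md", "SESSION_HANDOFF.md", "TODO.md"}
LARGE_FILE_LINES = 3000  # threshold borrowed from the "~3000 lines" in fix 4


def count_lines(path: Path) -> int:
    # Read as bytes so binary or oddly encoded files do not raise decode errors.
    with path.open("rb") as handle:
        return sum(1 for _ in handle)


def folders_to_verify(repo_root: Path) -> list[str]:
    """List top-level folders that hold marker docs or very large files."""
    flagged = []
    for child in sorted(repo_root.iterdir()):
        if not child.is_dir() or child.name.startswith("."):
            continue  # skip plain files and hidden folders such as .git
        files = [p for p in child.rglob("*") if p.is_file()]
        has_marker = any(p.name in MARKER_NAMES for p in files)
        has_large = any(count_lines(p) >= LARGE_FILE_LINES for p in files)
        if has_marker or has_large:
            flagged.append(child.name)
    return flagged
```

Every folder this returns would be opened and summarized before new work starts, regardless of what earlier handoff notes say about it.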

Additional note on methodology-file drift

The agent was given a 3-document reverse-engineering playbook by the user (methodology, SKILL definition, debugger role definition). The playbook explicitly states, as its second commandment: "Read primary-source documentation FIRST. Before writing a single line." The agent acknowledged the playbook and then produced additional methodology documents instead of executing on the existing playbook's Step 1.

This suggests the agent has a bias toward generating plan/analysis artifacts over executing plans. When given a methodology, the agent's tendency is to re-describe it in the workspace rather than apply it. This was observed twice in the same session.
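
One way to make this drift detectable is to compare the documents a session has produced with the code or config it has changed. The sketch below assumes a list of changed file paths is available; the suffix set, the threshold, and the file names in the example are hypothetical.

```python
from pathlib import PurePosixPath

DOC_SUFFIXES = {".md", ".markdown", ".txt"}  # assumed definition of a "plan" artifact


def plan_drift(changed_files: list[str], max_docs_without_code: int = 2) -> bool:
    """Return True when documents pile up with no code or config changes."""
    docs = [f for f in changed_files if PurePosixPath(f).suffix.lower() in DOC_SUFFIXES]
    code = [f for f in changed_files if f not in docs]
    # Drift: more documents than the allowed threshold and nothing executable touched.
    return len(docs) > max_docs_without_code and not code


# Hypothetical file names modeled on the three meta-analysis files described above.
print(plan_drift(["input_space_catalog.md", "mapping_table.md", "experiment_spec.md"]))
print(plan_drift(["experiment_spec.md", "formatter.py"]))
```

The first call reports drift (three documents, no code); the second does not, because a code file changed alongside the single document.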

Extent analysis

TL;DR

The most likely fix involves modifying the agent's behavior to independently verify existing work in the workspace, prioritize searching vendor documentation, and execute concrete steps instead of producing excessive meta-artifacts.

Guidance

  • At the start of a session, the agent should enumerate top-level repository contents and summarize non-trivial folders without skipping based on prior handoff notes.
  • When stuck on an external-format matching task, the agent should default to searching vendor documentation before the third guess-and-check iteration.
  • The agent should treat extensive Markdown output as a failure signal and either execute the next concrete step or stop and ask for explicit direction.
  • The agent should detect cross-session duplication and prioritize verifying large artifacts produced by prior sessions.

Example

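The issue itself provides no code. As a hypothetical sketch, an external-format-matching task could be routed to vendor documentation before any guess-and-check. The two helpers are passed in as callables because neither is a real agent API:

from typing import Callable, Optional

def search_vendor_docs(
    task_type: str,
    external_reference: str,
    search_vendor_documentation: Callable[[str], Optional[str]],
    guess_and_check: Callable[[str], str],
) -> str:
    # Both helpers are hypothetical and supplied by the caller.
    if task_type == "external-format-matching":
        # Consult the vendor documentation before any guess-and-check iteration.
        vendor_docs = search_vendor_documentation(external_reference)
        if vendor_docs:
            return vendor_docs
    # Fall back to guess-and-check only when no documentation was found.
    return guess_and_check(external_reference)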

Notes

The provided issue lacks specific technical details about the agent's implementation, so the suggested fixes are high-level and may require modification to fit the actual implementation.

Recommendation

Apply the suggested fixes to modify the agent's behavior, as they address the identified failure modes and should improve the agent's performance and efficiency.
