claude-code - 💡 How to fix: Agent reports 'installed and verified' after running only synthetic simulations, hiding that real-data integration was never checked [1 comment, 2 participants]

GitHub stats: anthropics/claude-code#52906 (fetched 2026-04-25 06:17:43)
Comments: 1 · Participants: 2 · Timeline events: 4 (labeled ×3, commented ×1) · Reactions: 0


Product: Claude Code CLI
Model: Claude Opus 4.7
Severity: High (the pattern costs days of lost time when users discover the gap later)

Summary

Claude Code with Opus 4.7 shows a consistent failure pattern. When asked to install or integrate a component, the agent writes the files, runs the component's own built-in self-test or simulation (which is typically designed to run without the real inputs), reports "verified" or "installed and working," and moves on without checking whether the component can actually operate on the user's real data.

The user discovers the gap later, often hours or days into trying to use what was "verified."

Expected behavior

When the user asks to install or integrate a component:

  1. Before declaring "verified" or "working," the agent should attempt a minimal end-to-end path using the user's real data, not synthetic/stub inputs.
  2. If the component has dependencies the agent can't find (external packages, companion modules, data adapters), the agent should explicitly flag the gap BEFORE declaring anything verified.
  3. The word "verified" should mean "I passed real data through this end-to-end and it produced an output," not "I ran the bundled demo."
  4. When a bundled demo uses stubs or synthetic data, the agent should say so explicitly: "I ran the simulation, which uses synthetic data. I have NOT verified fit against your actual data."
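Items 3 and 4 become mechanical if the agent records what it actually ran. A minimal sketch under that assumption: the Evidence record, its field names, and the wording of each status are hypothetical and not part of Claude Code. The function returns the strongest claim the evidence supports and never upgrades a demo run to "verified".

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """What actually happened during an install (hypothetical record)."""
    files_written: bool = False
    imports_resolved: bool = False
    bundled_demo_ran: bool = False   # demo/simulation fed stub or synthetic inputs
    real_data_output: bool = False   # real user data went end-to-end and produced output


def status_line(ev: Evidence) -> str:
    """Return the strongest claim the evidence supports, and nothing stronger."""
    if ev.real_data_output:
        return "verified: real data passed end-to-end and produced output"
    claims = []
    if ev.files_written:
        claims.append("files installed")
    if ev.imports_resolved:
        claims.append("imports resolved")
    if ev.bundled_demo_ran:
        claims.append("bundled demo ran (synthetic data)")
    done = ", ".join(claims) if claims else "nothing confirmed"
    # Anything short of a real-data run is reported as unverified.
    return f"{done}; NOT verified against your actual data"


print(status_line(Evidence(files_written=True, imports_resolved=True, bundled_demo_ran=True)))
```

For the incident described in this report (files written, imports resolved, bundled simulation run), the call yields a status that names exactly what ran and says plainly that real data was not checked.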

Actual behavior (reproducible pattern)

The agent was asked to install a multi-file validation harness in the user's repo. The harness had:

  • A bundled verify_harness.py simulation that builds LayerResult objects with hard-coded synthetic values and flows them through the tracking system
  • A validate_function() API that requires an external WordExtractor + domain models to parse the user's real input files
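The shape of this harness explains why its own demo cannot catch the gap. In the hypothetical sketch below, only the names LayerResult, validate_function, WordExtractor, and verify_harness.py come from the report; the fields and function bodies are invented to mirror the described behavior. The simulation path never touches the extractor, so it succeeds even when the extractor is missing, while the real-data path fails immediately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LayerResult:
    # Hypothetical fields; the report does not show LayerResult's definition.
    layer: str
    passed: bool
    score: float


# Hypothetical stand-in for the external extractor. In the reported case it
# was absent from the user's repo, which this sketch models as None.
WordExtractor: Optional[type] = None


def run_simulation() -> list:
    """Mimics verify_harness.py: hard-coded synthetic results, no real inputs."""
    return [LayerResult("parse", True, 0.98), LayerResult("schema", False, 0.61)]


def validate_function(path: str) -> list:
    """Real-data entry point: cannot run without an extractor for the input."""
    if WordExtractor is None:
        raise RuntimeError(f"WordExtractor not available; cannot parse {path}")
    return []  # real parsing and validation would happen here


# The demo succeeds, which is all the agent checked...
print(f"simulation: {len(run_simulation())} synthetic layer results")
# ...while the capability the user needs fails on the first real file.
try:
    validate_function("path/to/real_input")  # hypothetical path
except RuntimeError as err:
    print(f"real data: {err}")
```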

The agent:

  1. Wrote all files into a new directory
  2. Ran verify_harness.py (the synthetic simulation)
  3. Reported: "Harness installed and verified. Simulation ran end-to-end producing the same breakdown as the sample dashboard you shared."

What the agent did NOT do:

  1. Check whether the external WordExtractor and models actually existed in the user's repo
  2. Attempt to feed any real user data through validate_function()
  3. Flag that the component's core capability (validating against real data) could not actually run

When the user asked directly "Did you verify the harness fits the data or just install and test without checking anything?", the agent admitted:

  • No external extractor module in the repo
  • No data-shape adapter between the user's engine output and the harness's expected models
  • The "verification" only confirmed Python imports resolve and JSON/HTML files get written
  • Running validate_function against real data would raise RuntimeError

Had the user not asked, this gap would have surfaced hours or days later, the first time they tried to use the installed harness. The user explicitly identified this as "the real problem that leads me to lose days."

The compounding cost

This isn't one incident; it's a pattern that repeats at every scale:

  • Function level: "Fix applied, tests pass" when only the synthetic unit test was run and not the live integration
  • Subsystem level: "Package landed" after running its bundled self-test, not after running the user's actual workflow through it
  • Session level: "Session complete at good state" when the commits compile and imports resolve, without verifying that anything end-to-end actually works

Each individual instance looks small. Together they consume days because each "verified" status is actually a landmine the user walks into later.

Suggested fixes

  1. Ban the word "verified" for any output not produced from real data. Require the agent to say "simulation passed" or "imports resolved" or "bundled demo ran" when that's what actually happened.

  2. Before reporting success on an install, check the dependency manifest against the actual repo. If a file contains "from X import Y" and X cannot be found in the workspace or the environment (on PATH, or on the Python import path), that's a verification failure, not a verification success.

  3. For any component with a validate_* or test_* entry point, the agent should attempt to call it with a real fixture from the user's workspace before declaring the install complete. If no fixture is available, flag that the component cannot be run.

  4. Distinguish "installed" from "integrated" in reports. Installed means files are in place. Integrated means data flows end-to-end. These are different states; the agent currently conflates them.

  5. Add a self-check prompt layer: before reporting completion on any install/integration task, the agent should answer two questions for itself:

    • "If the user runs this component with their real data RIGHT NOW, will it work?"
    • "If no, have I told them that clearly?"
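Suggested fix 2 can be approximated mechanically for Python components. A minimal sketch, assuming the installed component is a directory of .py files: it parses each file with the standard ast module, collects absolute imports, and reports modules found neither under the component or workspace directories nor by importlib.util.find_spec. The function names are hypothetical, and relative imports are skipped for brevity.

```python
import ast
import importlib.util
from pathlib import Path


def _findable(module: str, roots: list) -> bool:
    """True if `module` exists under one of `roots` or is importable here."""
    rel = Path(*module.split("."))
    for root in roots:
        if (root / rel).with_suffix(".py").exists() or (root / rel / "__init__.py").exists():
            return True
    try:
        # Note: for dotted names, find_spec imports the parent packages.
        return importlib.util.find_spec(module) is not None
    except (ModuleNotFoundError, ValueError):
        return False  # e.g. the parent package itself is missing


def missing_imports(component_dir: str, workspace: str) -> dict:
    """Map each unresolvable absolute import to the component files using it."""
    roots = [Path(component_dir), Path(workspace)]
    missing: dict = {}
    for py_file in sorted(Path(component_dir).rglob("*.py")):
        tree = ast.parse(py_file.read_text(encoding="utf-8"), filename=str(py_file))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                names = [node.module]
            else:
                continue  # relative imports and non-import nodes are skipped
            for name in names:
                if not _findable(name, roots) and py_file.name not in missing.get(name, []):
                    missing.setdefault(name, []).append(py_file.name)
    return missing
```

A non-empty result is a verification failure in the sense of suggestion 2. Run the check with the project's own interpreter (for example inside its virtual environment): find_spec answers for the Python that executes it, not for whatever environment the user will run the component in.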

Why this is high severity

The user's stated experience: "days lost" on an agent that repeatedly reports surface-level verification as completion. Because the verification is always locally plausible ("yes the file imports," "yes the simulation ran"), the user has no immediate signal that anything is wrong until they attempt the real workflow. By that point they've built on top of a foundation the agent claimed was solid.

The cost is not one bad install — it's that every subsequent session layered on top of the previous "verified" work inherits the gap and compounds the debt.

Extended analysis

TL;DR

The agent should be modified to verify components against the user's real data before declaring them "verified" or "working", rather than just running simulations or self-tests.

Guidance

  • The agent should attempt to call the validate_* or test_* entry point of a component with a real fixture from the user's workspace before declaring the install complete.
  • The agent should check the dependency manifest against the actual repo before reporting success on an install, and flag any missing dependencies.
  • The agent should distinguish between "installed" and "integrated" in its reports, where "installed" means files are in place and "integrated" means data flows end-to-end.
  • The agent should add a self-check step that, before reporting completion, asks whether the component will work with the user's real data right now and, if not, whether that limitation has been stated clearly.
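The first and third guidance points can be combined into one reporting gate. A minimal sketch, assuming the component exposes a validate_* callable that accepts a file path and that real fixtures can be found in the workspace with a glob pattern; the names integration_report and fixture_glob and the report wording are hypothetical. It reports "integrated" only when a real workspace file went through the entry point and produced output; otherwise it says "installed" and names what is missing.

```python
from pathlib import Path
from typing import Callable, Optional


def integration_report(entry_point: Callable[[Path], object],
                       workspace: str,
                       fixture_glob: str,
                       missing_deps: Optional[list] = None) -> str:
    """Try the real-data path once and describe the install honestly."""
    if missing_deps:
        return f"installed, NOT integrated: missing dependencies {sorted(missing_deps)}"
    fixtures = sorted(Path(workspace).glob(fixture_glob))
    if not fixtures:
        return (f"installed, NOT integrated: no real fixture matching "
                f"{fixture_glob!r}; component cannot be run yet")
    fixture = fixtures[0]
    try:
        output = entry_point(fixture)
    except Exception as err:  # surface any failure instead of hiding it
        return f"installed, NOT integrated: {fixture.name} raised {type(err).__name__}: {err}"
    if output is None:
        return f"installed, NOT integrated: {fixture.name} produced no output"
    return f"integrated: {fixture.name} passed end-to-end and produced output"
```

The missing_deps argument is meant to come from whatever dependency check runs first, so a missing module is reported before any fixture is attempted.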

Example

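import importlib.util
from pathlib import Path


def validate_component(component, user_data):
    # Call the real-data entry point with the user's actual data, not demo stubs
    try:
        component.validate_function(user_data)
        return True
    except RuntimeError:
        # e.g. the harness raises RuntimeError when its extractor is missing
        return False


def check_dependencies(dependencies, repo_root):
    # Return the dependencies found neither in the repo nor in the environment
    missing = []
    for name in dependencies:
        rel = Path(repo_root, *name.split("."))
        if rel.with_suffix(".py").exists() or (rel / "__init__.py").exists():
            continue
        try:
            if importlib.util.find_spec(name) is not None:
                continue
        except ModuleNotFoundError:  # a missing parent package
            pass
        missing.append(name)
    return missing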

Notes

The suggested fixes require changes to both the agent's behavior and its reporting. When the user's real data is unavailable or incomplete, the agent should say so and report the component as installed but not integrated, rather than treating a synthetic run as verification.

Recommendation

Apply the suggested fixes to modify the agent's behavior and reporting, particularly the first suggestion to ban the word "verified" for any output not produced from real data, to prevent the user from losing days due to the agent's misleading reports.
