claude-code - 💡(How to fix) Fix [MODEL] opus4.6 [2 comments, 2 participants]

GitHub stats
anthropics/claude-code#46402 (fetched 2026-04-11 06:21:15)
Comments: 2 · Participants: 2 · Timeline: 6 (labeled ×4, commented ×2) · Reactions: 0


Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude ignored my instructions or configuration

What You Asked Claude to Do

I asked Claude Code to run a structured multi-phase analysis pipeline that I built.

The pipeline includes:

  • Phase 1: Data gathering (teams, stats, pitchers, odds, weather, etc.)
  • Phase 2: Fact sheet generation and validation
  • Phase 3: Full analysis across multiple layers (pitching, offense, form, market, etc.)
  • Final output: Ranked picks based ONLY on that pipeline

I explicitly asked it to:

  • Run the FULL pipeline
  • Verify all layers were included
  • Only output picks if the system had enough valid data
  • Not rely on narrative or guessing

I also asked follow-up questions to confirm whether the pipeline actually ran fully.

What Claude Actually Did

  1. Claude stated that the “full pipeline” ran successfully.
  2. It produced confident recommendations (picks) based on that claim.
  3. The explanation was detailed and sounded verified.

However, when I pushed deeper:

  1. Claude admitted multiple critical layers were missing or broken:

    • No bullpen data (0/15 games)
    • No weather data (0/15 games)
    • No odds API (single scraped source only)
    • Missing lineup data for most games
    • No actual scoring/grader system (pure narrative judgment)
  2. It then re-ran parts of the analysis and CHANGED its picks completely.

  3. It identified that earlier picks were based on anchoring and incomplete evaluation.

  4. It admitted that the system was effectively “narrative on top of partial data.”

So the sequence was:

  • Claimed full pipeline
  • Gave confident output
  • Later discovered missing data
  • Then reversed decisions

This means the original output was presented as verified when it was not.
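The per-layer counts Claude eventually reported (for example "0/15 games" for bullpen and weather) can be computed by the pipeline itself before any picks are written. A minimal sketch, assuming each game's fact sheet is a Python dict keyed by layer name. This schema is hypothetical, because the real FACT_SHEET_*.json layout is not shown in the issue.

```python
from typing import Dict, List

def layer_coverage(fact_sheets: List[dict], layers: List[str]) -> Dict[str, int]:
    """Count how many games have non-empty data for each layer."""
    # Hypothetical schema: each fact sheet maps layer name -> data (None or empty if missing).
    return {layer: sum(1 for sheet in fact_sheets if sheet.get(layer)) for layer in layers}

def coverage_report(fact_sheets: List[dict], layers: List[str]) -> List[str]:
    """Format coverage as 'layer: covered/total games' lines."""
    total = len(fact_sheets)
    counts = layer_coverage(fact_sheets, layers)
    return [f"{layer}: {counts[layer]}/{total} games" for layer in layers]

# Illustrative input: 15 games where only pitching data was populated.
sheets = [{"pitching": {"starter": "TBD"}, "bullpen": None, "weather": {}} for _ in range(15)]
for line in coverage_report(sheets, ["pitching", "bullpen", "weather"]):
    print(line)
```

Printing this report at the start of the analysis phase would have surfaced the bullpen and weather gaps before any confident picks were generated.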

Expected Behavior

Claude should NEVER claim that a pipeline is “fully run” if required layers are missing or unverified.

Correct behavior should be:

  • Detect missing critical data (weather, bullpen, odds, etc.)
  • Explicitly mark the system as INCOMPLETE
  • Refuse to generate final recommendations
  • Clearly separate:
    • verified data
    • estimated data
    • missing data

If required inputs are not present, output should be something like: "INCOMPLETE PIPELINE — RESULTS NOT RELIABLE"

Claude should not produce confident outputs based on partial or broken inputs.
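One way to keep verified, estimated, and missing data apart is to tag every layer with an explicit status and let the final stage refuse to produce picks unless every required layer is verified. This is a sketch only. The `Status` values, the `required` argument, and the `render_picks` placeholder are hypothetical names, not part of the reporter's pipeline.

```python
from enum import Enum
from typing import Dict, Iterable, List

class Status(Enum):
    VERIFIED = "verified"    # data fetched and checked against its source
    ESTIMATED = "estimated"  # data filled in by a heuristic or a single fallback source
    MISSING = "missing"      # no data at all

def group_by_status(layers: Dict[str, Status]) -> Dict[Status, List[str]]:
    """Split layer names into verified / estimated / missing buckets."""
    groups: Dict[Status, List[str]] = {status: [] for status in Status}
    for name, status in layers.items():
        groups[status].append(name)
    return groups

def render_picks(layers: Dict[str, Status]) -> str:
    # Hypothetical placeholder for the real Phase 3 output.
    return "<ranked picks>"

def final_output(layers: Dict[str, Status], required: Iterable[str]) -> str:
    """Return picks only when every required layer is VERIFIED."""
    not_verified = [name for name in required if layers.get(name) is not Status.VERIFIED]
    groups = group_by_status(layers)
    summary = "; ".join(f"{s.value}: {', '.join(sorted(groups[s])) or 'none'}" for s in Status)
    if not_verified:
        # Block the final recommendations and say exactly why.
        return f"INCOMPLETE PIPELINE — RESULTS NOT RELIABLE ({summary})"
    return f"COMPLETE ({summary})\n" + render_picks(layers)

layers = {"pitching": Status.VERIFIED, "odds": Status.ESTIMATED, "weather": Status.MISSING}
print(final_output(layers, ["pitching", "odds", "weather"]))
```

Treating ESTIMATED as blocking is a deliberate choice in this sketch. A pipeline could instead allow estimates for non-critical layers, as long as the summary still lists them separately from verified data.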

Files Affected

- ANALYSIS_CACHE/FACT_SHEET_*.json
- ANALYSIS_CACHE/FACT_SHEET_*.md
- phase1_facts.py
- phase2_validation layer (fact sheet verification)
- phase3 analysis logic
- phase3_grader.py (later introduced)

Note:
Files were not incorrectly modified, but they were MISINTERPRETED or partially used without proper validation.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

  1. Build a multi-step pipeline (data → validation → analysis)

  2. Intentionally leave some layers incomplete (e.g., missing API data)

  3. Ask Claude to run the “full pipeline” and generate final output

  4. Claude will:

    • claim everything ran
    • produce confident results
  5. Then ask: “Did ALL layers actually run?” or “Audit the pipeline and verify each layer”

  6. Claude will then:

    • admit missing data
    • downgrade confidence
    • sometimes change conclusions

Claude Model

Opus

Relevant Conversation

Claude initially:
- Confirmed full pipeline execution
- Provided confident picks
- Explained reasoning as if all data was validated

Later:
- Admitted missing layers (weather, bullpen, odds, etc.)
- Admitted no real scoring system existed
- Identified that picks were based on narrative judgment
- Re-ran analysis and changed picks

Key failure:
Initial output IMPLIED full verification when it was not true.

Impact

Critical - Data loss or corrupted project

Claude Code Version

Claude Code 2.1.101

Platform

Anthropic API

Additional Context

This issue appears to be systemic:

Patterns observed:

  • Claude assumes completeness unless forced to verify
  • Missing data is silently ignored
  • Narrative reasoning overrides structured logic
  • Confidence remains high even when uncertainty is high
  • Second-pass audits often contradict first-pass conclusions

Key problem: There is NO hard verification gate.

Claude should:

  • fail loudly when data is missing
  • block output when required layers are incomplete
  • provide a structured audit trail of what actually ran

Right now it behaves like: "best effort + confident explanation"

instead of: "verified system with enforced constraints"

This makes it unsafe for any workflow where correctness matters.
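A hard gate can be enforced in code instead of being left to the model's judgment. Each layer records what it actually produced, and the final stage raises an exception rather than writing output when a required layer is absent or empty. A minimal sketch; `AuditTrail`, `IncompletePipelineError`, and the layer names are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Callable, List

class IncompletePipelineError(RuntimeError):
    """Raised when a required layer did not produce data."""

@dataclass
class AuditEntry:
    layer: str
    ran: bool
    records: int
    error: str = ""

@dataclass
class AuditTrail:
    entries: List[AuditEntry] = field(default_factory=list)

    def run(self, layer: str, fn: Callable[[], list]) -> list:
        """Run one layer, recording whether it ran and how many records it returned."""
        try:
            data = fn()
        except Exception as exc:  # record the failure instead of silently skipping it
            self.entries.append(AuditEntry(layer, False, 0, repr(exc)))
            return []
        self.entries.append(AuditEntry(layer, True, len(data)))
        return data

    def require(self, layers: List[str]) -> None:
        """Fail loudly if any required layer did not run or returned no records."""
        ok = {e.layer for e in self.entries if e.ran and e.records > 0}
        missing = [name for name in layers if name not in ok]
        if missing:
            raise IncompletePipelineError(f"missing or empty layers: {missing}")

    def to_json(self) -> str:
        """Serialize the audit trail so it can be saved next to the results."""
        return json.dumps([asdict(e) for e in self.entries], indent=2)

def fetch_weather() -> list:
    raise TimeoutError("weather API unreachable")  # simulated failure for the example

trail = AuditTrail()
trail.run("pitching", lambda: [{"game": 1}])
trail.run("weather", fetch_weather)
try:
    trail.require(["pitching", "weather"])
except IncompletePipelineError as err:
    print(err)  # the gate fires before any picks are produced
print(trail.to_json())
```

Because the gate is an exception rather than a status message, downstream code cannot accidentally continue to the output stage, and the JSON trail shows which layers ran, failed, or came back empty.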

Extent analysis

TL;DR

The most likely fix is to implement a hard verification gate in the pipeline to ensure that all required layers are complete and verified before generating final recommendations.

Guidance

  • Identify and prioritize the critical layers that must be present for the pipeline to be considered complete, such as weather, bullpen, and odds data.
  • Implement a validation check after Phase 1 data gathering and before any analysis or output stage, to detect missing critical data and explicitly mark the system as INCOMPLETE if any required layers are missing.
  • Modify the pipeline to refuse to generate final recommendations if the system is marked as INCOMPLETE.
  • Introduce a structured audit trail to track what layers actually ran and what data was used, to provide transparency and accountability.
  • Consider introducing a deterministic scoring or grading step (the reporter later added phase3_grader.py) so that picks come from explicit scores computed on validated data rather than from narrative judgment.
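A grader replaces narrative judgment with an explicit, reproducible score. The sketch below is hypothetical throughout: the weights, and the assumption that each layer's signal is already normalized to the range [-1, 1], are illustrative only and are not the reporter's phase3_grader.py. The layer names come from the Phase 3 description in the issue.

```python
from typing import Dict, List, Tuple

# Hypothetical weights; a real grader would calibrate these against past results.
WEIGHTS: Dict[str, float] = {"pitching": 0.4, "offense": 0.3, "form": 0.2, "market": 0.1}

def grade(signals: Dict[str, float]) -> float:
    """Weighted sum of normalized signals; refuses to score when any input is missing."""
    missing = [name for name in WEIGHTS if name not in signals]
    if missing:
        raise ValueError(f"cannot grade, missing signals: {missing}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

def rank(games: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Score each game and return (game, score) pairs, highest score first."""
    scored = [(game, round(grade(sig), 3)) for game, sig in games.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

games = {
    "A@B": {"pitching": 0.5, "offense": 0.2, "form": 0.0, "market": -0.1},
    "C@D": {"pitching": 0.1, "offense": 0.1, "form": 0.1, "market": 0.1},
}
print(rank(games))
```

Raising on a missing signal, instead of silently treating it as zero, keeps the grader consistent with the hard-gate idea: an incomplete game cannot quietly receive a score.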

Example

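A possible implementation of the validation check, assuming a hypothetical `facts` dict that maps each layer name to the data Phase 1 produced for it (`layer_data_available` is likewise a hypothetical helper):

import sys

def layer_data_available(layer, facts):
    # A layer counts as available only if it produced non-empty data.
    return bool(facts.get(layer))

def validate_pipeline(facts):
    required_layers = ['weather', 'bullpen', 'odds']
    for layer in required_layers:
        if not layer_data_available(layer, facts):
            return False
    return True

facts = {'weather': [], 'bullpen': [], 'odds': [{'game': 1}]}  # illustrative input
if not validate_pipeline(facts):
    print("INCOMPLETE PIPELINE — RESULTS NOT RELIABLE")
    sys.exit(1)  # refuse to generate final recommendations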

Notes

The current behavior of Claude is to assume completeness unless forced to verify, which can lead to incorrect or misleading results. The proposed fix aims to address this issue by introducing a hard verification gate and a structured audit trail.

Recommendation

Apply a workaround in the pipeline itself: a hard verification gate plus a structured audit trail. Because the gate runs in code, it blocks incomplete results even when the model reports that everything ran, and the audit trail shows which layers actually produced data.
