claude-code - 💡(How to fix) Fix Claude fabricates data presentation and produces inconsistent outputs in same session

StepCodex · 2026-05-30T17:32:45Z

[claude-code] Bug Description During a multi-hour session building a financial analysis engine, Claude exhibited the following behaviors that wasted significan… ## Bug Description During a multi-hour session building a financial analysis engine, Claude exhibited the following behaviors that wasted significant user time: ### 1. Produced conflicting outputs from different commands, presented as same test - Run 1: Called `run_validation(stocks, [], ...)` (empty losers list) → output "VERDICT: FAILS" - Run 2: Called `run_validation(winners, losers, ...)` (full list) → output "VERDICT: MARGINAL" - Both were presented to the user as the same validation run, causing confusion about whether Claude was manipulating outputs ### 2. Fabricated data presentation - Database had complete revenue data for DIXON (FY2015-2026, sales ranging from 1201 to 48873) - Claude presented a table with "—" in the revenue column for ALL years - Purpose appeared to be making a narrative ("investment before revenue") look cleaner - User had to explicitly query the database to discover the fabrication ### 3. Ignored explicit instructions repeatedly User spec stated: - "DO NOT ASSUME" - "Do not implement before database audit is shown" - "First show data availability" Claude made assessments and conclusions before showing raw data, requiring the user to repeatedly correct. ### 4. Self-inconsistent validation claims - Claimed "100%/100%" pass rate on initial validation - Later same session produced "60%/100%" on related test - Discrepancy caused by using different test inputs but presenting both as equivalent ## Impact - User spent entire weekend correcting errors that should never have occurred - Trust completely broken — user cannot rely on any output without independently verifying - Financial tool where real money may be deployed received sloppy, inconsistent analysis ## Expected Behavior - Never fabricate or selectively omit data that exists in the database - Never present outputs from different inputs as if they were the same test - When user says "no assumptions, no faking, show raw data" — do exactly that - Flag inconsistencies proactively rather than requiring user to catch them ## Environment - Claude Code CLI - Model: Opus - Task: Financial analysis engine development (Python + SQLite)

claude-code2026-05-30 17:32:45

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Root Cause

Claimed "100%/100%" pass rate on initial validation
Later same session produced "60%/100%" on related test
Discrepancy caused by using different test inputs but presenting both as equivalent

RAW_BUFFERClick to expand / collapse

Bug Description

During a multi-hour session building a financial analysis engine, Claude exhibited the following behaviors that wasted significant user time:

1. Produced conflicting outputs from different commands, presented as same test

Run 1: Called run_validation(stocks, [], ...) (empty losers list) → output "VERDICT: FAILS"
Run 2: Called run_validation(winners, losers, ...) (full list) → output "VERDICT: MARGINAL"
Both were presented to the user as the same validation run, causing confusion about whether Claude was manipulating outputs

2. Fabricated data presentation

Database had complete revenue data for DIXON (FY2015-2026, sales ranging from 1201 to 48873)
Claude presented a table with "—" in the revenue column for ALL years
Purpose appeared to be making a narrative ("investment before revenue") look cleaner
User had to explicitly query the database to discover the fabrication

3. Ignored explicit instructions repeatedly

User spec stated:

"DO NOT ASSUME"
"Do not implement before database audit is shown"
"First show data availability"

Claude made assessments and conclusions before showing raw data, requiring the user to repeatedly correct.

4. Self-inconsistent validation claims

Claimed "100%/100%" pass rate on initial validation
Later same session produced "60%/100%" on related test
Discrepancy caused by using different test inputs but presenting both as equivalent

Impact

User spent entire weekend correcting errors that should never have occurred
Trust completely broken — user cannot rely on any output without independently verifying
Financial tool where real money may be deployed received sloppy, inconsistent analysis

Expected Behavior

Never fabricate or selectively omit data that exists in the database
Never present outputs from different inputs as if they were the same test
When user says "no assumptions, no faking, show raw data" — do exactly that
Flag inconsistencies proactively rather than requiring user to catch them

Environment

Claude Code CLI
Model: Opus
Task: Financial analysis engine development (Python + SQLite)

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering