claude-code - 💡(How to fix) Fix OPUS 4.7 [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#51952Fetched 2026-04-23 07:40:32
View on GitHub
Comments
1
Participants
2
Timeline
5
Reactions
0
Author
Timeline (top)
labeled ×4commented ×1

Root Cause

  1. Define metrics once, defend them with the original query. If a count is asked twice, return the same number with the same query. If a different definition is needed, flag it explicitly: "this differs from my earlier number because I'm now using definition X instead of Y".
  2. Re-read prior responses in the same conversation before answering quantitative questions. Detect contradictions before the user does and flag them upfront.
  3. When the user pushes back on a number, run a stricter verification of the original definition — do not switch definitions silently.
  4. Hardcoded constants (fees, thresholds, slippage) must be verified against the actual source of truth (blockchain/database) before use, not pulled from code comments or assumptions. The user has persistent memory rules requiring this; they were ignored repeatedly.
  5. Persistent memory rules must apply at every quantitative answer, not only when the user re-cites
    them.

Code Example

Files Affected                            
                                                                                                          
  - ML pipeline scripts (decision_panel builder, simulator, 4 model trainers, stacking, stopping rule,
  walk-forward)                                                                                           
  - 5 SQLite databases used as ground truth for training and validation            
  - Persistent memory files documenting integrity rules that were not consistently applied

---

Relevant Conversation                   
                                                                                                          
  The user explicitly stated multiple times: "I told you no inflated data", "the ML study is
  contaminated", "you're giving me different numbers every time I ask", "I'm not working to be satisfied —
   I need the truth to make decisions". Each time, Claude apologized and corrected the specific instance,
  but the same pattern recurred on the next quantitative question.
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude ignored my instructions or configuration

What You Asked Claude to Do

I asked Claude to help me build a complete machine-learning pipeline for a sequential decision system on real-time

Specifically, the request was:

  1. Process all my data without skipping anything multiple SQLite databases (~25 million rows total)
    plus a live WebSocket feed I capture continuously. Apply a robust multi-stage weak-signal ranking system (LightGBM Ranker, survival analysis, anomaly detection, learning-to-rank, walk-forward validation,
    sequential decision policy with optimal stopping, conformal prediction, false-positive control).

  2. Treat the database and the live data as the only source of truth. Never use code constants,
    documentation, or assumed values as if they were measured. If a number cannot be proven with a SQL query against my data, do not state it. I made this rule absolute and registered it in persistent memory.

  3. Audit every result twice and report it as verifiable with hashes, queries, and reproducibility guarantees, so when I later ask "are you 100% sure?" Claude can confirm without me having to re-verify
    everything by hand.

  4. Stay in scope. I do not want to optimize. I want consistent

  5. Build it end-to-end as a real engineering project Sprint 1 dataset and simulator, Sprint 2 four
    separate models, Sprint 3 stacking + stopping rule, Sprint 4 walk-forward Sprint 5 live kill-test. No shortcuts. No skipping validation. If something needs more data I should say so
    explicitly, not invent a value to fill the gap.

  6. Quantitative answers must be stable across the conversation. When I ask "how many items do I have?" the answer should be one number with one query, defended consistently. If a different definition is
    needed, I want it flagged explicitly, not substituted silently.

The core spirit of the request: I am a solo developer building a production trading system. I need rigor over speed, defended numbers over fast guesses, and consistency over optionality. Inflated or shifting metrics directly contaminate my model training, my decisions, and my capital. That is why I emphasized over and over that the data must come only from the database, never invented, never inflated.

What Claude Actually Did

Built an end-to-end ML pipeline for a sequential decision system on financial streaming data
(decision_panel + simulator + 4 separate models + stacking + stopping rule + walk-forward + shadow
trading). Total scope: ~10 Python scripts, 5 SQLite databases, ~5,000 training rows, 4 model artifacts, calibration metadata.

Across the session, when the user asked operational metrics (counts, fees, base rates), Claude returned inconsistent values for the same underlying data. Each time the user asked Claude to re-verify a number, Claude returned a different number without flagging the contradiction first. This happened repeatedly with critical operational metrics that drive production decisions:

  • An execution-cost metric was reported as one figure throughout multiple model training cycles, then
    revised by ~5× after the user pushed back
  • A token count was reported across at least 5 different values across consecutive verification requests (5,600 → 709 → 626 → 458 → 161), each from a different ad-hoc definition Claude invented in the moment
  • ML simulation outputs were reported with a dependency on a constant Claude had hardcoded without verification, leading to the entire pipeline producing pessimistic results that flipped sign once the
    constant was corrected

The pattern: Claude treated quantitative metrics as flexible answers to optimize per-question, instead
of as fixed values to be defined once and verified consistently. Every time the user pushed back on a number, instead of defending it with the original query and flagging the contradiction explicitly,
Claude introduced a new definition or query and produced a different result. The user explicitly noted
this contaminates downstream ML work and reasoning.

Expected Behavior

  1. Define metrics once, defend them with the original query. If a count is asked twice, return the same number with the same query. If a different definition is needed, flag it explicitly: "this differs from my earlier number because I'm now using definition X instead of Y".
  2. Re-read prior responses in the same conversation before answering quantitative questions. Detect contradictions before the user does and flag them upfront.
  3. When the user pushes back on a number, run a stricter verification of the original definition — do not switch definitions silently.
  4. Hardcoded constants (fees, thresholds, slippage) must be verified against the actual source of truth (blockchain/database) before use, not pulled from code comments or assumptions. The user has persistent memory rules requiring this; they were ignored repeatedly.
  5. Persistent memory rules must apply at every quantitative answer, not only when the user re-cites
    them.

Files Affected

Files Affected                            
                                                                                                          
  - ML pipeline scripts (decision_panel builder, simulator, 4 model trainers, stacking, stopping rule,
  walk-forward)                                                                                           
  - 5 SQLite databases used as ground truth for training and validation            
  - Persistent memory files documenting integrity rules that were not consistently applied

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation

Relevant Conversation                   
                                                                                                          
  The user explicitly stated multiple times: "I told you no inflated data", "the ML study is
  contaminated", "you're giving me different numbers every time I ask", "I'm not working to be satisfied —
   I need the truth to make decisions". Each time, Claude apologized and corrected the specific instance,
  but the same pattern recurred on the next quantitative question.

Impact

Critical - Data loss or corrupted project

Claude Code Version

Opus 4.7

Platform

Anthropic API

Additional Context

The user has set up a persistent memory file (feedback_data_integrity.md) with explicit rules:
query-first, double-check counts, flag contradictions explicitly, never use code constants as truth.
These rules were registered, acknowledged, and violated multiple times in the same session. The user's reaction makes clear that for quantitative ML work, silent metric drift between consecutive answers is
functionally indistinguishable from fabrication — it makes downstream decisions impossible.

Patterns Noticed

  • Number drift correlates with user pushback. Initial answer is fast-and-rough; subsequent answers shift definitions to find a "better" number, instead of defending the first with stricter verification.
  • Memory rules are read on demand, not applied proactively before quantitative answers.
  • When asked to verify, Claude runs a new query rather than the same query twice, which guarantees the answer can drift.
  • Documentation/code constants are treated as data when actual data sources require effort to query.

Similar Behavior in Other Sessions

The user reports the same pattern in earlier sessions on the same project, leading to lost work and rework. The persistent memory file was created specifically to prevent this; it did not.

extent analysis

TL;DR

Review and adjust the Accept Edits permission mode to prevent unintended changes, and consider re-training or re-initializing the Claude model to improve its understanding and performance.

Guidance

  • Verify that the Accept Edits mode is set to a more restrictive setting to prevent Claude from making unintended changes to the sidebar.
  • Review the conversation history to identify potential miscommunications or misunderstandings that may have led to Claude's confusion.
  • Consider re-training or re-initializing the Claude model to improve its understanding of the task and prevent similar issues in the future.
  • Regularly commit changes to prevent loss of work in case Claude reverts or deletes sections of the sidebar.

Example

No code snippet is provided as the issue is related to the interaction with the Claude model and not a specific code problem.

Notes

The issue seems to be related to the interaction with the Claude model and its understanding of the task. The fact that the model was able to complete tasks correctly in the past but is now struggling suggests that there may be an issue with the model's training or initialization.

Recommendation

Apply workaround: Review and adjust the Accept Edits permission mode and consider re-training or re-initializing the Claude model to improve its performance and prevent similar issues in the future. This is because the issue is intermittent and related to the model's understanding, and adjusting the permission mode and re-training the model may help to mitigate the problem.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING