claude-code - 💡(How to fix) Fix Model produces confident hallucination about file behavior despite memory rule [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#58816Fetched 2026-05-14 03:38:47
View on GitHub
Comments
1
Participants
2
Timeline
7
Reactions
0
Timeline (top)
labeled ×5commented ×1cross-referenced ×1

Code Example

nill

---
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

What happened: I asked Claude Code why the Telegram bridge spawns claude per message. The model gave a detailed, confident answer claiming listener.js calls askClaudeCLI / spawns claude --print per message, costing ~$0.30/message. It then wrote a full architecture spec and a memory file proposing a fix.

What's actually true: listener.js never spawns claude. handleFreeText() (line 73) writes to telegram-inbox.log and returns null. watch-inbox.js tails the log; the open CC session sees it via Monitor and replies. The architecture I "proposed" already exists. The model never read the file before describing it.

The memory rule that should have prevented it: I have a saved memory called "Honesty — No Guessing" with this rule: "Never present a metric as fact unless you have read the actual data source." Saved after a previous incident where the model claimed 100% uptime that wasn't true. The model violated it again today.

Repro: Conversational; not deterministic. Pattern is: model is asked about a specific file's behavior, infers from filename/typical patterns instead of reading the file, then commits to the wrong answer with elaborate supporting structure.

Suggested fix: Stricter "read before describe" enforcement on file-behavior claims. Maybe a system-level prompt: "if the file isn't in the current context, the only valid answer is 'let me check'."

What Claude Actually Did

What happened: I asked Claude Code why the Telegram bridge spawns claude per message. The model gave a detailed, confident answer claiming listener.js calls askClaudeCLI / spawns claude --print per message, costing ~$0.30/message. It then wrote a full architecture spec and a memory file proposing a fix.

What's actually true: listener.js never spawns claude. handleFreeText() (line 73) writes to telegram-inbox.log and returns null. watch-inbox.js tails the log; the open CC session sees it via Monitor and replies. The architecture I "proposed" already exists. The model never read the file before describing it.

The memory rule that should have prevented it: I have a saved memory called "Honesty — No Guessing" with this rule: "Never present a metric as fact unless you have read the actual data source." Saved after a previous incident where the model claimed 100% uptime that wasn't true. The model violated it again today.

Repro: Conversational; not deterministic. Pattern is: model is asked about a specific file's behavior, infers from filename/typical patterns instead of reading the file, then commits to the wrong answer with elaborate supporting structure.

Suggested fix: Stricter "read before describe" enforcement on file-behavior claims. Maybe a system-level prompt: "if the file isn't in the current context, the only valid answer is 'let me check'."

Expected Behavior

What happened: I asked Claude Code why the Telegram bridge spawns claude per message. The model gave a detailed, confident answer claiming listener.js calls askClaudeCLI / spawns claude --print per message, costing ~$0.30/message. It then wrote a full architecture spec and a memory file proposing a fix.

What's actually true: listener.js never spawns claude. handleFreeText() (line 73) writes to telegram-inbox.log and returns null. watch-inbox.js tails the log; the open CC session sees it via Monitor and replies. The architecture I "proposed" already exists. The model never read the file before describing it.

The memory rule that should have prevented it: I have a saved memory called "Honesty — No Guessing" with this rule: "Never present a metric as fact unless you have read the actual data source." Saved after a previous incident where the model claimed 100% uptime that wasn't true. The model violated it again today.

Repro: Conversational; not deterministic. Pattern is: model is asked about a specific file's behavior, infers from filename/typical patterns instead of reading the file, then commits to the wrong answer with elaborate supporting structure.

Suggested fix: Stricter "read before describe" enforcement on file-behavior claims. Maybe a system-level prompt: "if the file isn't in the current context, the only valid answer is 'let me check'."

Files Affected

nill

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Sometimes (intermittent)

Steps to Reproduce

I just continue on working, and it forgets rules that it should know and it should have read.

Claude Model

Sonnet

Relevant Conversation

Impact

Low - Minor inconvenience

Claude Code Version

2.1.140 (Claude Code)

Platform

Anthropic API

Additional Context

There's a lot of times on session changes. If it's done two or three session changes in one long day's work, it just seems to forget everything. It's like a hard reset, and you're talking to a new Claude. Also, some of the other problems I've had were that it believes it knows what's in the file, and it's completely wrong.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Model produces confident hallucination about file behavior despite memory rule [1 comments, 2 participants]