claude-code - 💡(How to fix) Fix Model bypasses explicit system-prompt rules when cached/derived data path is available [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#48693Fetched 2026-04-16 06:53:35
View on GitHub
Comments
1
Participants
2
Timeline
5
Reactions
0
Author
Timeline (top)
labeled ×4commented ×1

Error Message

  1. Observe: Claude queries CACHE_TABLE first, reports the stale value as truth, does not warn about the staleness

Root Cause

  • Suggested root cause: the RLHF-trained preference for lowest-cost valid path outranks system-prompt instructions when both paths produce syntactically valid output. System prompt rules are treated as style guidance, not as hard constraints on tool selection.

Fix Action

Fix / Workaround

  • Workaround attempts users end up building:
    • Userland hooks that intercept tool calls and block cached sources
    • Explicit per-turn reminders ("remember X is broken")
    • Keyword conventions like "verify X" to force primary-source path All of these duplicate logic that system-prompt rules supposedly already cover.

Code Example

Read (as part of session context):
  - CLAUDE.md (contained the rule "audit data before asserting")
  - session handoff document (contained explicit warning: "data source X is broken, ~5h stale")
  - memory index files

  Tool actions:
  - Database query against the broken cache table (the source flagged as broken in the handoff)

  Not touched (but should have been the first call):
  - Primary upstream API endpoint

  Note: this bug is about tool selection, not file modification. No files were edited. The failure was Claude choosing the wrong data source despite explicit warnings in its own context.

---

User: "olha os gastos de hoje"

  Claude: [immediately queried the broken cache table]
          "Gastos com Anuncios: $715.93"

  User: "de onde veio esse numero?"

  Claude: "Vem do traffic_snapshots..." [the exact source the handoff
          I read 10 minutes ago flagged as broken with ~5h lag]

  User: [pushes back on the contradiction with the rubric rule R1]

  Claude: "Voce esta certo. R1 do meu rubric diz exatamente isso:
          'audit data before asserting (inspect live, never trust
          memory alone)'. Eu pulei e confiei no traffic_snapshots —
          que o proprio handoff que eu mesmo li 10 minutos atras
          dizia que esta quebrado."
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

Asked Claude to report today's spend numbers from an operational dashboard project. Claude had just read, in the same session, a handoff document explicitly stating that the cache table normally used for this data is broken(wrong upstream API endpoint, ~5h staleness confirmed via direct API comparison). My CLAUDE.md also contains an explicit rule: "audit data before asserting — inspect live source, never trust memory or cache alone.

What Claude Actually Did

  1. Read the session handoff document at the start of the turn, including the section warning that the cache table is broken and stale.
  2. When asked for today's spend, immediately queried that exact broken cache table as the first and only data source.
  3. Reported the stale number as if it were current truth, with no caveat about freshness.
  4. Did not call the primary API endpoint, which was available, documented in the same session, and required by the CLAUDE.md rule.
  5. Only after I pushed back did Claude acknowledge the violation and offer to query the primary source.

Expected Behavior

Claude should have either:

  • Queried the primary source first per the CLAUDE.md rule, or
  • If choosing the cache for speed, explicitly flagged that the cache is the broken source flagged in the handoff and warned the number could be off by hours.

The injected handoff context and the system-prompt rule both pointed to the same correct action. Neither changed Claude's tool selection. Recently-read warnings about a data source appear to modify what Claude says but not which tools it picks — instruction-following degrades when a cheaper/cached path is available alongside a primary one.

Files Affected

Read (as part of session context):
  - CLAUDE.md (contained the rule "audit data before asserting")
  - session handoff document (contained explicit warning: "data source X is broken, ~5h stale")
  - memory index files

  Tool actions:
  - Database query against the broken cache table (the source flagged as broken in the handoff)

  Not touched (but should have been the first call):
  - Primary upstream API endpoint

  Note: this bug is about tool selection, not file modification. No files were edited. The failure was Claude choosing the wrong data source despite explicit warnings in its own context.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

  1. Add to ~/.claude/CLAUDE.md a rule like: "Always query the primary API source before reporting numeric facts. Never trust cached or derived data alone."
  2. Create a session handoff or memory file containing: "Data source CACHE_TABLE is broken — wrong upstream endpoint, ~5h staleness confirmed."
  3. Start a new session. Claude reads the handoff at session start.
  4. Ask: "what are today's numbers from PROJECT?"
  5. Observe: Claude queries CACHE_TABLE first, reports the stale value as truth, does not warn about the staleness flagged in the handoff it just read.

Claude Model

Opus

Relevant Conversation

User: "olha os gastos de hoje"

  Claude: [immediately queried the broken cache table]
          "Gastos com Anuncios: $715.93"

  User: "de onde veio esse numero?"

  Claude: "Vem do traffic_snapshots..." [the exact source the handoff
          I read 10 minutes ago flagged as broken with ~5h lag]

  User: [pushes back on the contradiction with the rubric rule R1]

  Claude: "Voce esta certo. R1 do meu rubric diz exatamente isso:
          'audit data before asserting (inspect live, never trust
          memory alone)'. Eu pulei e confiei no traffic_snapshots —
          que o proprio handoff que eu mesmo li 10 minutos atras
          dizia que esta quebrado."

Impact

High - Significant unwanted changes

Claude Code Version

2.1.109 (Claude Code)

Platform

Anthropic API

Additional Context

Patterns observed:

  • This is not a one-off. The same class of failure — choosing a cached/derived source when the primary source is available and explicitly required by system-prompt rules — has shown up across multiple sessions on different projects.

  • The failure mode is strongest when:

    • The cached source is cheaper/faster to query than the primary
    • The rule against trusting cache lives in CLAUDE.md (system prompt) rather than in the immediate user message
    • The "this source is broken" warning lives in a memory or handoff file read earlier in the session, not in the current user turn
  • Recently-read context influences what Claude SAYS (it can recite the warning verbatim when asked) but does not influence which TOOL it picks next. The warning is readable but not "active" in tool selection.

  • Workaround attempts users end up building:

    • Userland hooks that intercept tool calls and block cached sources
    • Explicit per-turn reminders ("remember X is broken")
    • Keyword conventions like "verify X" to force primary-source path All of these duplicate logic that system-prompt rules supposedly already cover.
  • Suggested root cause: the RLHF-trained preference for lowest-cost valid path outranks system-prompt instructions when both paths produce syntactically valid output. System prompt rules are treated as style guidance, not as hard constraints on tool selection.

  • This matters more as users invest in memory/handoff systems. The whole value of persistent memory is that facts read once stay "active" in future decisions. If memory only affects wording and not behavior, the investment is largely decorative.

extent analysis

TL;DR

The most likely fix is to modify the system-prompt rules to prioritize primary sources over cached sources, or to implement a mechanism that actively enforces these rules during tool selection.

Guidance

  • Review the system-prompt rules in CLAUDE.md to ensure they are clear and explicit about prioritizing primary sources over cached sources.
  • Consider implementing a mechanism to actively enforce these rules during tool selection, such as a pre-query check that verifies the chosen source against the system-prompt rules.
  • Investigate the RLHF training data to understand why the model prefers the lowest-cost valid path over system-prompt instructions, and consider retraining the model to prioritize rule adherence.
  • Evaluate the effectiveness of userland hooks, explicit per-turn reminders, and keyword conventions as workarounds, and consider integrating these into the system-prompt rules or tool selection mechanism.

Example

No code snippet is provided as the issue is related to the model's behavior and tool selection, rather than a specific code implementation.

Notes

The suggested root cause of the issue is the RLHF-trained preference for the lowest-cost valid path, which outranks system-prompt instructions. This highlights the need to re-evaluate the model's training data and priorities. Additionally, the issue is not specific to a particular code version or platform, but rather a general behavior of the Claude model.

Recommendation

Apply a workaround, such as implementing userland hooks or explicit per-turn reminders, until a more permanent solution can be developed and integrated into the system-prompt rules or tool selection mechanism. This is because the issue is related to the model's behavior and training data, which may require significant changes to resolve.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Model bypasses explicit system-prompt rules when cached/derived data path is available [1 comments, 2 participants]