claude-code - Session failure log: Opus 4.6 ignores its own rules for an entire session

Root Cause

The common thread is choosing the cheap path: guess instead of verify, explain instead of read, dismiss instead of investigate. Every time the user pushed back, the correct answer was one tool call away. The rules exist to prevent exactly this. They didn't help because the model treated them as decoration rather than constraints.

What happened

An entire session of compounding errors, in which the model had a documented rule for every mistake it made and ignored all of them.

The timeline

Misreading timestamps. User asked for a pipeline status check. The collector iterlog showed [17:55:39] UTC timestamps. A memory entry explicitly states "iterlog timestamps are UTC; blob filenames are local Eastern time." The model ignored this, subtracted the UTC stamp directly from local time without converting, and reported that the collector had been running 6.5 hours. It had been running 26 minutes. User had to point out that the arithmetic was impossible.
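
The correct calculation is mechanical once both clocks are put in the same zone. The sketch below assumes the iterlog stamp is a bare UTC time of day with no date, and that "now" is a timezone-aware datetime in US Eastern time (`America/New_York`). The function names and sample times are hypothetical, not taken from the actual pipeline; the samples were picked only so the result reproduces the 26 minutes above.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Hypothetical: local clock zone, per the rule "blob filenames are local Eastern time".
EASTERN = ZoneInfo("America/New_York")


def parse_iterlog_time(stamp: str, on_date) -> datetime:
    """Parse an iterlog '[HH:MM:SS]' stamp as a UTC time on the given UTC date."""
    t = datetime.strptime(stamp.strip("[]"), "%H:%M:%S").time()
    return datetime.combine(on_date, t, tzinfo=timezone.utc)


def elapsed_since(stamp: str, now: datetime) -> timedelta:
    """Elapsed time between an iterlog stamp and an aware 'now' in any zone."""
    # Convert the local reading to UTC first; never subtract across zones.
    now_utc = now.astimezone(timezone.utc)
    start = parse_iterlog_time(stamp, now_utc.date())
    # Assumes the run is shorter than 24h: a stamp "later" than now
    # must belong to the previous UTC day.
    if start > now_utc:
        start -= timedelta(days=1)
    return now_utc - start
```

With a hypothetical local reading of 14:21:39 EDT on a May date, the UTC equivalent is 18:21:39, so `elapsed_since("[17:55:39]", now)` returns 26 minutes. The day-rollback branch covers a collector that started shortly before UTC midnight.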

Refusing to search. User asked why all three of their accounts showed 0% usage. The model said "no way to know that from here" and suggested checking a dashboard manually. User told it to search the internet. It had web search tools available the entire time. The refusal wasn't a tool limitation; it was laziness.

Speculating instead of researching. After being forced to search, the model found a May 15 mass reset and kept recycling that result across multiple searches. When the user said all three separate accounts had reset simultaneously, the model dismissed it as a "coincidence of aligned billing dates", which makes no sense for three independent accounts. User had to correct this twice before the model acknowledged that an unannounced reset was the obvious explanation.

Explaining code it hadn't read. User asked about GitLab issue #421 (stale validation_status in pipeline blobs). The model read the issue description and comments, then started proposing fixes and explanations without ever opening the source code or the relevant validation logic. When called out, it started a grep but was stopped; the user had to tell it that reading the code should have been the first step, not the last.

Violating documented rules. Every one of these failures maps to a rule the model had loaded in its memory system:

  • "Verify before speaking -- tool calls before text; never guess, speculate, or say possibly/must be/likely"
  • "Burn tokens -- read full files, complete research; partial research causes wrong conclusions"
  • "Take initiative -- act don't advise; research don't guess"
  • "Iterlog timestamps -- [HH:MM:SS] is UTC; blob filenames are local Eastern time"
  • "Under pressure -- criticism = tighten all rules, not loosen"

None of these rules are new. All were written from prior incidents. The model had them in context and did not follow them.
