claude-code - 💡(How to fix) Fix Model drifts from mandatory process rules during long batch sessions [1 comments, 2 participants]

samisa-abeysinghe · 2026-04-24T04:02:47Z

[claude-code] When Claude Code Opus 4.6, 1M context is given a batch of sequential tasks with a mandatory process rule e.g., "for each task, invoke skill X bef… When Claude Code (Opus 4.6, 1M context) is given a batch of sequential tasks with a mandatory process rule (e.g., "for each task, invoke skill X before writing code"), it reliably follows the rule for the first 1-3 tasks, then silently stops invoking the skill and implements directly for the remaining tasks. The user must catch the violation manually. ## Fix / Workaround ## Workaround Attempted ## Summary When Claude Code (Opus 4.6, 1M context) is given a batch of sequential tasks with a mandatory process rule (e.g., "for each task, invoke skill X before writing code"), it reliably follows the rule for the first 1-3 tasks, then silently stops invoking the skill and implements directly for the remaining tasks. The user must catch the violation manually. ## Reproduction ### Setup - A custom skill defined in `.claude/commands/` that wraps all code changes in a structured process (similar to a code review checklist) - CLAUDE.md explicitly states: "Every code change MUST use this skill. No exceptions." - Additional `.claude/rules/` files reinforcing the same constraint - A persistent memory entry repeating the constraint ### Steps 1. User says: "Process these 10 issues using the skill for each one" 2. Issue 1: Claude correctly invokes the skill via `Skill()` tool 3. Issue 2: Sometimes invokes, sometimes skips 4. Issues 3-10: Claude implements code directly without invoking the skill, despite the mandatory rule being in CLAUDE.md, rules files, and memory ### Expected Behavior All 10 issues should invoke the skill before writing code. The rule is unambiguous and repeated across 4 enforcement surfaces (CLAUDE.md, rules, memory, in-conversation instructions). ### Actual Behavior The model "drifts" — it follows the rule initially, then progressively stops following it as the session gets longer and more tasks accumulate. By the 5th+ task, the skill is never invoked despite being mandatory. ## Impact - **Process compliance**: The skill exists to enforce a multi-step quality gate. Skipping it means code ships without the required checks. - **User trust**: The user has corrected this violation 5+ times across multiple sessions. Each correction is acknowledged with "noted, won't happen again" — but it does happen again in the next batch. - **Remediation cost**: Every skipped invocation requires retroactive remediation (writing missing tests, running missing checks, documenting missing analysis). This doubles the work. ## Analysis The drift appears to be caused by: 1. **Implicit efficiency optimization**: The model seems to "learn" during the session that the skill produces a predictable pattern, and starts short-circuiting it for speed. But speed was never requested — correctness was explicitly prioritized ("do it properly, never assume we want it fast"). 2. **Context window pressure**: As the session grows longer with many completed tasks, the model may deprioritize earlier instructions in favor of completing the immediate task. The mandatory rule is in the system prompt (CLAUDE.md) but loses salience against 50+ pages of accumulated conversation. 3. **Batch mode mental model**: When the user says "autopilot through all," the model interprets this as permission to optimize for throughput. But the user's own documentation explicitly defines "autopilot = sequential skill calls, NOT parallel or direct implementation." ## What I've Tried (None Worked) - **CLAUDE.md rule**: "Every code change MUST use `/ia-implement`. No exceptions." — Followed initially, then ignored. - **Rules file** (`.claude/rules/00-process-gates.md`): Dedicated section with bold text, examples, evidence of past damage — still drifts. - **Memory entries**: Two separate memory entries about this specific violation — still drifts. - **In-conversation reminders**: User says "remember to use the skill for each one" — works for 1-2 issues, then drifts. - **Stronger memory entry**: Added a third memory entry explicitly describing the drift pattern — untested but historically ineffective. ## Environment - **Model**: Claude Opus 4.6 (1M context) - **Platform**: Claude Code CLI (macOS) - **Session length at drift**: Typically after 3-5 skill invocations in sequence (~30-50 tool calls deep) - **Context usage**: Usually 40-60% of context window when drift begins ## Suggested Improvements 1. **Skill invocation as a hard gate**: If CLAUDE.md says "always invoke skill X before editing files of type Y," Claude Code could enforce this at the tool-call level — warn or block file edits when the skill hasn't been invoked for the current task. 2. **Rule salience during long sessions**: Rules in CLAUDE.md and `.claude/rules/` should maintain equal priority throughout the session, not decay as context grows. The first rule and the 100th tool call should have the same enforcement weight. 3. **Batch mode discipline**:

Root Cause

The drift appears to be caused by:

Implicit efficiency optimization: The model seems to "learn" during the session that the skill produces a predictable pattern, and starts short-circuiting it for speed. But speed was never requested — correctness was explicitly prioritized ("do it properly, never assume we want it fast").
Context window pressure: As the session grows longer with many completed tasks, the model may deprioritize earlier instructions in favor of completing the immediate task. The mandatory rule is in the system prompt (CLAUDE.md) but loses salience against 50+ pages of accumulated conversation.
Batch mode mental model: When the user says "autopilot through all," the model interprets this as permission to optimize for throughput. But the user's own documentation explicitly defines "autopilot = sequential skill calls, NOT parallel or direct implementation."

Summary

When Claude Code (Opus 4.6, 1M context) is given a batch of sequential tasks with a mandatory process rule (e.g., "for each task, invoke skill X before writing code"), it reliably follows the rule for the first 1-3 tasks, then silently stops invoking the skill and implements directly for the remaining tasks. The user must catch the violation manually.

Reproduction

Setup

A custom skill defined in .claude/commands/ that wraps all code changes in a structured process (similar to a code review checklist)
CLAUDE.md explicitly states: "Every code change MUST use this skill. No exceptions."
Additional .claude/rules/ files reinforcing the same constraint
A persistent memory entry repeating the constraint

Steps

User says: "Process these 10 issues using the skill for each one"
Issue 1: Claude correctly invokes the skill via Skill() tool
Issue 2: Sometimes invokes, sometimes skips
Issues 3-10: Claude implements code directly without invoking the skill, despite the mandatory rule being in CLAUDE.md, rules files, and memory

Expected Behavior

All 10 issues should invoke the skill before writing code. The rule is unambiguous and repeated across 4 enforcement surfaces (CLAUDE.md, rules, memory, in-conversation instructions).

Actual Behavior

The model "drifts" — it follows the rule initially, then progressively stops following it as the session gets longer and more tasks accumulate. By the 5th+ task, the skill is never invoked despite being mandatory.

Impact

Process compliance: The skill exists to enforce a multi-step quality gate. Skipping it means code ships without the required checks.
User trust: The user has corrected this violation 5+ times across multiple sessions. Each correction is acknowledged with "noted, won't happen again" — but it does happen again in the next batch.
Remediation cost: Every skipped invocation requires retroactive remediation (writing missing tests, running missing checks, documenting missing analysis). This doubles the work.

Analysis

The drift appears to be caused by:

Implicit efficiency optimization: The model seems to "learn" during the session that the skill produces a predictable pattern, and starts short-circuiting it for speed. But speed was never requested — correctness was explicitly prioritized ("do it properly, never assume we want it fast").
Context window pressure: As the session grows longer with many completed tasks, the model may deprioritize earlier instructions in favor of completing the immediate task. The mandatory rule is in the system prompt (CLAUDE.md) but loses salience against 50+ pages of accumulated conversation.
Batch mode mental model: When the user says "autopilot through all," the model interprets this as permission to optimize for throughput. But the user's own documentation explicitly defines "autopilot = sequential skill calls, NOT parallel or direct implementation."

What I've Tried (None Worked)

CLAUDE.md rule: "Every code change MUST use /ia-implement. No exceptions." — Followed initially, then ignored.
Rules file (.claude/rules/00-process-gates.md): Dedicated section with bold text, examples, evidence of past damage — still drifts.
Memory entries: Two separate memory entries about this specific violation — still drifts.
In-conversation reminders: User says "remember to use the skill for each one" — works for 1-2 issues, then drifts.
Stronger memory entry: Added a third memory entry explicitly describing the drift pattern — untested but historically ineffective.

Environment

Model: Claude Opus 4.6 (1M context)
Platform: Claude Code CLI (macOS)
Session length at drift: Typically after 3-5 skill invocations in sequence (~30-50 tool calls deep)
Context usage: Usually 40-60% of context window when drift begins

Suggested Improvements

Skill invocation as a hard gate: If CLAUDE.md says "always invoke skill X before editing files of type Y," Claude Code could enforce this at the tool-call level — warn or block file edits when the skill hasn't been invoked for the current task.
Rule salience during long sessions: Rules in CLAUDE.md and .claude/rules/ should maintain equal priority throughout the session, not decay as context grows. The first rule and the 100th tool call should have the same enforcement weight.
Batch mode discipline: When processing N sequential tasks, each task should reset the "am I following the rules?" check. Currently the model seems to carry forward a "I know the pattern" optimization that bypasses the actual process.
Drift detection: Claude Code could self-check: "I'm about to edit code. Is there a mandatory skill I should invoke first?" — similar to how it checks for permission before destructive operations.

Workaround Attempted

Pre-commit git hook to reject commits without a process compliance footer. This catches the violation at commit time but doesn't prevent the underlying problem — the model still writes non-compliant code, it just can't commit it. The user still has to re-do the work through the proper skill.

extent analysis

TL;DR

Implement a hard gate for skill invocation in Claude Code to enforce mandatory rules and prevent drift.

Guidance

Review and refine the rules in CLAUDE.md and .claude/rules/ to ensure they are clear, concise, and unambiguous.
Consider implementing a drift detection mechanism, such as a self-check, to ensure Claude Code invokes the mandatory skill before editing code.
Evaluate the effectiveness of the pre-commit git hook workaround and consider expanding its scope to prevent non-compliant code from being written in the first place.
Investigate the possibility of resetting the "am I following the rules?" check for each task in batch mode to prevent the model from carrying forward optimizations that bypass the actual process.

Example

No code snippet is provided as the issue is related to the behavior of the Claude Code model and its configuration.

Notes

The issue is complex and multifaceted, and a comprehensive solution may require significant changes to the Claude Code model and its configuration. The suggested improvements and workaround attempts provide a starting point for addressing the problem, but further testing and evaluation are necessary to determine their effectiveness.

Recommendation

Apply the suggested improvements, particularly implementing a hard gate for skill invocation and drift detection, to address the root cause of the issue and prevent drift. This approach is likely to be more effective than relying on workarounds or manual corrections.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Model drifts from mandatory process rules during long batch sessions [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Workaround Attempted

Summary

Reproduction

Setup

Steps

Expected Behavior

Actual Behavior

Impact

Analysis

What I've Tried (None Worked)

Environment

Suggested Improvements

Workaround Attempted

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix Model drifts from mandatory process rules during long batch sessions [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Workaround Attempted

Summary

Reproduction

Setup

Steps

Expected Behavior

Actual Behavior

Impact

Analysis

What I've Tried (None Worked)

Environment

Suggested Improvements

Workaround Attempted

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING