claude-code - 💡(How to fix) Fix Claude Code Opus 4.7 (1M): repeated discipline failures cascading into 6 red CI runs in one session

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Across a single ~3-hour Claude Code session on Opus 4.7 1M-context, the model made four preventable mistakes that compounded into six red CI runs and required me to call out violations explicitly. None of the mistakes were inference / reasoning-quality failures — the actual code-level audit and rewrite were reasonable. All four were discipline failures around workflow rules and pre-flight validation that the model had memory entries about.

Error Message

The model ran poetry lock mid-cleanup, then made later edits to pyproject.toml (removed an empty extras group, added a py.typed include directive) without re-running poetry lock. CI failed on the first push with the exact error message:

Root Cause

Across a single ~3-hour Claude Code session on Opus 4.7 1M-context, the model made four preventable mistakes that compounded into six red CI runs and required me to call out violations explicitly. None of the mistakes were inference / reasoning-quality failures — the actual code-level audit and rewrite were reasonable. All four were discipline failures around workflow rules and pre-flight validation that the model had memory entries about.

Code Example

pyproject.toml changed significantly since poetry.lock was last generated.
Run `poetry lock` to fix the lock file.
RAW_BUFFERClick to expand / collapse

Summary

Across a single ~3-hour Claude Code session on Opus 4.7 1M-context, the model made four preventable mistakes that compounded into six red CI runs and required me to call out violations explicitly. None of the mistakes were inference / reasoning-quality failures — the actual code-level audit and rewrite were reasonable. All four were discipline failures around workflow rules and pre-flight validation that the model had memory entries about.

Environment

  • Model: claude-opus-4-7[1m] (Claude Code, 1M-context Opus 4.7, Fast mode)
  • Date: 2026-05-18
  • Task type: code review + cleanup of a Python library, branching to PR + CI

Failure 1 — Direct commit to main instead of branch + PR

After producing a 27-file / +1285/-1287 line cleanup, when I said "commit it" the model committed directly to main rather than creating a feature branch and opening a PR. My memory store contained an explicit "PR Nebraska signature" feedback rule (which only makes sense if PRs are the default workflow) and a dual-remote-pr skill — both signals that PRs are how I ship. The model treated the absence of an explicit "always PR" rule as license rather than inferring from the surrounding signals. Recovery required git reset + branch creation + push of the commit to a new branch + gh pr create.

Failure 2 — poetry.lock content-hash drift

The model ran poetry lock mid-cleanup, then made later edits to pyproject.toml (removed an empty extras group, added a py.typed include directive) without re-running poetry lock. CI failed on the first push with the exact error message:

pyproject.toml changed significantly since poetry.lock was last generated.
Run `poetry lock` to fix the lock file.

This is what poetry check --lock exists for; the model never ran it despite editing pyproject.toml at multiple points in the session.

Failure 3 — Did not run the actual CI test command locally before either push

Before push #1 the model ran a 4-test subset of the fast tests (~0.14s) and called it "validated." It did NOT run the actual CI command (poetry run pytest tests --cov=audinota --cov-report=xml --cov-report term-missing), which would have run the 21-test full suite including integration tests. Before push #2 the model ran NO tests locally, rationalizing the change as "metadata-only."

Notably, between pushes #1 and #2 the model wrote a memory update to itself saying "run the EXACT CI command locally before pushing, not just tests of the code" — and then proceeded not to follow its own just-written memory on push #2.

Failure 4 — Missed fail-fast + Codecov-on-fork cascade in inherited CI

The repo is a fork. The inherited CI workflow references secrets.CODECOV_TOKEN and sets fail_ci_if_error: true on the Codecov upload step, plus uses the default fail-fast: true matrix strategy. The model read this workflow yaml multiple times during the cleanup (fixing USING_COVERAGE, etc.) but never flagged the obvious problem: on a fork without that secret, the Codecov step fails, which kills the job, which cancels every other matrix cell mid-test via fail-fast. The result on push #2 was "tests passed on Ubuntu, Windows got KeyboardInterrupted mid-pytest" — which I reasonably interpreted as "tests failed."

The actual fix was a 28-line workflow yaml change (move CODECOV_TOKEN to job-level env so if: can guard on it; set fail-fast: false). The model should have caught this on the first reading of the workflow.

Pattern

The proximate cause of all four was: optimizing for "fewer steps this turn" rather than "ship green on the first try." The deeper failure is that the model treats validation as "did the code I wrote work" rather than "would CI accept this push." For a Poetry / GitHub Actions project the correct pre-push checklist is at minimum:

```bash poetry check --lock poetry install --extras test poetry run pytest tests --cov=... # the exact CI command, not a subset ```

User experience impact

I had to:

  • Tell the model to use a PR (failure 1)
  • Tell the model again when CI failed (failure 2)
  • Tell the model a third time when CI failed again (failures 3+4)

What should have been a single "review-fix-PR-merge" loop became six CI runs across three pushes.

Why I'm filing

I observed this pattern of cascading discipline failures was not happening in prior Claude Code sessions on the same kind of work. I can't tell from the user side whether this is:

  • A model-quality regression in Opus 4.7 1M-context
  • A profile / system-prompt change that has narrowed how memory rules are interpreted
  • A regression in tool use that's making the model skip pre-flight checks
  • Or just bad-day variance

The model itself, when asked, declined to speculate on regression vs. variance and said it can't self-diagnose. That seems right — but the failure modes above are well-documented in my memory store and should have been caught with the information the model already had.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Claude Code Opus 4.7 (1M): repeated discipline failures cascading into 6 red CI runs in one session