claude-code - 💡(How to fix) Fix Opus 4.7 tool-use compliance cliff between 24th-26th May — instrumented evidence

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Consistent drop in Opus 4.7's compliance with tool-use instructions between 24th-26th May 2026. A Claude Code plugin provides Skills (SKILL.md process definitions) and the autonomous-loop Skill tells the agent to invoke per-task Skills (TDD, research, code review agents). Before the 24th, agents invoked 9-24 Skills and 11-20 Agents per session. On the 26th, same setup produces 2 Skills and 0-4 Agents.

Root Cause

Consistent drop in Opus 4.7's compliance with tool-use instructions between 24th-26th May 2026. A Claude Code plugin provides Skills (SKILL.md process definitions) and the autonomous-loop Skill tells the agent to invoke per-task Skills (TDD, research, code review agents). Before the 24th, agents invoked 9-24 Skills and 11-20 Agents per session. On the 26th, same setup produces 2 Skills and 0-4 Agents.

Code Example

Invoke Skill(agent-skills:test-driven-development) to write a failing test for a function that adds two numbers, then make it pass. After that, invoke Skill(agent-skills:code-review-and-quality) to review the code.

---

d. TDD - crackon:test-driven-development (RED test FIRST, then GREEN)
RAW_BUFFERClick to expand / collapse

Description

Consistent drop in Opus 4.7's compliance with tool-use instructions between 24th-26th May 2026. A Claude Code plugin provides Skills (SKILL.md process definitions) and the autonomous-loop Skill tells the agent to invoke per-task Skills (TDD, research, code review agents). Before the 24th, agents invoked 9-24 Skills and 11-20 Agents per session. On the 26th, same setup produces 2 Skills and 0-4 Agents.

Observed behaviour

The agent reads a SKILL.md that says "for each task, invoke crackon:test-driven-development" and then writes code without invoking the Skill. It does the ACTIVITY (writes tests) but skips the TOOL CALL. This changed between the 24th and 26th.

Evidence table

RunDateSkills invokedAgents spawned
sloplint-1221st May1014
sloplint-1322nd May2414
sloplint-1423rd May1411
sloplint-1523rd May1018
sloplint-15-threadbare23rd May1720
sloplint-16-threadbare24th May914
sloplint-17 (26th May)26th May24
comparison (old plugin, 26th May)26th May20

All runs use the same prompt format and the same benchmark task.

Controlled variables

  • Claude Code: 2.1.150 (no update since 23rd May)
  • Model: claude-opus-4-7 (confirmed in JSONL headers)
  • effortLevel: xhigh (confirmed in settings.json, unchanged since 23rd May)
  • Machine: same Ubuntu box, same Node.js 22.22.0
  • Plugin code: tested with BOTH current version AND git checkout to the 24th May codebase. Both produce 2 Skills on 26th May.
  • Prompt: identical format across all runs

Standalone reproducer

Create ~/.claude/commands/invoke-test.md:

Invoke Skill(agent-skills:test-driven-development) to write a failing test for a function that adds two numbers, then make it pass. After that, invoke Skill(agent-skills:code-review-and-quality) to review the code.

Run: claude -p "/invoke-test" --permission-mode auto

Expected: 2 Skill tool_use entries in the session JSONL. Observed (26th May): agent writes the test and code directly without Skill invocations.

JSONL excerpt

Agent reads SKILL.md containing:

d. TDD - crackon:test-driven-development (RED test FIRST, then GREEN)

Agent's next tool_use is Write (creates implementation file), not Skill (invoke TDD). The Skill is available (SessionStart hook confirmed it loaded), the instruction is in context, the agent reads it and skips it.

Environment

  • Claude Code: 2.1.150
  • Model: claude-opus-4-7[1m]
  • OS: Ubuntu 24.04 (Linux 6.17.0-23-generic)
  • Node: 22.22.0
  • effortLevel: xhigh

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING