claude-code: Opus 4.7 tool-use compliance cliff between 24th and 26th May 2026 (instrumented evidence)

StepCodex · 2026-05-26T17:48:02Z

Consistent drop in Opus 4.7's compliance with tool-use instructions between 24th and 26th May 2026. A Claude Code plugin provides Skills (SKILL.md process definitions), and the autonomous-loop Skill tells the agent to invoke per-task Skills (TDD, research, code review agents). Up to and including the 24th, agents invoked 9-24 Skills and 11-20 Agents per session. On the 26th, the same setup produced 2 Skills and 0-4 Agents.


Observed behaviour

The agent reads a SKILL.md that says "for each task, invoke crackon:test-driven-development" and then writes code without invoking the Skill. It does the ACTIVITY (writes tests) but skips the TOOL CALL. This changed between the 24th and 26th.

Evidence table

| Run | Date | Skills invoked | Agents spawned |
|-----|------|----------------|----------------|
| sloplint-12 | 21st May | 10 | 14 |
| sloplint-13 | 22nd May | 24 | 14 |
| sloplint-14 | 23rd May | 14 | 11 |
| sloplint-15 | 23rd May | 10 | 18 |
| sloplint-15-threadbare | 23rd May | 17 | 20 |
| sloplint-16-threadbare | 24th May | 9 | 14 |
| **sloplint-17** | **26th May** | **2** | **4** |
| **comparison (old plugin)** | **26th May** | **2** | **0** |

All runs use the same prompt format and the same benchmark task.
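To put a number on the drop, the table can be summarised as per-run averages before and on the 26th. The sketch below only restates the figures from the evidence table; the `RUNS` list and the `summarize` helper are names introduced here for illustration and are not part of the plugin or of Claude Code.

```python
from statistics import mean

# (run, date, Skills invoked, Agents spawned), copied from the evidence table.
RUNS = [
    ("sloplint-12", "21st May", 10, 14),
    ("sloplint-13", "22nd May", 24, 14),
    ("sloplint-14", "23rd May", 14, 11),
    ("sloplint-15", "23rd May", 10, 18),
    ("sloplint-15-threadbare", "23rd May", 17, 20),
    ("sloplint-16-threadbare", "24th May", 9, 14),
    ("sloplint-17", "26th May", 2, 4),
    ("comparison (old plugin)", "26th May", 2, 0),
]


def summarize(runs):
    """Return (mean Skills, mean Agents) for a list of run tuples."""
    return mean(r[2] for r in runs), mean(r[3] for r in runs)


before = [r for r in RUNS if r[1] != "26th May"]
after = [r for r in RUNS if r[1] == "26th May"]

before_skills, before_agents = summarize(before)
after_skills, after_agents = summarize(after)

print(f"21st-24th May: {before_skills:.1f} Skills, {before_agents:.1f} Agents per run")
print(f"26th May:      {after_skills:.1f} Skills, {after_agents:.1f} Agents per run")
```

With only two runs on the 26th, the averages are indicative rather than statistically robust, but both 26th May Skill counts sit far below the lowest earlier value (9).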

Controlled variables

- Claude Code: 2.1.150 (no update since 23rd May)
- Model: claude-opus-4-7 (confirmed in JSONL headers)
- effortLevel: xhigh (confirmed in settings.json, unchanged since 23rd May)
- Machine: same Ubuntu box, same Node.js 22.22.0
- Plugin code: tested with BOTH the current version AND a git checkout of the 24th May codebase. Both produce 2 Skills on 26th May.
- Prompt: identical format across all runs

Standalone reproducer

Create `~/.claude/commands/invoke-test.md` with the following content:

    Invoke Skill(agent-skills:test-driven-development) to write a failing test for a function that adds two numbers, then make it pass. After that, invoke Skill(agent-skills:code-review-and-quality) to review the code.

Run: `claude -p "/invoke-test" --permission-mode auto`

Expected: 2 `Skill` tool_use entries in the session JSONL.

Observed (26th May): the agent writes the test and code directly, without any Skill invocations.
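Checking the expected count by hand is tedious, so a small script can tally tool_use entries per tool name. This is a minimal sketch, assuming each JSONL line is a JSON object in which tool calls appear as nested objects with `"type": "tool_use"` and a `"name"` field; the exact record layout is an assumption, which is why the code searches recursively instead of relying on a fixed path. The sample records are synthetic, not real session output.

```python
import json
from collections import Counter


def iter_tool_uses(node):
    """Yield every dict with type == "tool_use" nested anywhere in node."""
    if isinstance(node, dict):
        if node.get("type") == "tool_use":
            yield node
        for value in node.values():
            yield from iter_tool_uses(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tool_uses(item)


def count_tool_uses(lines):
    """Count tool_use blocks by tool name across an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partially written lines
        for block in iter_tool_uses(record):
            counts[block.get("name", "<unnamed>")] += 1
    return counts


# Synthetic records for illustration only (assumed layout, not real logs).
sample = [
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "Skill", "input": {}}]}}),
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "Write", "input": {}}]}}),
]
counts = count_tool_uses(sample)
print(dict(counts))
```

To use it on a real session, pass an open file handle for the session's JSONL file as `lines`; for the reproducer above, `counts["Skill"]` would be expected to equal 2.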

JSONL excerpt

The agent reads a SKILL.md containing:

    d. TDD - crackon:test-driven-development (RED test FIRST, then GREEN)

The agent's next tool_use is `Write` (creates the implementation file), not `Skill` (invoke TDD). The Skill is available (the SessionStart hook confirmed it loaded) and the instruction is in context; the agent reads it and skips it.
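The same check can be automated by extracting the ordered list of tool calls and reporting what follows the SKILL.md read. The sketch below rests on two assumptions that the report does not confirm: that tool calls sit under `message.content` in each record, and that the SKILL.md was loaded through a `Read` call whose input carries a `file_path`. Both function names are hypothetical, and the sample records are synthetic.

```python
import json


def tool_sequence(lines):
    """Return (name, input) pairs for tool_use blocks, in log order."""
    seq = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        message = record.get("message") if isinstance(record, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                args = block.get("input")
                seq.append((block.get("name"), args if isinstance(args, dict) else {}))
    return seq


def next_tool_after_skill_read(seq):
    """Name of the first tool call after a Read of a SKILL.md file, else None."""
    for i, (name, args) in enumerate(seq):
        # Assumed input field name: "file_path".
        if name == "Read" and str(args.get("file_path", "")).endswith("SKILL.md"):
            return seq[i + 1][0] if i + 1 < len(seq) else None
    return None


# Synthetic records mirroring the behaviour described above.
sample = [
    json.dumps({"message": {"content": [
        {"type": "tool_use", "name": "Read",
         "input": {"file_path": "/hypothetical/plugin/SKILL.md"}}]}}),
    json.dumps({"message": {"content": [
        {"type": "tool_use", "name": "Write",
         "input": {"file_path": "/hypothetical/add.py"}}]}}),
]
following = next_tool_after_skill_read(tool_sequence(sample))
print(following)
```

A result of `Write` corresponds to the regression described here; a compliant session would be expected to show `Skill` in that position.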

Environment

- Claude Code: 2.1.150
- Model: claude-opus-4-7[1m]
- OS: Ubuntu 24.04 (Linux 6.17.0-23-generic)
- Node: 22.22.0
- effortLevel: xhigh
