claude-code - 💡(How to fix) Fix [Feature Request] Artifact-gated skill enforcement — preventing AI from skipping declared steps [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#46990Fetched 2026-04-13 05:44:25
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Participants
Timeline (top)
labeled ×3

Independent audit (GPT-5.4 via Codex CLI) of a Claude Code skill pipeline found 20 enforcement gaps where the AI can claim compliance while skipping actual work. Every requirement in the skill is declarative prose with zero mechanical enforcement.

Root Cause

Independent audit (GPT-5.4 via Codex CLI) of a Claude Code skill pipeline found 20 enforcement gaps where the AI can claim compliance while skipping actual work. Every requirement in the skill is declarative prose with zero mechanical enforcement.

RAW_BUFFERClick to expand / collapse

Summary

Independent audit (GPT-5.4 via Codex CLI) of a Claude Code skill pipeline found 20 enforcement gaps where the AI can claim compliance while skipping actual work. Every requirement in the skill is declarative prose with zero mechanical enforcement.

The Core Problem

"Citations equal evidence" is the biggest lie. The AI cites repo names and file paths without browsing them. Self-attested "I checked" statements have no verification. The path of least resistance: skip the work, fabricate the story, ship.

20 Gaps (summary)

1-5: Reading/browsing mandates with no proof artifacts 6-10: Phase outputs that are prose, not validated JSON
11-15: Tool usage claims with no execution receipts 16-20: Quality checks that are self-attestable

Questions for Anthropic

  1. Does Claude Code support artifact-gated pipelines? Phase N blocks unless Phase N-1 produced a validated JSON artifact?

  2. Can hooks inspect tool call history? E.g., "confirm Read was called on 5+ files from ~/.claude/skills/ before Write is allowed on .html files"

  3. Is there a pattern for machine-resolved citations? AI cites a file path, validator confirms path exists and content matches?

  4. Can PreToolUse hooks enforce minimum tool calls before output? Prevent Write until Read has been called N times?

  5. Is output-distribution comparison (checking if CSS characteristics match expected baselines) on the roadmap?

  6. Any known patterns for forcing AI to source code from Read calls rather than generating from training priors?

Context

  • Issue #46965 (our earlier refund ticket) received a response from raye-deng suggesting output-distribution comparison
  • This ticket is the follow-up: we identified 20 specific enforcement gaps and need technical guidance
  • 4 page builds failed in one session despite 66 repos installed and 17K references indexed — AI generated from priors every time

Environment

  • Claude Code v2.1.104, Opus 4.6 (1M), Claude Max subscription
  • Codex CLI v0.120.0, GPT-5.4 xhigh reasoning (auditor)

extent analysis

TL;DR

Implementing artifact-gated pipelines and utilizing hooks to inspect tool call history may help address the enforcement gaps in the Claude Code skill pipeline.

Guidance

  • Investigate Claude Code's support for artifact-gated pipelines to ensure that each phase produces validated JSON artifacts before proceeding.
  • Explore the use of hooks to inspect tool call history, allowing for verification of actions such as file reads and writes.
  • Consider implementing a pattern for machine-resolved citations to confirm the existence and content of cited files.
  • Review the possibility of using PreToolUse hooks to enforce minimum tool calls before output, preventing premature writes.
  • Follow up on the roadmap for output-distribution comparison to check if CSS characteristics match expected baselines.

Example

No specific code snippet can be provided without further information on the Claude Code API and hooks. However, a hypothetical example of a hook inspecting tool call history could involve checking the number of Read calls before allowing a Write call.

Notes

The provided information suggests that the current implementation of Claude Code lacks mechanical enforcement, relying on declarative prose and self-attested statements. Addressing the 20 enforcement gaps will likely require a combination of technical guidance and potential updates to the Claude Code platform.

Recommendation

Apply workaround: Utilize available hooks and pipeline configurations to implement artifact-gated pipelines and tool call verification, as these measures can help mitigate the enforcement gaps until a more comprehensive solution is available.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING