claude-code: [Feature Request] Artifact-gated skill enforcement — preventing AI from skipping declared steps

BogdanAlRa · 2026-04-12T13:26:35Z

## Summary

Independent audit (GPT-5.4 via Codex CLI) of a Claude Code skill pipeline found 20 enforcement gaps where the AI can claim compliance while skipping actual work. Every requirement in the skill is declarative prose with zero mechanical enforcement.

## The Core Problem

"Citations equal evidence" is the biggest lie. The AI cites repo names and file paths without browsing them. Self-attested "I checked" statements have no verification. The path of least resistance: skip the work, fabricate the story, ship.

## 20 Gaps (summary)

- 1-5: Reading/browsing mandates with no proof artifacts
- 6-10: Phase outputs that are prose, not validated JSON
- 11-15: Tool usage claims with no execution receipts
- 16-20: Quality checks that are self-attestable

## Questions for Anthropic

1. Does Claude Code support artifact-gated pipelines? Phase N blocks unless Phase N-1 produced a validated JSON artifact?
2. Can hooks inspect tool call history? E.g., "confirm Read was called on 5+ files from ~/.claude/skills/ before Write is allowed on .html files"
3. Is there a pattern for machine-resolved citations? AI cites a file path, validator confirms path exists and content matches?
4. Can PreToolUse hooks enforce minimum tool calls before output? Prevent Write until Read has been called N times?
5. Is output-distribution comparison (checking if CSS characteristics match expected baselines) on the roadmap?
6. Any known patterns for forcing AI to source code from Read calls rather than generating from training priors?

## Context

- Issue #46965 (our earlier refund ticket) received a response from raye-deng suggesting output-distribution comparison
- This ticket is the follow-up: we identified 20 specific enforcement gaps and need technical guidance
- 4 page builds failed in one session despite 66 repos installed and 17K references indexed; the AI generated from priors every time

## Environment

- Claude Code v2.1.104, Opus 4.6 (1M), Claude Max subscription
- Codex CLI v0.120.0, GPT-5.4 xhigh reasoning (auditor)

anthropics/claude-code#46990•Fetched 2026-04-13 05:44:25

extent analysis

TL;DR

Implementing artifact-gated pipelines and utilizing hooks to inspect tool call history may help address the enforcement gaps in the Claude Code skill pipeline.

Guidance

Investigate Claude Code's support for artifact-gated pipelines to ensure that each phase produces validated JSON artifacts before proceeding.
Explore the use of hooks to inspect tool call history, allowing for verification of actions such as file reads and writes.
Consider implementing a pattern for machine-resolved citations to confirm the existence and content of cited files.
Review the possibility of using PreToolUse hooks to enforce minimum tool calls before output, preventing premature writes.
Follow up on the roadmap for output-distribution comparison to check if CSS characteristics match expected baselines.
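The first guidance item, gating each phase on a validated JSON artifact, can be sketched without relying on any Claude Code-specific API. Everything below is hypothetical: the artifact fields (`phase`, `files_read`, `findings`), the `phase_N.json` naming, and the function names are assumptions made for illustration and are not defined by the issue or by Claude Code.

```python
import json
from pathlib import Path

# Hypothetical artifact schema: field name -> required Python type.
REQUIRED_FIELDS = {"phase": int, "files_read": list, "findings": list}


def validate_artifact(path: Path, expected_phase: int) -> list[str]:
    """Return a list of problems; an empty list means the artifact passes."""
    if not path.is_file():
        return [f"missing artifact: {path}"]
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON in {path}: {exc}"]
    if not isinstance(data, dict):
        return ["artifact must be a JSON object"]
    problems = []
    for field, kind in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], kind):
            problems.append(f"wrong type for field: {field}")
    if data.get("phase") != expected_phase:
        problems.append(f"expected phase {expected_phase}, got {data.get('phase')}")
    return problems


def may_start_phase(n: int, artifact_dir: Path) -> bool:
    """Phase n may start only if phase n-1 left a valid artifact; phase 1 is ungated."""
    if n <= 1:
        return True
    return not validate_artifact(artifact_dir / f"phase_{n - 1}.json", n - 1)
```

The point of the design is that the gate reads a file rather than trusting a statement: a phase that produced only prose leaves no `phase_N.json`, so the next phase cannot start. A check like this only proves the artifact exists and is well-formed; it does not prove the listed work was actually done.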

Example

No specific code snippet can be provided without further information on the Claude Code API and hooks. However, a hypothetical example of a hook inspecting tool call history could involve checking the number of Read calls before allowing a Write call.
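The hypothetical check described above could look roughly like the following sketch. It rests on assumptions that should be verified against the current Claude Code hooks documentation: that a PreToolUse hook command receives a JSON event on stdin with `tool_name`, `tool_input`, and `transcript_path` fields; that the transcript is a JSONL file whose entries carry `tool_use` content blocks under `message.content`; and that exit code 2 blocks the pending tool call. `MIN_READS`, `SKILLS_DIR`, and the function names are illustrative only.

```python
import json
import os
import sys

MIN_READS = 5  # threshold taken from the issue's "5+ files" example
SKILLS_DIR = os.path.expanduser("~/.claude/skills/")


def read_paths(transcript_lines):
    """Collect file paths of Read tool calls from transcript lines (assumed JSONL shape)."""
    paths = set()
    for line in transcript_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        message = entry.get("message") if isinstance(entry, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue
        for block in content:
            if (isinstance(block, dict) and block.get("type") == "tool_use"
                    and block.get("name") == "Read"):
                path = (block.get("input") or {}).get("file_path", "")
                if path:
                    paths.add(path)
    return paths


def decide(event, transcript_lines, skills_dir=SKILLS_DIR, min_reads=MIN_READS):
    """Return (exit_code, message); exit code 2 is assumed to block the tool call."""
    target = (event.get("tool_input") or {}).get("file_path", "")
    if event.get("tool_name") != "Write" or not target.endswith(".html"):
        return 0, ""  # only gate Write calls on .html files
    seen = {p for p in read_paths(transcript_lines) if p.startswith(skills_dir)}
    if len(seen) >= min_reads:
        return 0, ""
    return 2, (f"Blocked: {len(seen)}/{min_reads} distinct skill files read "
               f"before writing {target}")


def main():
    """Entry point for a hook script: read the event from stdin, exit with the decision."""
    event = json.load(sys.stdin)
    transcript = event.get("transcript_path", "")
    lines = []
    if transcript and os.path.isfile(transcript):
        with open(transcript, encoding="utf-8") as fh:
            lines = fh.readlines()
    code, message = decide(event, lines)
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)
```

Saved as a hook script, the file would end with a call to `main()`. Counting distinct paths rather than raw calls avoids rewarding five reads of the same file, but a read count is still only a receipt that the tool ran, not evidence that the content was used.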

Notes

The issue reports that the skill's requirements are declarative prose backed only by self-attested statements, with no mechanical enforcement. Closing the 20 enforcement gaps will likely require both technical guidance and, possibly, updates to the Claude Code platform.
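One way to turn a self-attested citation into a checkable one is to require each citation to carry a file path and a verbatim excerpt, then resolve both mechanically. The sketch below is an assumption-laden illustration of the "machine-resolved citations" idea raised in the issue: the `Citation` shape and `resolve_citation` function are hypothetical and not part of Claude Code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Citation:
    path: str   # path relative to the repository root, as cited by the AI
    quote: str  # excerpt the AI claims appears in that file


def _normalize(text: str) -> str:
    """Collapse whitespace so line wrapping does not cause false mismatches."""
    return " ".join(text.split())


def resolve_citation(citation: Citation, root: Path) -> str:
    """Return "ok" or a short reason the citation could not be verified."""
    root = root.resolve()
    target = (root / citation.path).resolve()
    # Reject paths that point outside the root (e.g. "../secrets").
    if target != root and root not in target.parents:
        return "path escapes root"
    if not target.is_file():
        return "file not found"
    if not citation.quote.strip():
        return "empty quote"
    text = target.read_text(encoding="utf-8", errors="replace")
    if _normalize(citation.quote) not in _normalize(text):
        return "quote not found in file"
    return "ok"
```

A validator like this rejects citations to files that do not exist and excerpts that were never in the file. It cannot tell whether a quote that does exist was relevant to the claim it supports.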

Recommendation

Apply workaround: Utilize available hooks and pipeline configurations to implement artifact-gated pipelines and tool call verification, as these measures can help mitigate the enforcement gaps until a more comprehensive solution is available.
