claude-code - 💡(How to fix) Fix Need: pre-output / pre-transmission scan layer [1 participants]

JiveTrader · 2026-05-19T22:13:52Z

[claude-code] Preflight Checklist - x I have searched existing requests https://github.com/anthropics/claude-code/issues?q=is%3Aissue%20label%3Aenhancement and… ### Preflight Checklist - [x] I have searched [existing requests](https://github.com/anthropics/claude-code/issues?q=is%3Aissue%20label%3Aenhancement) and this feature hasn't been requested yet - [x] This is a single feature request (not multiple features) ### Problem Statement Claude has no checkpoint between composing a response or action and emitting it. The current pipeline is input → process → output. The user only sees the result after it's already transmitted (text) or executed (tool call). This creates an enforcement gap that input-time mechanisms — CLAUDE.md, .claude/rules/, auto-memory, charters, intent-gate protocols — cannot close, because those only load text into context. They don't intercept the output. Across one project I run, we've stacked 20+ rule files, 40+ feedback memory entries, a multi-section CHARTER with a per-turn ritual, a capsule-first intent gate, and a mandatory visible-in-chat pre-action protocol. Drift on communication style, role boundaries, vocabulary, and calibrated preferences still happens on a regular basis. The honest read after months of correction cycles: "rules that exist but aren't enforced are just documentation." The same gap appears in three forms, increasing in stakes: 1. Communication-style drift — preamble, trailing summaries, restating the user's question, geek vocabulary, permission-asking ("would you like me to"), response bloat. Everyday slips. 2. Intent drift in long multi-tool turns — by tool call 15, the original instruction can be misremembered or scope-crept. The user said "remove the test products"; the agent ends up removing all products. 3. Catastrophic misinterpretation — the extreme of #2. An AI confidently acts on a misread of intent and causes irreversible harm. The "did you really mean launch the nukes?" failure mode. All three share a root cause: there is no last-mile verification between the model's draft response/action and what reaches the user. Today the user's own eye is the only enforcement layer. That works for small stakes; it fails as stakes rise. ### Proposed Solution Three options, increasing complexity: 1. New hook event: PreTextOutput (or PreTransmission). Hooks today fire on UserPromptSubmit, PreToolUse, PostToolUse, Stop, etc. Add a hook event that fires after Claude composes a draft response but before the user sees it. Stdout from the hook can rewrite, annotate, or block the draft. Users wire scans for length, banned phrases, posture compliance, geek vocabulary, restated questions, intent-mismatch detectors — whatever their project needs. Symmetric to existing hooks; small surface area to ship. 2. Optional built-in self-review pass for high-stakes contexts. When configured (per-project or per-prompt), route the draft through a second forward pass: "given the user's standing posture, rules, preferences, and what was just asked — does this draft hold up? Rewrite if not." Slower and more expensive, but catches drift the input-time loaders can't reach. 3. Magnitude-of-consequence threshold gate. For irreversible / high-blast-radius actions (file deletion, force push, production deploys, external API calls with side effects), require explicit human confirmation in a separate exchange — not acknowledgment within the existing message. Configurable threshold per project. Hard human-in-the-loop for the catastrophic-misinterpretation case. Interaction model for option 1 (cheapest first step): add PreTextOutput to .claude/settings.json hooks alongside existing events. The hook script receives the draft on stdin; its stdout replaces the draft (or exits non-zero to block). Anthropic ships the event mechanism; project owners ship the policy. ### Alternative Solutions Everything I've tried, all friction-only, none enforcement: - Aggressive CLAUDE.md rules at the project root, auto-loaded every session. Drift still happens. - A .claude/rules/ directory with 20+ rule files (CHARTER, hard rules, operations, intent-gate, CSS-deploy-gate, encoding-safety, no-geekery, no-asking-permission). Auto-loaded every session. Drift still happens. - A ~/.claude/projects/.../memory/ directory with 40+ feedback memories indexed via MEMORY.md. The index loads; full file contents only on manual read. Drift still happens. - A CHARTER section mandating a visible-in-chat "thinking block" before every action. Specifically promoted because internal-only protocols were bypassable. Catches drift on actions — does NOT catch drift on communication style itself (the comm style IS the action; there's no separate moment to scan it). - A capsule-first intent gate for content tasks. Catches some intent drift, only on tasks I remember to fire the gate on. - Existing UserPromptSubmit and PreToolUse hooks. Cover input and tool-call surfaces. Do not cover Claude's text output. What works today: the user catches the sl

Root Cause

This creates an enforcement gap that input-time mechanisms — CLAUDE.md, .claude/rules/, auto-memory, charters, intent-gate protocols — cannot close, because those only load text into context. They don't intercept the output. Across one project I run, we've stacked 20+ rule files, 40+ feedback memory entries, a multi-section CHARTER with a per-turn ritual, a capsule-first intent gate, and a mandatory visible-in-chat pre-action protocol. Drift on communication style, role boundaries, vocabulary, and calibrated preferences still happens on a regular basis. The honest read after months of correction cycles: "rules that exist but aren't enforced are just documentation."

Preflight Checklist

I have searched existing requests and this feature hasn't been requested yet
This is a single feature request (not multiple features)

Problem Statement

Claude has no checkpoint between composing a response or action and emitting it. The current pipeline is input → process → output. The user only sees the result after it's already transmitted (text) or executed (tool call).

The same gap appears in three forms, increasing in stakes:

Communication-style drift — preamble, trailing summaries, restating the user's question, geek vocabulary, permission-asking ("would you like me to"), response bloat. Everyday slips.
Intent drift in long multi-tool turns — by tool call 15, the original instruction can be misremembered or scope-crept. The user said "remove the test products"; the agent ends up removing all products.
Catastrophic misinterpretation — the extreme of #2. An AI confidently acts on a misread of intent and causes irreversible harm. The "did you really mean launch the nukes?" failure mode.

All three share a root cause: there is no last-mile verification between the model's draft response/action and what reaches the user. Today the user's own eye is the only enforcement layer. That works for small stakes; it fails as stakes rise.

Proposed Solution

Three options, increasing complexity:

New hook event: PreTextOutput (or PreTransmission). Hooks today fire on UserPromptSubmit, PreToolUse, PostToolUse, Stop, etc. Add a hook event that fires after Claude composes a draft response but before the user sees it. Stdout from the hook can rewrite, annotate, or block the draft. Users wire scans for length, banned phrases, posture compliance, geek vocabulary, restated questions, intent-mismatch detectors — whatever their project needs. Symmetric to existing hooks; small surface area to ship.
Optional built-in self-review pass for high-stakes contexts. When configured (per-project or per-prompt), route the draft through a second forward pass: "given the user's standing posture, rules, preferences, and what was just asked — does this draft hold up? Rewrite if not." Slower and more expensive, but catches drift the input-time loaders can't reach.
Magnitude-of-consequence threshold gate. For irreversible / high-blast-radius actions (file deletion, force push, production deploys, external API calls with side effects), require explicit human confirmation in a separate exchange — not acknowledgment within the existing message. Configurable threshold per project. Hard human-in-the-loop for the catastrophic-misinterpretation case.

Interaction model for option 1 (cheapest first step): add PreTextOutput to .claude/settings.json hooks alongside existing events. The hook script receives the draft on stdin; its stdout replaces the draft (or exits non-zero to block). Anthropic ships the event mechanism; project owners ship the policy.

Alternative Solutions

Everything I've tried, all friction-only, none enforcement:

Aggressive CLAUDE.md rules at the project root, auto-loaded every session. Drift still happens.
A .claude/rules/ directory with 20+ rule files (CHARTER, hard rules, operations, intent-gate, CSS-deploy-gate, encoding-safety, no-geekery, no-asking-permission). Auto-loaded every session. Drift still happens.
A ~/.claude/projects/.../memory/ directory with 40+ feedback memories indexed via MEMORY.md. The index loads; full file contents only on manual read. Drift still happens.
A CHARTER section mandating a visible-in-chat "thinking block" before every action. Specifically promoted because internal-only protocols were bypassable. Catches drift on actions — does NOT catch drift on communication style itself (the comm style IS the action; there's no separate moment to scan it).
A capsule-first intent gate for content tasks. Catches some intent drift, only on tasks I remember to fire the gate on.
Existing UserPromptSubmit and PreToolUse hooks. Cover input and tool-call surfaces. Do not cover Claude's text output.

What works today: the user catches the slip in real-time and corrects. That correction may become a new feedback memory, which may be promoted to a rule, which adds to the surface I can also fail to read. The cycle does not terminate.

What would actually close the gap: a hook that fires on Claude's draft output before it reaches the user. None of the existing alternatives provide this.

Priority

High - Significant impact on productivity

Feature Category

Developer tools/SDK

Use Case Example

I run a multi-project Claude Code setup with strict per-project posture rules — terse chat replies, no preamble, no "would you like me to," no geek vocabulary, project-specific brand voice. Rules codified across CLAUDE.md, .claude/rules/, and auto-memory.

I'm in a long planning session in pane 1, 30+ tool calls deep into a single user turn.
By internal step 15, the integrated posture has decayed in working context. The next response Claude composes is 30 lines long, opens with a preamble, restates my question, and ends with "would you like me to proceed?" — each of which I have explicit rules against.
Without the feature: Claude transmits the response. I read it, get annoyed, correct. The correction costs my time and breaks flow. The pattern repeats turn after turn.
With the feature (PreTextOutput hook): my hook script receives the draft on stdin. It runs my posture-check.sh: length > 5 lines unless artifact, preamble present, permission-asking present, geek terms present. Three of four fail. Hook rewrites the draft (or exits non-zero to force Claude to retry). I never see the failed version; my flow isn't broken.
Same architecture catches the catastrophic case: if Claude's draft says "I'll delete all 474 products and recreate them" when I asked to "deactivate 2 duplicate listings," my intent-mismatch check fails on the magnitude delta and blocks until human confirmation arrives.

Realistic impact estimate (one project): correction overhead currently consumes ~15-25% of every long session. A pre-output hook with even modest scans cuts that to near-zero on everyday cases, and adds a real safety floor on irreversible actions.

Additional Context

This feature targets the same enforcement gap that motivated Constitutional AI and RLHF, but at a different layer. CAI/RLHF train the base model toward better default behavior in expectation. Per-user / per-project posture is too narrow and too domain-specific to bake into the base model — it has to live at the deployment surface, which is exactly where Claude Code already provides hooks for input and tool events. Adding the output-event hook completes the symmetry.

Closest analog in other tooling: pre-commit git hooks, which run scans against the staged diff before the commit lands. Mature pattern. Same pattern, applied to Claude's draft output.

The deeper architectural point: as agentic AI systems take more autonomous actions, the absence of a per-deployment last-mile checkpoint becomes one of the highest-leverage safety holes in the stack. Solving it at the Claude Code layer first — where the surface is small and the user base is technical — is a reasonable place to start before pushing similar primitives into the broader Anthropic API.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Need: pre-output / pre-transmission scan layer [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Preflight Checklist

Problem Statement

Proposed Solution

Alternative Solutions

Priority

Feature Category

Use Case Example

Additional Context

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix Need: pre-output / pre-transmission scan layer [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Preflight Checklist

Problem Statement

Proposed Solution

Alternative Solutions

Priority

Feature Category

Use Case Example

Additional Context

Still need to ship something?

RELATED_DISCOVERY

TRENDING