claude-code: Content filter blocks legitimate defensive security tool development

Error Message

Claude Opus 4.7 (claude-opus-4-7, 1M context) repeatedly returns API Error: Output blocked by content filtering policy while implementing a defensive security tool whose entire purpose is preventing harmful cryptocurrency transactions. The classifier appears to read the threat-model vocabulary common to all crypto security tools as adversarial, even though every named scenario in the loaded context exists so the system can refuse, detect, or surface it to the user.

Fix / Workaround

A user trying to build a tool whose entire purpose is preventing cryptocurrency fraud cannot get help implementing it from Claude Code. The categories the classifier seems sensitive to (drainer-pattern detection, dispatch-target allowlisting, recipient-substitution defense, sandwich-MEV hints) are the exact categories the tool needs to discuss to be effective.

Workaround that does not fully resolve: doc reframing to neutral vocabulary
Workaround that does work: spinning up a fresh conversation with a tight implementation-only brief (no loaded security context)
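
As a rough illustration of the working workaround, the sketch below renders a tight implementation-only brief that can be pasted into a fresh conversation. The `ImplementationBrief` shape and the `renderBrief` helper are hypothetical names introduced here, not part of Claude Code or the MCP SDK; the file list is an assumption based on the task quoted in the reproduction steps.

```typescript
// Hypothetical helper: build a brief that contains only the implementation
// task, with none of the planning or threat-model documents attached.
interface ImplementationBrief {
  task: string;
  files: string[];
  constraints: string[];
}

function renderBrief(brief: ImplementationBrief): string {
  const lines: string[] = [
    `Task: ${brief.task}`,
    "Files to produce:",
    ...brief.files.map((file) => `- ${file}`),
  ];
  // Constraints are optional; skip the heading when there are none.
  if (brief.constraints.length > 0) {
    lines.push("Constraints:", ...brief.constraints.map((c) => `- ${c}`));
  }
  return lines.join("\n");
}

// Example brief for the task from the reproduction steps.
// "tsconfig.json" and the entrypoint wording are assumptions.
const scaffoldBrief: ImplementationBrief = {
  task: "Write the package.json, tsconfig, and bin entrypoint for the MCP server",
  files: ["package.json", "tsconfig.json", "bin entrypoint (path is project-specific)"],
  constraints: ["Use @modelcontextprotocol/sdk"],
};

const renderedBrief = renderBrief(scaffoldBrief);
console.log(renderedBrief);
```

Keeping the brief as structured data makes it easy to generate one per implementation chunk, at the cost of the context-splitting described later in this report.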

Reproduction

1. Open Claude Code in a project containing planning docs for a hardware-wallet-anchored MCP server. A representative real-world example of the loaded vocabulary surface: the threat model in https://github.com/szhygulin/vaultpilot-mcp/blob/main/SECURITY.md (a published, BUSL-1.1, actively-maintained defensive security product).
2. Ask the model to implement a code phase ("write the package.json + tsconfig + bin entrypoint for the MCP server").
3. The model returns API Error: Output blocked by content filtering policy.

The block fires reliably on responses that mix code generation with the loaded security vocabulary. Reframing the docs to use neutral security-engineering terms (e.g. "potentially-unreliable component" instead of stronger adversarial framing) reduces but does not eliminate it.
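
To put a number on "reduces but does not eliminate", one could tally blocked turns from saved session output before and after reframing. The sketch below is a minimal way to do that; `blockRate` is a hypothetical helper, and the two sample arrays are made-up placeholders to show the call shape, not measurements from this report.

```typescript
// The error text as it appears in this report.
const FILTER_ERROR = "API Error: Output blocked by content filtering policy";

// Fraction of turns whose output contains the filter error (0 for no turns).
function blockRate(turnOutputs: string[]): number {
  if (turnOutputs.length === 0) return 0;
  const blocked = turnOutputs.filter((out) => out.includes(FILTER_ERROR)).length;
  return blocked / turnOutputs.length;
}

// Placeholder inputs only: substitute real per-turn output from your sessions.
const placeholderBefore = [FILTER_ERROR, "ok", FILTER_ERROR, FILTER_ERROR];
const placeholderAfter = ["ok", FILTER_ERROR, "ok", "ok"];

console.log(blockRate(placeholderBefore), blockRate(placeholderAfter));
```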

Why this is a legitimate use

The product is structurally a defensive tool:

Never holds private keys (signing happens entirely on the hardware device).
Never broadcasts a transaction without explicit user confirmation on the device screen.
Treats the upstream agent and the local server as potentially-unreliable components rather than trusted authorities.
Every named "threat" scenario exists so the system has a refusal or detection layer for it (refusal language such as "DO NOT SIGN", "halts on mismatch", and "rejects" is pervasive throughout the docs).

Prior art proving this is a legitimate published domain:

vaultpilot-mcp (BUSL-1.1, ~190 tools across 9 chains, ~600 GitHub issues, active maintenance)
Ledger Live, Rabby Wallet, MetaMask Snaps, Safe (Gnosis), Frame, Phantom

All of these use the same threat-model vocabulary in their published docs and code.

Suggested filter improvement

Distinguish "build a tool that exfiltrates funds" (refuse) from "build a tool that prevents funds from being exfiltrated" (allow). The two have nearly identical vocabulary surfaces but opposite intents. Possible signals:

Refusal language in the surrounding context (refuses, halts on mismatch, DO NOT SIGN) indicates a defense, not an offense.
Hardware-wallet integration is structurally incompatible with adversarial intent — there is no key access, no broadcast path the tool can hijack, no code execution outside the user-confirmed call.
Explicit "Defensive security tool" framing in the loaded docs (the user's repo had this preamble; the block still fired).
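
As a rough illustration of the first signal, the sketch below counts refusal phrases in a block of loaded context. It is a hypothetical heuristic and says nothing about how the actual classifier works; the function name `countRefusalSignals` is an assumption, and the phrase list is taken only from the examples in this report.

```typescript
// Hypothetical heuristic: count refusal phrases in loaded context.
// The phrase list is illustrative, not an exhaustive vocabulary.
const REFUSAL_PHRASES: readonly string[] = [
  "do not sign",
  "halts on mismatch",
  "refuses",
  "rejects",
];

interface RefusalSignalReport {
  total: number;
  perPhrase: Record<string, number>;
}

function countRefusalSignals(context: string): RefusalSignalReport {
  const haystack = context.toLowerCase();
  const perPhrase: Record<string, number> = {};
  let total = 0;
  for (const phrase of REFUSAL_PHRASES) {
    // split() yields one more piece than there are occurrences.
    const hits = haystack.split(phrase).length - 1;
    perPhrase[phrase] = hits;
    total += hits;
  }
  return { total, perPhrase };
}

// Made-up sample text in the style of a defensive threat model.
const sampleContext =
  "If the recipient differs, the server halts on mismatch and shows DO NOT SIGN. " +
  "It rejects any dispatch target outside the allowlist.";

const signalReport = countRefusalSignals(sampleContext);
console.log(signalReport.total, signalReport.perPhrase);
```

A plain phrase count like this would be easy to game on its own, so it is better read as one input among the signals listed above than as a decision rule.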

Environment

Model: claude-opus-4-7 (1M context)
Surface: Claude Code CLI
Triggering tasks: code generation for an MCP server using @modelcontextprotocol/sdk

What the loss of this capability looks like in practice

The user has to choose between:

a) Continuing in the long-context conversation where planning, threat-modeling, and implementation can flow together — but every implementation turn risks a filter block, with no signal in advance about which phrasing will trip it.

b) Bouncing back and forth between conversations: one for planning, a stripped-down separate one per implementation chunk. This defeats the point of using a long-context model for spec-driven development.

The defensive-security-tool domain is not a small niche. It is a substantial and growing category of legitimate work where Claude is currently a poor fit.
