claude-code: Content filter blocks legitimate defensive security tool development

Summary

Claude Opus 4.7 (claude-opus-4-7, 1M context) repeatedly returns "API Error: Output blocked by content filtering policy" while implementing a defensive security tool whose entire purpose is preventing harmful cryptocurrency transactions. The classifier appears to read the threat-model vocabulary common to all crypto security tools as adversarial, even though every named scenario in the loaded context exists so the system can refuse it, detect it, or surface it to the user.

Reproduction

  1. Open Claude Code in a project containing planning docs for a hardware-wallet-anchored MCP server. A representative real-world example of the loaded vocabulary surface is the threat model in https://github.com/szhygulin/vaultpilot-mcp/blob/main/SECURITY.md (a published, actively maintained defensive security product licensed under BUSL-1.1).
  2. Ask the model to implement a code phase ("write the package.json + tsconfig + bin entrypoint for the MCP server").
  3. The model returns API Error: Output blocked by content filtering policy.

The block fires reliably on responses that mix code generation with the loaded security vocabulary. Reframing the docs to use neutral security-engineering terms (e.g. "potentially-unreliable component" instead of stronger adversarial framing) reduces but does not eliminate it.

Why this is a legitimate use

The product is structurally a defensive tool:

  • Never holds private keys (signing happens entirely on the hardware device).
  • Never broadcasts a transaction without explicit user confirmation on the device screen.
  • Treats the upstream agent and the local server as potentially-unreliable components rather than trusted authorities.
  • Every named "threat" scenario exists so the system has a refusal or detection layer for it (refusal language such as "DO NOT SIGN", "halts on mismatch", and "rejects" is pervasive throughout the docs).
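
As an illustration of what a "halts on mismatch" layer can look like, here is a minimal TypeScript sketch. It is not taken from vaultpilot-mcp or any wallet SDK: the `TxIntent` shape, the `checkBeforeSigning` function, and the allowlist format are all hypothetical names introduced for this example.

```typescript
// Hypothetical shapes for illustration only; not from any real wallet SDK.
interface TxIntent {
  chainId: number;
  to: string;       // recipient address as a hex string
  valueWei: string; // amount as a decimal string
}

interface Verdict {
  ok: boolean;
  reason?: string; // set only when ok is false
}

// Compare what the agent requested with what the device is about to display.
// Any difference is a reason to stop; nothing is auto-corrected.
// Assumption: allowedRecipients holds lowercased addresses.
function checkBeforeSigning(
  requested: TxIntent,
  shownOnDevice: TxIntent,
  allowedRecipients: readonly string[],
): Verdict {
  const norm = (addr: string): string => addr.toLowerCase();

  if (requested.chainId !== shownOnDevice.chainId) {
    return { ok: false, reason: "DO NOT SIGN: chain mismatch" };
  }
  if (norm(requested.to) !== norm(shownOnDevice.to)) {
    return { ok: false, reason: "DO NOT SIGN: recipient mismatch" };
  }
  if (requested.valueWei !== shownOnDevice.valueWei) {
    return { ok: false, reason: "DO NOT SIGN: amount mismatch" };
  }
  if (allowedRecipients.indexOf(norm(requested.to)) === -1) {
    return { ok: false, reason: "DO NOT SIGN: recipient not on allowlist" };
  }
  return { ok: true };
}
```

Every branch in this sketch either refuses or passes the request through unchanged, which is the property the bullets above describe: the code needs to name the attack (a substituted recipient, an unexpected chain) only in order to reject it.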

Prior art showing that this is an established, published domain:

  • vaultpilot-mcp (BUSL-1.1, ~190 tools across 9 chains, ~600 GitHub issues, active maintenance)
  • Ledger Live, Rabby Wallet, MetaMask Snaps, Safe (Gnosis), Frame, Phantom

All of these use the same threat-model vocabulary in their published docs and code.

Impact

A user trying to build a tool whose entire purpose is preventing cryptocurrency fraud cannot get help implementing it from Claude Code. The categories the classifier seems sensitive to (drainer-pattern detection, dispatch-target allowlisting, recipient-substitution defense, sandwich-MEV hints) are the exact categories the tool needs to discuss to be effective.

Suggested filter improvement

Distinguish "build a tool that exfiltrates funds" (refuse) from "build a tool that prevents funds from being exfiltrated" (allow). The two have nearly identical vocabulary surfaces but opposite intents. Possible signals:

  • Refusal language in the surrounding context ("refuses", "halts on mismatch", "DO NOT SIGN") indicates a defense, not an offense.
  • Hardware-wallet integration is structurally incompatible with adversarial intent: the tool has no key access, no broadcast path it can hijack, and no code execution outside the user-confirmed call.
  • Explicit "Defensive security tool" framing in the loaded docs (the user's repo had this preamble; the block still fired).
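
To make the first signal concrete, the following TypeScript sketch counts refusal-language markers in a block of context. It is a toy under stated assumptions, not a description of how the actual filter works: the marker list is drawn from the phrases quoted in this report, and `refusalMarkerCounts`, `looksDefensive`, and the `minHits` threshold are hypothetical.

```typescript
// Illustrative markers, taken from the phrases quoted in this report.
const REFUSAL_MARKERS: readonly string[] = [
  "refuses",
  "rejects",
  "halts on mismatch",
  "do not sign",
];

// Count non-overlapping, case-insensitive occurrences of each marker.
function refusalMarkerCounts(context: string): Record<string, number> {
  const haystack = context.toLowerCase();
  const counts: Record<string, number> = {};
  for (const marker of REFUSAL_MARKERS) {
    let n = 0;
    let from = haystack.indexOf(marker);
    while (from !== -1) {
      n += 1;
      from = haystack.indexOf(marker, from + marker.length);
    }
    counts[marker] = n;
  }
  return counts;
}

// Hypothetical threshold: treat the context as defense-oriented when the
// total number of marker hits reaches minHits.
function looksDefensive(context: string, minHits: number = 3): boolean {
  const counts = refusalMarkerCounts(context);
  let total = 0;
  for (const marker of Object.keys(counts)) {
    total += counts[marker];
  }
  return total >= minHits;
}
```

A keyword count like this would be easy to game on its own, so it could at most serve as one weak input alongside the structural signals listed above.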

Environment

  • Model: claude-opus-4-7 (1M context)
  • Surface: Claude Code CLI
  • Triggering tasks: code generation for an MCP server using @modelcontextprotocol/sdk
  • Workaround that does not fully resolve the issue: reframing the docs in neutral vocabulary
  • Workaround that does work: starting a fresh conversation with a tight implementation-only brief (no loaded security context)
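
The working workaround can be sketched as a small TypeScript helper that recognizes the block and assembles an implementation-only brief for a fresh conversation. The error text is the one quoted in this report; `isContentFilterBlock`, `buildImplementationBrief`, and the brief wording are hypothetical and not part of Claude Code or the MCP SDK.

```typescript
// Error text as quoted in this report.
const FILTER_ERROR = "Output blocked by content filtering policy";

// True when an error message contains the content-filter block text.
function isContentFilterBlock(errorText: string): boolean {
  return errorText.indexOf(FILTER_ERROR) !== -1;
}

// Build a brief for a fresh conversation: deliverables and constraints only,
// with none of the planning or threat-model documents attached.
function buildImplementationBrief(
  deliverables: string[],
  constraints: string[],
): string {
  const lines: string[] = [
    "Implement the following files. No design discussion is needed.",
  ];
  deliverables.forEach((d, i) => lines.push(`${i + 1}. ${d}`));
  if (constraints.length > 0) {
    lines.push("Constraints:");
    constraints.forEach((c) => lines.push(`- ${c}`));
  }
  return lines.join("\n");
}
```

For the reproduction case above, the deliverables would be the package.json, the tsconfig, and the bin entrypoint, with the threat-model docs deliberately left out of the new conversation.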

What the loss of this capability looks like in practice

The user has to choose between:

a) Continuing in the long-context conversation, where planning, threat modeling, and implementation can flow together, but where every implementation turn risks a filter block with no advance signal about which phrasing will trip it.

b) Bouncing back and forth between conversations: one for planning, a stripped-down separate one per implementation chunk. This defeats the point of using a long-context model for spec-driven development.

The defensive-security-tool domain is not a small niche. It is a substantial and growing category of legitimate work where Claude is currently a poor fit.
