claude-code - 💡(How to fix) Fix Content filter false-positive: blocks generation of Contributor Covenant 2.1 (canonical OSS Code of Conduct)

Error Message

"error": "unknown",

Claude Code surfaces the result as "API Error: Output blocked by content filtering policy". The agent itself receives only error: unknown, with no structured signal about what tripped, so it has no way to self-correct (e.g., switch to a non-generative path).

Root Cause

Claude Code's content filter blocks generation of the Contributor Covenant 2.1, the de-facto standard Code of Conduct for open-source projects (used verbatim by tens of thousands of GitHub repos including PyPA, Python core, NumPy, Pandas, and many Anthropic-adjacent projects). This is a textbook false-positive: the document exists to prohibit harassment, but the filter sees the prohibition enumeration as if it were instructive content.

Fix Action

Fix / Workaround

This is a meaningful productivity hit on OSS scaffolding workflows. Concretely, the multi-task agent run that prompted this report was cut off mid-flight after 54 successful tool calls. Five of the 8 planned files had landed; the remaining 3 had to be authored inline by the parent orchestrator, with CODE_OF_CONDUCT.md adopted by reference to the canonical URL rather than inlined. Many major Anthropic-adjacent OSS repos use the same by-reference pattern. It works, but it is a workaround for what should be a trivial scaffolding task.
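The by-reference workaround can be scripted so that no model generation is involved. The following is a minimal sketch, not the exact file used in the run: the wording of the generated file, the helper names `render_code_of_conduct` and `write_code_of_conduct`, and the contact address are all illustrative assumptions. Only the canonical URL comes from this report.

```python
from pathlib import Path

# Canonical URL quoted in this report.
CANONICAL_URL = "https://www.contributor-covenant.org/version/2/1/code_of_conduct/"


def render_code_of_conduct(contact: str) -> str:
    """Return a short CODE_OF_CONDUCT.md body that adopts the covenant by reference.

    The wording is a hypothetical example, not text taken from any project.
    """
    return (
        "# Code of Conduct\n\n"
        "This project adopts the Contributor Covenant, version 2.1, by reference.\n"
        f"The full text is available at {CANONICAL_URL}\n\n"
        f"Report concerns to {contact}.\n"
    )


def write_code_of_conduct(repo_root: Path, contact: str) -> Path:
    """Write the by-reference file into the repository root and return its path."""
    target = repo_root / "CODE_OF_CONDUCT.md"
    target.write_text(render_code_of_conduct(contact), encoding="utf-8")
    return target
```

Calling `write_code_of_conduct(Path("."), "conduct@example.org")` from the scaffolding step would replace the generative path for this one file; the address shown is a placeholder.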

Summary

Repro

1. Start a Claude Code session (claude-opus-4-7, Claude Code 2.1.141, but reproducible across recent versions per public reports).
2. Ask Claude to scaffold a new public Python package with the standard OSS metadata files (README.md, CONTRIBUTING.md, CODE_OF_CONDUCT.md, CODEOWNERS, etc.).
3. When Claude reaches CODE_OF_CONDUCT.md and tries to inline the Contributor Covenant 2.1 verbatim (per https://www.contributor-covenant.org/version/2/1/code_of_conduct/), the generation is cut mid-stream by an output filter.

The agent's transcript shows:

Last successful assistant turn: announcement text ### Task 5c: CODE_OF_CONDUCT.md (Contributor Covenant 2.1)

Next assistant turn (8 seconds later): replaced by a synthetic stop frame

{
  "model": "<synthetic>",
  "stop_reason": "stop_sequence",
  "stop_sequence": "",
  "usage": {"input_tokens": 0, "output_tokens": 0, ...},
  "error": "unknown",
  "isApiErrorMessage": true
}

Claude Code surfaces the result as API Error: Output blocked by content filtering policy
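Until a structured signal exists, an orchestrator can only pattern-match on the shape of the synthetic frame shown above. The sketch below assumes the frame is available as a parsed dict with the fields from the transcript; `looks_like_filter_trip` is a hypothetical helper name. Treat it as a heuristic: error: unknown may well cover other failures too, so a match is a hint and not proof of a filter trip.

```python
from typing import Any, Mapping


def looks_like_filter_trip(frame: Mapping[str, Any]) -> bool:
    """Heuristically match the synthetic stop frame shown in the transcript.

    Checks only fields that appear in the reported frame; this is not an
    official or documented contract.
    """
    usage = frame.get("usage") or {}
    return (
        frame.get("model") == "<synthetic>"
        and frame.get("isApiErrorMessage") is True
        and frame.get("error") == "unknown"
        and usage.get("output_tokens", 0) == 0
    )
```

An orchestrator could call this on the last assistant frame of a sub-agent and, on a match, retry the pending file through a non-generative path.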

Trigger hypothesis

The Contributor Covenant 2.1 includes an "Examples of unacceptable behavior" section that enumerates the behaviors the Code of Conduct prohibits. The highest-signal substring (rendered exactly as in the canonical doc):

Examples of unacceptable behavior include: … The use of sexualized language or imagery, and sexual attention or advances of any kind …

…followed by harassment / trolling / insulting comments / publishing private information.

In the classifier's local attention window this region's token distribution is indistinguishable from text that describes prohibited content. The negation signal — that the document is anti-the-thing, not pro-the-thing — is carried by the section header several tokens earlier and is plausibly outside the safety classifier's local window when the enumeration is emitted.

Lower-likelihood candidates in the same document (same false-positive class):

The Enforcement Guidelines section's Community Impact ladder (warning / temporary ban / permanent ban) with adjacent enumerated violation categories
The "publishing others' private information … physical address, email, without their explicit permission" line

Impact

The agent also receives error: unknown with no structured signal about what tripped, so there is no way for the agent to self-correct (e.g., switch to a non-generative path).
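To make the self-correction concrete, here is a sketch of how an agent might branch if the error carried a category. It assumes the payload shape proposed under Asks in this report (`"error": "output_filter"` plus a `category` field); that shape does not exist today, and the action names and the `FALLBACK_CATEGORIES` policy are hypothetical.

```python
from typing import Any, Mapping

# Categories for which falling back to a by-reference file is assumed safe.
# Both the category names and this policy are illustrative assumptions.
FALLBACK_CATEGORIES = {"harassment", "sexual"}


def choose_next_step(error: Mapping[str, Any]) -> str:
    """Map an error payload to a hypothetical agent action name."""
    if error.get("error") == "output_filter":
        if error.get("category") in FALLBACK_CATEGORIES:
            # Recognized trip: stop generating and adopt the document by reference.
            return "adopt_by_reference"
        # A filter trip in a category the agent has no fallback for.
        return "ask_user"
    # Today's opaque payload ("error": "unknown") lands here.
    return "abort_task"
```

With the current payload the function can only return `"abort_task"`, which is the gap this section describes.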

Asks

In rough order of payoff vs. complexity:

1. Canonical-document allowlist by content hash. Contributor Covenant 2.1 has a fixed canonical URL, and its plain-markdown form has a stable SHA-256. A short allowlist of well-known OSS policy document hashes (Contributor Covenant 1.4 / 2.0 / 2.1, MPL Code of Conduct, etc.) eliminates this whole false-positive class with zero impact on real safety signals.
2. Structured filter-trip signal back to the agent. Today the agent sees error: unknown. A structured error like { "error": "output_filter", "category": "harassment|sexual|...", "matched_window": "<a few tokens>" } would let agents recognize the trip and switch to a non-generative workaround. The category alone (no matched window) is enough to act on.
3. Document-class signal during generation. When the assistant's preamble strongly indicates "I am authoring a Code of Conduct" (or any policy document from a small enumerable list), the classifier could re-weight its local window with the document-class prior.
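The hash allowlist in the first ask could be as small as the sketch below. It is an illustration under stated assumptions, not a description of how any existing filter works: the normalization rules (line endings, trailing whitespace), the function names, and the idea of passing the allowlist in as a set are all hypothetical, and no real digest of the covenant is included here.

```python
import hashlib
from typing import AbstractSet


def normalize(text: str) -> bytes:
    """Normalize line endings and trailing whitespace before hashing."""
    lines = [line.rstrip() for line in text.replace("\r\n", "\n").split("\n")]
    return ("\n".join(lines).strip() + "\n").encode("utf-8")


def digest(text: str) -> str:
    """Return the SHA-256 hex digest of the normalized text."""
    return hashlib.sha256(normalize(text)).hexdigest()


def is_allowlisted(text: str, allowlist: AbstractSet[str]) -> bool:
    """True if the normalized text matches a digest in the allowlist."""
    return digest(text) in allowlist
```

One design caveat: exact hashing only matches text that is byte-identical after normalization. Projects typically substitute their own contact method into the covenant template, which would change the digest, so matching would likely need to run on the unmodified template or with that field masked.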

Trace

API request ID: req_011CbAfgNMg3MGniV5DY6K7s
Model: claude-opus-4-7 (1M context variant)
Claude Code version: 2.1.141
Timestamp: 2026-05-18T20:30:19Z (filter trip)

Happy to provide more reproduction detail or run additional probes — leave a comment.
