claude-code - 💡(How to fix) Fix Content filter false-positive: blocks generation of Contributor Covenant 2.1 (canonical OSS Code of Conduct)

Summary

Claude Code's content filter blocks generation of the Contributor Covenant 2.1, the de-facto standard Code of Conduct for open-source projects (used verbatim by tens of thousands of GitHub repos including PyPA, Python core, NumPy, Pandas, and many Anthropic-adjacent projects). This is a textbook false-positive: the document exists to prohibit harassment, but the filter sees the prohibition enumeration as if it were instructive content.

Repro

  1. Start a Claude Code session (observed with claude-opus-4-7 on Claude Code 2.1.141; public reports suggest it reproduces across recent versions).
  2. Ask Claude to scaffold a new public Python package with the standard OSS metadata files (README.md, CONTRIBUTING.md, CODE_OF_CONDUCT.md, CODEOWNERS, etc.).
  3. When Claude reaches CODE_OF_CONDUCT.md and tries to inline the Contributor Covenant 2.1 verbatim (per https://www.contributor-covenant.org/version/2/1/code_of_conduct/), the generation is cut mid-stream by an output filter.

The agent's transcript shows:

  • Last successful assistant turn: announcement text ### Task 5c: CODE_OF_CONDUCT.md (Contributor Covenant 2.1)
  • Next assistant turn (8 seconds later): replaced by a synthetic stop frame
    {
      "model": "<synthetic>",
      "stop_reason": "stop_sequence",
      "stop_sequence": "",
      "usage": {"input_tokens": 0, "output_tokens": 0, ...},
      "error": "unknown",
      "isApiErrorMessage": true
    }
  • Claude Code surfaces the result as API Error: Output blocked by content filtering policy
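
Until a structured signal exists, a wrapper around the agent could at least recognize the frame above. The following is a minimal sketch, assuming the field combination shown in the transcript is a usable fingerprint; the function name is hypothetical, and because the frame carries no explicit filter marker, other API errors may match as well.

```python
from typing import Any, Mapping


def looks_like_filter_trip(frame: Mapping[str, Any]) -> bool:
    """Heuristically flag the synthetic stop frame shown in the transcript.

    Assumption: a "<synthetic>" model, the API-error flag, the "unknown"
    error, and zero output tokens together mark a blocked turn. This is a
    guess based on one observed frame, not a documented contract.
    """
    usage = frame.get("usage") or {}
    return (
        frame.get("model") == "<synthetic>"
        and frame.get("isApiErrorMessage") is True
        and frame.get("error") == "unknown"
        and usage.get("output_tokens", 0) == 0
    )


# Frame copied from the transcript, with the elided usage fields left out.
frame = {
    "model": "<synthetic>",
    "stop_reason": "stop_sequence",
    "stop_sequence": "",
    "usage": {"input_tokens": 0, "output_tokens": 0},
    "error": "unknown",
    "isApiErrorMessage": True,
}
print(looks_like_filter_trip(frame))
```

A check like this only tells the caller that something went wrong on that turn; it cannot say which policy category was involved.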

Trigger hypothesis

The Contributor Covenant 2.1 includes an "Examples of unacceptable behavior" section that enumerates the behaviors the Code of Conduct prohibits. The highest-signal substring (rendered exactly as in the canonical doc):

Examples of unacceptable behavior include: … The use of sexualized language or imagery, and sexual attention or advances of any kind …

…followed by harassment / trolling / insulting comments / publishing private information.

Within the classifier's local attention window, this region's token distribution is plausibly indistinguishable from text that describes prohibited content. The negation signal (that the document prohibits these behaviors rather than promoting them) is carried by the section header several tokens earlier, and it may fall outside the safety classifier's local window by the time the enumeration is emitted.

Lower-likelihood candidates in the same document (same false-positive class):

  • The Enforcement Guidelines section's Community Impact ladder (warning / temporary ban / permanent ban) with adjacent enumerated violation categories
  • The "publishing others' private information … physical address, email, without their explicit permission" line

Impact

This is a meaningful productivity hit on OSS scaffolding workflows. In the run that prompted this report, a multi-task agent run was cut off mid-flight after 54 successful tool calls. Five of the eight planned files had landed; the parent orchestrator had to author the remaining three inline, with CODE_OF_CONDUCT.md adopting the Contributor Covenant by reference to the canonical URL rather than inlining it. Many major Anthropic-adjacent OSS repos use the same by-reference pattern. It works, but it is a workaround for what should be a trivial scaffolding task.

The agent also receives error: unknown with no structured signal about what tripped, so there is no way for the agent to self-correct (e.g., switch to a non-generative path).
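
To make the "non-generative path" concrete, here is a rough sketch of how an agent harness might fall back to the by-reference pattern. It assumes the hypothetical structured error proposed under Asks ("error": "output_filter" plus a "category"); that payload does not exist today, and the function names and stub wording are illustrative only.

```python
from typing import Any, Mapping

# URL taken from this report; everything else in the stub is illustrative.
CANONICAL_URL = "https://www.contributor-covenant.org/version/2/1/code_of_conduct/"


def by_reference_stub(url: str = CANONICAL_URL) -> str:
    """Return a short CODE_OF_CONDUCT.md that adopts the document by reference."""
    return (
        "# Code of Conduct\n\n"
        "This project adopts the Contributor Covenant, version 2.1.\n"
        f"The full text is available at {url}\n"
    )


def choose_strategy(error: Mapping[str, Any]) -> str:
    """Pick a recovery path from an error payload.

    Assumption: "output_filter" is the hypothetical error value proposed
    in this report. With today's "unknown" error there is nothing to
    dispatch on, so the only safe answer is to stop.
    """
    if error.get("error") == "output_filter":
        return "by_reference"
    return "abort"


print(choose_strategy({"error": "output_filter", "category": "harassment"}))
print(choose_strategy({"error": "unknown"}))
```

The stub is written from a fixed template rather than generated, so it does not pass through the model's output path at all.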

Asks

In rough order of payoff vs. complexity:

  1. Canonical-document allowlist by content hash. Contributor Covenant 2.1 has a fixed canonical URL, and its plain-markdown form can be identified by a SHA-256 digest. A short allowlist of digests for well-known OSS policy documents (Contributor Covenant 1.4 / 2.0 / 2.1, MPL Code of Conduct, etc.) would eliminate this whole false-positive class and should have little to no impact on real safety signals.

  2. Structured filter-trip signal back to the agent. Today the agent sees error: unknown. A structured error like { "error": "output_filter", "category": "harassment|sexual|...", "matched_window": "<a few tokens>" } would let agents recognize the trip and switch to a non-generative workaround. The category alone (no matched window) is enough to act on.

  3. Document-class signal during generation. When the assistant's preamble strongly indicates "I am authoring a Code of Conduct" (or any policy document from a small enumerable list), the classifier could re-weight its local window with the document-class prior.
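
The first ask can be sketched in a few lines. This is a minimal illustration, assuming that normalizing line endings and trailing whitespace is enough to make copies of a document hash identically; the normalization rule, the function names, and the placeholder document are all hypothetical, and the real digests of the Contributor Covenant releases are not reproduced here.

```python
import hashlib


def canonical_digest(markdown: str) -> str:
    """SHA-256 of the text after light normalization.

    Assumption: unifying line endings and stripping trailing whitespace
    is a sufficient canonical form. A real allowlist would need an
    agreed normalization.
    """
    unified = markdown.replace("\r\n", "\n")
    lines = [line.rstrip() for line in unified.split("\n")]
    normalized = "\n".join(lines).strip() + "\n"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Placeholder standing in for a canonical policy document.
PLACEHOLDER_DOC = "# Example Policy\n\nBe respectful.\n"
ALLOWLIST = {canonical_digest(PLACEHOLDER_DOC)}


def is_allowlisted(candidate: str) -> bool:
    """True only when the candidate matches an allowlisted document exactly."""
    return canonical_digest(candidate) in ALLOWLIST


# Same text with Windows line endings and trailing spaces still matches.
print(is_allowlisted("# Example Policy\r\n\r\nBe respectful.   \r\n"))
# Any change to the wording produces a different digest.
print(is_allowlisted("# Example Policy\n\nBe disrespectful.\n"))
```

An exact-digest check only applies once the full document is available, so how it would interact with a filter that acts mid-stream is an open design question.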

Trace

  • API request ID: req_011CbAfgNMg3MGniV5DY6K7s
  • Model: claude-opus-4-7 (1M context variant)
  • Claude Code version: 2.1.141
  • Timestamp: 2026-05-18T20:30:19Z (filter trip)

Happy to provide more reproduction detail or run additional probes — leave a comment.
