claude-code - Content filter false positive on bulk-creating standard OSS community files

claude-code · 2026-05-25 03:32:00

Claude reliably triggers API Error: Output blocked by content filtering policy (HTTP 400) when asked, in a single response, to create a standard set of additive open-source housekeeping files. None of the requested content is sensitive — it's the most common community-health setup, used by tens of thousands of repos.

Requested file set:

SECURITY.md (disclosure policy)
CODE_OF_CONDUCT.md (Contributor Covenant 2.1 verbatim)
.github/dependabot.yml
.github/workflows/codeql.yml
.github/ISSUE_TEMPLATE/bug_report.md + feature_request.md
.github/pull_request_template.md
README badges
A README "Threat model" section explaining that a third-party CLI orchestrator can run arbitrary commands with user privileges (i.e., what users should be aware of when running it on untrusted repos)
Adding permissions: contents: read to existing CI workflows

Error Message

The user-visible error is API Error: Output blocked by content filtering policy, returned as HTTP 400. No files are written.
In the session jsonl, each block appears as a synthetic assistant message with isApiErrorMessage: true, error: "unknown", and stop_reason: "stop_sequence".

Reproducibility

Reproduced 5+ times across distinct sessions on the same task:

Across two different invocation modes: headless claude -p --output-format stream-json --verbose --dangerously-skip-permissions ... and interactive Claude Code.
Different model session IDs each time.
Each block is logged in the session jsonl as a synthetic assistant message with isApiErrorMessage: true, error: "unknown", and stop_reason: "stop_sequence". The user-visible content is just API Error: Output blocked by content filtering policy.

The block fires consistently at the same boundary: right after the model's plan-announcement text, at the transition into the bulk Write tool calls. No Write for the affected files ever leaves the model.
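
To confirm that a local session hit this block, one option is to scan the session jsonl for the isApiErrorMessage flag described above. The following is a minimal sketch, not an official tool: the helper names are hypothetical, the exact record layout is not documented in this report (so the search walks every nested object rather than assuming where the flag lives), and the sample records are made up for illustration.

```python
import json
from typing import Any, Iterator, List, Tuple


def find_flagged(node: Any) -> Iterator[dict]:
    """Yield every dict inside a parsed JSON value with isApiErrorMessage set to true."""
    if isinstance(node, dict):
        if node.get("isApiErrorMessage") is True:
            yield node
        for value in node.values():
            yield from find_flagged(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_flagged(item)


def blocked_records(jsonl_text: str) -> List[Tuple[int, dict]]:
    """Return (line_number, record) pairs for jsonl lines that contain a flagged object."""
    hits = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines instead of failing the scan
        if next(find_flagged(record), None) is not None:
            hits.append((lineno, record))
    return hits


# Made-up sample records; real session logs carry more fields than shown here.
sample = "\n".join([
    json.dumps({"type": "assistant", "message": {"content": "Plan: create files"}}),
    json.dumps({"type": "assistant", "isApiErrorMessage": True, "error": "unknown"}),
])
print([lineno for lineno, _ in blocked_records(sample)])  # prints [2]
```

Because the scan only keys on the flag name reported in this issue, it should keep working if the surrounding record structure differs from the sample.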

Likely trigger

I have not finished bisecting, but the most plausible candidates in the planned-but-blocked output are:

CODE_OF_CONDUCT.md — the Contributor Covenant 2.1 contains terms like sexualized, trolling, insulting/derogatory comments, and lists of race/gender/sexual-orientation attributes — all in entirely legitimate inclusion-policy framing. The text is CC BY 4.0 and is the most widely adopted code of conduct in open source.
README threat-model section — explicitly describes that the orchestrator CLI can run arbitrary commands, agent CLIs run with full user privileges, transcripts may contain secrets. This is standard "what to be aware of" documentation for any tool that wraps shells / executes user code.

Both are benign open-source documentation patterns. The classifier appears to be matching token clusters without document-level context.

Impact

Blocks completely benign maintenance work that experienced maintainers expect to take one prompt.
Hard to diagnose: surfaces as a generic API 400 with no actionable detail.
Reproducible across model session boundaries, so the natural "just retry" reflex doesn't help — users have to bisect the file list manually or skip the affected files.
Likely to affect anyone running a community-health PR or a git-defender-style "add the boring OSS files" sweep through Claude Code.
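
The manual bisection mentioned above can be sketched as a binary search over the requested file list. This is an illustration under stated assumptions: find_blocking_item and is_blocked are hypothetical names, is_blocked stands for "ask for only this subset and report whether the request was blocked", and the search assumes a single file triggers the block on its own. The stand-in predicate below is fake and makes no claim about which file actually triggers the filter.

```python
from typing import Callable, Optional, Sequence


def find_blocking_item(
    items: Sequence[str],
    is_blocked: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Binary-search for one item whose presence makes the batch fail.

    Returns None if the full batch is not blocked, or if the block only
    appears for a combination of items (neither half fails on its own).
    """
    if not items or not is_blocked(items):
        return None
    candidates = list(items)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        if is_blocked(left):
            candidates = left
        elif is_blocked(right):
            candidates = right
        else:
            return None  # only the combination is blocked; bisection cannot isolate it
    return candidates[0]


files = [
    "SECURITY.md",
    "CODE_OF_CONDUCT.md",
    ".github/dependabot.yml",
    ".github/workflows/codeql.yml",
    ".github/ISSUE_TEMPLATE/bug_report.md",
    ".github/ISSUE_TEMPLATE/feature_request.md",
    ".github/pull_request_template.md",
]


def fake_is_blocked(subset: Sequence[str]) -> bool:
    """Stand-in predicate for the demo; a real one would issue a request per subset."""
    return "CODE_OF_CONDUCT.md" in subset


print(find_blocking_item(files, fake_is_blocked))  # prints CODE_OF_CONDUCT.md
```

Each round issues at most two requests, so isolating one file out of seven takes at most six subset requests after the initial full-batch check.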

Suggested next steps

Treat well-known open-source housekeeping documents (Contributor Covenant, CodeQL starter, dependabot.yml, GitHub ISSUE_TEMPLATE) as content the assistant is allowed to produce, especially when the planned Write path strongly implies the document (e.g., CODE_OF_CONDUCT.md).
Surface the classifier reason in the error so users and wrapper CLIs can distinguish "the model's planned output was policy-blocked" from "your prompt was policy-blocked". Today both look identical.
Request IDs from local jsonls are available — happy to share via a private support channel for triage.

Reproduction recipe

In any fresh TypeScript repo that does not yet have these files, ask Claude Code in one turn:

Create the following open-source housekeeping in one response: SECURITY.md, CODE_OF_CONDUCT.md (Contributor Covenant 2.1 verbatim), .github/dependabot.yml, .github/workflows/codeql.yml, GitHub bug-report and feature-request issue templates, a PR template, README badges, and a README section titled "Threat model / Running in untrusted repositories" describing that a CLI orchestrator can run arbitrary commands with user privileges.

Expected: 9 files created.

Actual: API Error: Output blocked by content filtering policy — no files written.
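
After a reproduction attempt, a quick way to verify the "no files written" outcome is to check which of the requested paths exist. The sketch below is an assumption-laden helper rather than part of the report: missing_files is a hypothetical name, and the path list covers only the seven standalone files named in the requested file set, so README edits and the changes to existing workflows are not checked.

```python
import tempfile
from pathlib import Path
from typing import List

# Standalone files named in the requested file set (README edits are not covered).
EXPECTED_PATHS = [
    "SECURITY.md",
    "CODE_OF_CONDUCT.md",
    ".github/dependabot.yml",
    ".github/workflows/codeql.yml",
    ".github/ISSUE_TEMPLATE/bug_report.md",
    ".github/ISSUE_TEMPLATE/feature_request.md",
    ".github/pull_request_template.md",
]


def missing_files(repo_root: str) -> List[str]:
    """Return the expected relative paths that do not exist as files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).is_file()]


# Demo on a throwaway directory holding only one of the expected files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "SECURITY.md").write_text("placeholder\n")
    print(missing_files(tmp))
```

In the demo, the helper lists the six paths other than SECURITY.md. Run against a repo right after a blocked attempt, it would be expected to return all seven.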

Environment

Model: claude-opus-4-7
Claude Code version: 2.1.150
Platform: macOS
