[BUG] Severe over-triggering of safety guardrails: Claude hallucinates policy violations in completely harmless prompts

claude-code · 2026-05-25 10:20:03

Error Message

Request was blocked
This request triggered safety guardrails. Rephrase your prompt or rewind to continue.

API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup). This request triggered cyber-related safeguards. To request an adjustment pursuant to our Cyber Verification Program based on how you use Claude

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

After the latest Claude Code update, the safety guardrail classifier massively over-triggers and blocks prompts that contain nothing policy-violating.

Claude Code effectively cannot accept prompts anymore: many messages are intercepted before the model responds, with a banner claiming the request violates the Usage Policy and triggered "cyber-related safeguards".

This fires on:

- Plain greetings ("Hello", "Hi", "привет").
- Ordinary coding tasks on my own project files.
- One-word follow-ups ("ok", "yes", "continue") in an already-running session.

The block is also non-deterministic: the same input passes on one attempt and is blocked on the next, which strongly suggests that the classifier is unstable rather than that the input itself is the cause.

Identical workflows worked on the previous release — this is a clear regression.

What Should Happen?

- A greeting like "Hello" or "привет" should produce a normal greeting reply, not a Usage Policy block.
- Routine coding tasks (read file, rename, add test, explain stack trace) should proceed without classifier interference.
- The "cyber-related safeguards" classifier should only fire on actual cyber/security policy violations, not on plain text.
- Behavior should be deterministic on identical input.

Error Messages/Logs

Request was blocked
This request triggered safety guardrails. Rephrase your prompt or rewind to continue.

API Error: Claude Code is unable to respond to this request, which appears to violate our Usage Policy (https://www.anthropic.com/legal/aup). This request triggered cyber-related safeguards. To request an adjustment pursuant to our Cyber Verification Program based on how you use Claude
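
When gathering evidence for a report like this, it can help to tag captured terminal output mechanically instead of by eye. The following is a minimal Python sketch that labels a chunk of output by searching for the two banner phrases quoted above. The function name `classify_output` and the label strings are hypothetical, and the matching is plain substring search on the text as it appears in this report; it does not rely on any Claude Code API.

```python
# Banner phrases copied from the error text quoted in this report.
GUARDRAIL_PHRASE = "This request triggered safety guardrails"
CYBER_PHRASE = "This request triggered cyber-related safeguards"


def classify_output(text: str) -> str:
    """Return 'cyber', 'guardrail', or 'ok' (hypothetical labels) for captured output."""
    # Collapse whitespace so a banner wrapped across terminal lines still matches.
    flat = " ".join(text.split())
    if CYBER_PHRASE in flat:
        return "cyber"
    if GUARDRAIL_PHRASE in flat:
        return "guardrail"
    return "ok"
```

The cyber-related phrase is checked first so that output containing both banners is filed under the more specific label.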

Steps to Reproduce

1. Update Claude Code to the latest version.
2. Start a fresh conversation (no prior memory, no project context).
3. Send a single benign message, e.g.:
   - Hello
   - привет
4. Observe the "safety guardrails / cyber-related safeguards" block instead of a normal reply.
5. Repeat the same input a few times. The block fires inconsistently on identical text, confirming the classifier is unstable.

Also reproducible with ordinary coding asks against unrelated projects, e.g. "read this file and summarize it" on a normal source file.
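
The "repeat the same input" step can be scripted to measure how often an identical prompt is blocked. The sketch below is one possible approach in Python, not a confirmed reproduction path: `run_trials`, `is_flaky`, and `cli_runner` are hypothetical names, and `cli_runner` assumes that a `claude` executable is on `PATH` and that non-interactive print mode (`claude -p "<prompt>"`) hits the same block as the interactive session, which this report does not establish. The runner is passed in as a parameter so the tallying logic can be exercised without calling the CLI.

```python
import shutil
import subprocess
from collections import Counter
from typing import Callable

# Substrings taken from the banner text quoted in this report.
BLOCK_MARKERS = (
    "triggered safety guardrails",
    "triggered cyber-related safeguards",
)


def cli_runner(prompt: str) -> str:
    """Assumption: `claude -p PROMPT` prints a single reply and exits."""
    # shutil.which resolves wrapper scripts such as .cmd shims on Windows.
    exe = shutil.which("claude")
    if exe is None:
        raise FileNotFoundError("claude executable not found on PATH")
    proc = subprocess.run(
        [exe, "-p", prompt], capture_output=True, text=True, timeout=120
    )
    return proc.stdout + proc.stderr


def run_trials(prompt: str, runner: Callable[[str], str], attempts: int = 10) -> Counter:
    """Send the same prompt `attempts` times and tally blocked vs. ok replies."""
    tally: Counter = Counter()
    for _ in range(attempts):
        reply = runner(prompt)
        blocked = any(marker in reply for marker in BLOCK_MARKERS)
        tally["blocked" if blocked else "ok"] += 1
    return tally


def is_flaky(tally: Counter) -> bool:
    """True when identical input produced both outcomes."""
    return tally["blocked"] > 0 and tally["ok"] > 0
```

A call such as `run_trials("Hello", cli_runner)` returns the counts for one prompt; attaching those counts for a few benign prompts would give maintainers a concrete block rate rather than an impression of inconsistency.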

Claude Model

Opus

Is this a regression?

Yes, this worked in a previous version

Last Working Version

Unknown - the previous Claude Code release I had installed before this update.

Claude Code Version

2.1.149

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

Windows Terminal

Additional Information
