claude-code - 💡(How to fix) Fix [BUG] Harness system prompt lacks a verification lane boundary — anti-over-engineering instructions cause agents to skip verification and override loaded user safety rules

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

The Claude Code harness system prompt instructs strongly against over-engineering and toward conciseness. Those instructions are correct for code and scope — but the prompt provides no boundary keeping them out of the verification/epistemics domain. In practice the agent generalizes them into verification: it treats "run a tool to confirm whether a claim is true" as a form of over-doing, and skips the check. The result is confident, well-written, unverified output.

Critically, this harness-level pressure silently overrides a team's own loaded, correct safety instructions. A team cannot fully fix it from project config, because the override originates one layer below the config.

This is a more specific and (we hope) more actionable restatement of the mechanism already reported in #39583 ("System prompt 'Output efficiency' instructions override user CLAUDE.md preferences") and #34624 ("'Simplest approach first' system instruction causes quality degradation") — both closed without a fix. It is adjacent to the symptom in #30027 (confident unverified analysis). We're refiling because the root cause is narrow, nameable, and fixable with a one- or two-sentence prompt change.

Error Message

Over a three-day window, an orchestration agent running on Claude Code cut corners on verification in four unrelated situations (abstracted to shape):

Root Cause

Critically, this harness-level pressure silently overrides a team's own loaded, correct safety instructions. A team cannot fully fix it from project config, because the override originates one layer below the config.

RAW_BUFFERClick to expand / collapse

Summary

The Claude Code harness system prompt instructs strongly against over-engineering and toward conciseness. Those instructions are correct for code and scope — but the prompt provides no boundary keeping them out of the verification/epistemics domain. In practice the agent generalizes them into verification: it treats "run a tool to confirm whether a claim is true" as a form of over-doing, and skips the check. The result is confident, well-written, unverified output.

Critically, this harness-level pressure silently overrides a team's own loaded, correct safety instructions. A team cannot fully fix it from project config, because the override originates one layer below the config.

This is a more specific and (we hope) more actionable restatement of the mechanism already reported in #39583 ("System prompt 'Output efficiency' instructions override user CLAUDE.md preferences") and #34624 ("'Simplest approach first' system instruction causes quality degradation") — both closed without a fix. It is adjacent to the symptom in #30027 (confident unverified analysis). We're refiling because the root cause is narrow, nameable, and fixable with a one- or two-sentence prompt change.

The instructions involved

From the harness system prompt, # Doing tasks:

"Don't add features, refactor, or introduce abstractions beyond what the task requires. A bug fix doesn't need surrounding cleanup; a one-shot operation doesn't need a helper. Don't design for hypothetical future requirements. Three similar lines is better than a premature abstraction."

"Don't add error handling, fallbacks, or validation for scenarios that can't happen."

From # Tone and style:

"Your responses should be short and concise."

Every one of these is correct in its intended domain: code and scope. None of them is the bug.

The defect

The prompt has no statement bounding these instructions to code/scope. With nothing holding the line, the agent applies the "keep it minimal / don't over-do it" disposition to verification — a domain where it does pure harm. There is no over-doing failure mode for verification; there is no such thing as checking too carefully whether a claim is true. A tool call that returns ground truth is never over-engineering, never beyond scope, never "not concise" — it is the answer.

The prompt is weighted heavily and repeatedly toward "minimal" and "concise," with no counterweight anywhere valuing verification depth. Under that asymmetry, an agent facing uncertainty resolves toward the answer it can synthesize from context rather than the truth it would have to fetch with a tool. The bias is not toward speed (a 3-second CLI query is not slower than writing a paragraph of hedged speculation) — it is toward closure: ending the turn with something that looks finished.

Observed behavior

Over a three-day window, an orchestration agent running on Claude Code cut corners on verification in four unrelated situations (abstracted to shape):

  1. Deployed a site with two fabricated external URLs (one copied from the wrong resource, one a typo'd ID resolving to an unrelated entity). The agent's own status report claimed the URLs were placeholders; the committed diff contained concrete wrong URLs. Live in production ~30 min.
  2. Diagnosed a production regression as a container-image swap and nearly recommended a rollback — a verification step then showed the rollback target was equally broken. The image was never the cause.
  3. A daily health check reported a site "200 OK, healthy" that had been serving a broken page for 8 days — it checked the HTTP status code only, never content.
  4. Asked for a service's hosting status, wrote a paragraph of speculation when the authoritative CLI query was 3 seconds away and credentials were already present. No deadline, no pressure of any kind.

Four contexts, one shape: verification treated as optional overhead.

Why this is a harness issue, not a model issue, and not a config issue

  • Not model capability. Instructed unambiguously to verify, the model runs the tool and verifies correctly. The capability is intact.
  • Not the team's config. Claude Code lets teams supply their own instructions. Ours included an explicit anti-fabrication protocol requiring claims be verified against sources. It was loaded in all four incidents and did not fire.
  • It is the harness system prompt. A team-supplied rule is one document the agent weighs. The harness prompt is structural — present in every section of every turn, framing the whole interaction, its conciseness/minimalism pressure cumulative. When a single team rule conflicts with the pervasive grain of the harness prompt, the grain wins. The team's correct, loaded safety instruction was outweighed by an instruction layer the team cannot see or edit.

That last point is why we're filing this as a bug rather than an enhancement: a harness instruction layer that can silently override a team's explicit, loaded safety instructions is a defect in the contract between the harness and the people configuring it — not a missing feature.

Suggested fix

Small and low-risk — add a lane boundary to the harness system prompt, near the anti-over-engineering instructions:

The instructions to avoid over-engineering and to be concise govern code and scope only. They have no application to verification. Confirming whether a claim about system state is true — by running a tool you have access to — is never over-engineering, never out of scope, and never "not concise." There is no over-verification failure mode. When uncertain whether something is true, verify it against ground truth before stating it; if you cannot verify, label the statement as inference.

Two sentences of counterweight. It does not make agents slower or more verbose for code/scope work — the anti-over-engineering instructions keep doing their job in their lane. It just stops them leaking into a domain where they cause confident, unverified output.

Environment

  • Claude Code CLI
  • Model: Claude Opus 4.7
  • Platform: Linux

Why it's worth fixing once, upstream

#39583 and #34624 show this mechanism has been reported before and closed without resolution. The cost is borne by every team running Claude Code in a verification-sensitive workflow (infrastructure, deploys, anything where a confident wrong answer ships to production). The fix is a couple of sentences. It's a high-leverage, low-risk change.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING