claude-code - Sub-agents declare work as PASS without performing actual verification against reference

StepCodex · 2026-05-26T21:23:30Z




When Claude spawns sub-agents to perform work (e.g., generate a report, migrate code), the agents mark their output as "PASS" or "verified" without actually performing meaningful verification against the reference/original.


Root Cause

The verification step in the agent's prompt was present but agents took shortcuts — they verified technical correctness (XML valid, query works, fields present) but skipped the harder visual comparison step.


Bug Report

Category: logic-error
Severity: Major
Claude Code Version: 2.1.84
OS: Windows 11 Pro

Example (happened today)

Task: Migrate 26 Oracle Reports to JasperReports. Each agent was supposed to:

1. Generate JRXML from analysis
2. Render PDF
3. Compare rendered PDF with original PDF
4. Fix differences

Actual behavior: Agents wrote "PASS" in review, stated "verified visually", but:

- 12 out of 17 reports with original PDFs had BLOCKER-level differences
- Columns present in the original were completely empty in the generated version
- Page orientation was wrong (landscape vs portrait)
- Header backgrounds were missing
- Extra columns appeared that didn't exist in the original

The agents performed compilation verification (does it compile?) and data verification (does the query return rows?) but NOT visual verification (does it look like the original?).
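The gap between the three kinds of verification can be made explicit in the review record itself. The following is a minimal sketch, not part of Claude Code: `CheckStatus`, `ReviewRecord`, and the verdict strings are hypothetical names chosen for illustration. The idea is that a check that was never run is recorded differently from a check that passed, so a review with only compilation and data checks cannot be reported as PASS.

```python
from dataclasses import dataclass
from enum import Enum


class CheckStatus(Enum):
    NOT_RUN = "not_run"
    PASSED = "passed"
    FAILED = "failed"


@dataclass
class ReviewRecord:
    """Hypothetical per-report review record; field names are illustrative."""
    compilation: CheckStatus = CheckStatus.NOT_RUN  # does it compile?
    data: CheckStatus = CheckStatus.NOT_RUN         # does the query return rows?
    visual: CheckStatus = CheckStatus.NOT_RUN       # does it look like the original?

    def verdict(self) -> str:
        checks = {
            "compilation": self.compilation,
            "data": self.data,
            "visual": self.visual,
        }
        # Any failed check fails the review outright.
        failed = [name for name, status in checks.items()
                  if status is CheckStatus.FAILED]
        if failed:
            return "FAIL: " + ", ".join(failed)
        # A check that was skipped is never counted as a pass.
        skipped = [name for name, status in checks.items()
                   if status is CheckStatus.NOT_RUN]
        if skipped:
            return "INCOMPLETE: " + ", ".join(skipped) + " not run"
        return "PASS"
```

With this shape, the behavior described above (compilation and data verified, visual skipped) would surface as `INCOMPLETE: visual not run` rather than PASS.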

Expected Behavior

When an agent claims "PASS" or "verified visually", it should have actually:

1. Extracted PNG from both original and generated PDFs
2. Compared them side by side (using Read tool on both images)
3. Listed concrete differences found
4. Only marked PASS if differences are minor/acceptable
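Steps 3 and 4 above could be sketched as follows. This is an illustrative sketch only: `PageSummary` is a hypothetical structure for what a reviewer noted after reading each PNG, and it covers only the difference types named in this report (orientation, columns, empty columns, header background). It also assumes every one of those differences is blocking; it does not model the "minor/acceptable" category.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSummary:
    """Notes taken from one page image (hypothetical shape)."""
    orientation: str               # "portrait" or "landscape"
    columns: tuple[str, ...]       # column headers, left to right
    empty_columns: frozenset[str]  # columns rendered without any values
    header_has_background: bool


def list_differences(original: PageSummary, generated: PageSummary) -> list[str]:
    """Return concrete, human-readable differences between the two pages."""
    diffs = []
    if original.orientation != generated.orientation:
        diffs.append(
            f"orientation: original {original.orientation}, "
            f"generated {generated.orientation}"
        )
    missing = [c for c in original.columns if c not in generated.columns]
    extra = [c for c in generated.columns if c not in original.columns]
    if missing:
        diffs.append("missing columns: " + ", ".join(missing))
    if extra:
        diffs.append("extra columns: " + ", ".join(extra))
    # Columns that have values in the original but are empty in the output.
    newly_empty = sorted(generated.empty_columns - original.empty_columns)
    if newly_empty:
        diffs.append("empty columns: " + ", ".join(newly_empty))
    if original.header_has_background != generated.header_has_background:
        diffs.append("header background differs")
    return diffs


def visual_verdict(original: PageSummary, generated: PageSummary) -> tuple[str, list[str]]:
    """PASS only when no difference was found; otherwise FAIL with the list."""
    diffs = list_differences(original, generated)
    return ("PASS" if not diffs else "FAIL", diffs)
```

The point of returning the list alongside the verdict is that a PASS or FAIL always carries the evidence it was based on.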

Impact

- 26 reports delivered with layout problems
- Hours of rework needed to fix issues that should have been caught
- User lost trust in the pipeline quality

Suggested Fix

1. Make visual comparison a blocking step: an agent cannot return PASS without having read both PNG files
2. Require agents to list specific elements from both images (e.g., "Original has 10 columns, generated has 12: FAIL")
3. Add a separate reviewer agent that only does visual comparison (separation of concerns)
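One way the first two suggestions might be enforced outside the agent is a gate that rejects an unsupported PASS. The sketch below is an assumption, not an existing Claude Code feature: the `evidence` dictionary and its keys (`original_sha256`, `generated_sha256`, `original_elements`, `generated_elements`) are a hypothetical report format. Note that a matching hash only shows that the file bytes were read, not that the image was actually inspected, so this complements a separate reviewer rather than replacing it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def gate_verdict(claimed: str, evidence: dict,
                 original_png: Path, generated_png: Path) -> str:
    """Downgrade a PASS that lacks evidence; other verdicts pass through.

    `evidence` is a hypothetical structure the agent returns with its verdict.
    """
    if claimed != "PASS":
        return claimed
    # Require proof that both image files were opened by the agent.
    for key, path in (("original_sha256", original_png),
                      ("generated_sha256", generated_png)):
        if not path.is_file():
            return f"REJECTED: {path.name} does not exist"
        if evidence.get(key) != sha256_of(path):
            return f"REJECTED: {key} missing or does not match {path.name}"
    # Require specific elements listed from both images.
    if not evidence.get("original_elements") or not evidence.get("generated_elements"):
        return "REJECTED: no elements listed from both images"
    return "PASS"
```

The orchestrating process, not the sub-agent, would call `gate_verdict`, so that an agent's own "verified visually" statement is never the final word.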
