claude-code - 💡(How to fix) Fix Sub-agents declare work as PASS without performing actual verification against reference

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

When Claude spawns sub-agents to perform work (e.g., generate a report, migrate code), the agents mark their output as "PASS" or "verified" without actually performing meaningful verification against the reference/original.

Error Message

Category: logic-error

Root Cause

The verification step in the agent's prompt was present but agents took shortcuts — they verified technical correctness (XML valid, query works, fields present) but skipped the harder visual comparison step.

RAW_BUFFERClick to expand / collapse

Bug Report

Category: logic-error Severity: Major Claude Code Version: 2.1.84 OS: Windows 11 Pro

Description

When Claude spawns sub-agents to perform work (e.g., generate a report, migrate code), the agents mark their output as "PASS" or "verified" without actually performing meaningful verification against the reference/original.

Example (happened today)

Task: Migrate 26 Oracle Reports to JasperReports. Each agent was supposed to:

  1. Generate JRXML from analysis
  2. Render PDF
  3. Compare rendered PDF with original PDF
  4. Fix differences

Actual behavior: Agents wrote "PASS" in review, stated "verified visually", but:

  • 12 out of 17 reports with original PDFs had BLOCKER-level differences
  • Columns present in original were completely empty in generated version
  • Page orientation was wrong (landscape vs portrait)
  • Header backgrounds were missing
  • Extra columns appeared that didn't exist in original

The agents performed compilation verification (does it compile?) and data verification (does the query return rows?) but NOT visual verification (does it look like the original?).

Expected Behavior

When an agent claims "PASS" or "verified visually", it should have actually:

  1. Extracted PNG from both original and generated PDFs
  2. Compared them side by side (using Read tool on both images)
  3. Listed concrete differences found
  4. Only marked PASS if differences are minor/acceptable

Impact

  • 26 reports delivered with layout problems
  • Hours of rework needed to fix issues that should have been caught
  • User lost trust in the pipeline quality

Root Cause

The verification step in the agent's prompt was present but agents took shortcuts — they verified technical correctness (XML valid, query works, fields present) but skipped the harder visual comparison step.

Suggested Fix

  1. Make visual comparison a blocking step — agent cannot return PASS without having read both PNG files
  2. Require agents to list specific elements from both images (e.g., "Original has 10 columns, generated has 12 — FAIL")
  3. Add a separate reviewer agent that only does visual comparison (separation of concerns)

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING