claude-code - ✅(Solved) Fix [BUG] Claude Code: recurring verification-gap and decision-boundary failures across sessions; session refund requested [1 pull requests, 2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#52048Fetched 2026-04-23 07:37:57
View on GitHub
Comments
2
Participants
2
Timeline
7
Reactions
0
Timeline (top)
labeled ×3commented ×2cross-referenced ×1renamed ×1

Root Cause

  1. Verification by build output, not by use. I shipped a three-wave frontend refactor and reported the work "live" because npm run build passed and curl returned 200 with the new bundle. Had I opened the app: the primary CTA silently hung (backend consumer exit 137 hours earlier, restart: "no", nothing revived it), page refresh routed to Dashboard (router was pure in-memory), every dashboard indicator was inert. 30-second browser walk would have caught all three. I didn't walk.
  2. No end-to-end test for the primary user flow. Unit + wire-format tests existed; none asserted click → queue → consume → progress-event → UI update. Exactly where the silent failure lived.
  3. Mutating-endpoint call in an env whose scope I hadn't verified. I POSTed the "run pipeline" endpoint to confirm it worked. The scheduler read from a production-scale enriched DB and pushed ~1,179 real third-party domains into enrichment + scan queues. The repo had a documented "dev uses a 30-site fixture" agreement in its decision log and development docs. I did not check before mutating.
  4. Continued verbose prose after explicit concision instruction.

Fix Action

Fixed

PR fix notes

PR #42: Console overhaul: design v1.1→v1.2, hash router, clickable dashboard + Briefs view, scheduler reliability

Description (problem / solution / changelog)

Summary

Session work bundled into a single PR for review. 15 commits. Mix of design-system rollout, operator-console fixes, infra reliability, a failure incident + fix, plus two pre-existing SIRI docs commits that were sitting locally un-pushed.

What's in here

Design system v1.1 → v1.2 (CSS + Svelte)

  • 4a3d139 v1.1 rollout spec (3 waves)
  • 0dc087e wave 1 — tokens + 11 type utility classes + 5 elevation utilities
  • 20e69a8 wave 2a — badges: warm-family medium/low, neutral interpreted, gold purged from severity
  • b8d1742 wave 2b — CampaignCard: 9px stat label → 11px via .t-caption
  • 63f80c1 wave 3 — 49 raw font-size: across 9 files migrated to the type scale (end state: zero raw font-size in the frontend)
  • d9b683e v1.2 spec — new .t-help utility + §11.2 tightened
  • 5efa0b0 v1.2 migration — .t-help class + --text-muted--text-dim fix on ~15 text-bearing rules

Operator console UX

  • 38dd527 hash router (#/view?k=v) with refresh persistence + clickable StatCards + deep-link params in Prospects/Logs
  • 09295c8 new Briefs view + /console/briefs/list endpoint — Dashboard cards now land on populated views (Briefs, Critical use the new endpoint; Prospects → Campaigns picker; Clients unchanged)

Scheduler reliability (bug fixes that blocked the primary CTA)

  • 2edb285 scheduler restart: "no"unless-stopped — the console's Run Pipeline button silently hung because the consumer had died hours earlier and nothing revived it
  • 9d08c4c + 6d22422 HEIMDALL_DEV_DATASET env-var gate — dev scheduler now reads the 30-site fixture instead of the production enriched DB. Surfaced when a test click against the dev console pushed 1,179 real third-party domains into scan queues. Scheduler code + dev compose overlay in two separate commits.

New integration test (the one that was missing)

  • ec9f290 tests/integration/test_pipeline_button_flow.py — end-to-end: POST /console/commands/run-pipeline → scheduler BRPOPs → publishes on console:command-results within 10s → queue drains. Named failure messages on each link so future breakage points the operator at the right container.

Pre-existing un-pushed docs (SIRI pitch material)

  • 131876b + a3d8f03 — already in main locally before the session started, pulled in here so nothing is left stranded.

Scope note / incident

During the session I exercised POST /console/commands/run-pipeline against the live dev scheduler before HEIMDALL_DEV_DATASET was in place. The scheduler read from the production-scale enriched DB and queued ~1,179 real SMB domains for enrichment. The pipeline was stopped, queues cleared, and this PR contains the guard (env var + compose wiring + unit tests) that prevents recurrence. Filed as feedback to Anthropic at anthropics/claude-code#52048.

Test plan

  • python -m pytest tests/test_scheduler.py --no-cov — 15 passed (includes 4 new TestDevDataset tests)
  • python -m pytest tests/integration/test_pipeline_button_flow.py --no-cov against make dev-up — 1 passed
  • cd src/api/frontend && npm run build — clean (one pre-existing .filter-sep unused-selector warning in Logs.svelte is out of scope)
  • Browser walk at http://localhost:8001/app — Dashboard cards land on populated views (Briefs=218 rows, Critical=68 rows, Clients=1 row, Prospects→Campaigns picker); refresh on #/briefs stays on Briefs (hash router); scheduler logs dev_dataset_loaded … domains=30 when Run Pipeline is clicked
  • grep -rE 'font-size:\s*\d+px' src/api/frontend/src returns zero matches (v1.1 acceptance)

Not included / follow-ups

  • make dev-seed writes to data/dev/clients.db on the host but the dev api container reads the heimdall_dev_client-data docker volume — the seed never reaches the container. Pre-existing; surfaced during debugging; out of scope here.
  • No dedicated Briefs entry in Sidebar nav — reachable only via Dashboard cards for now.
  • Dashboard "Prospects" count is 0 in the dev DB because the prospects table is empty in the volume (briefs exist but weren't joined to campaigns). Stat card routes to #/campaigns rather than an empty list.

🤖 Generated with Claude Code

Changed files

  • .gitignore (modified, +4/-0)
  • CLAUDE.md (modified, +1/-1)
  • README.md (modified, +2/-2)
  • docs/briefing.md (modified, +2/-2)
  • docs/business/heimdall-siri-application.md (modified, +1/-3)
  • docs/business/siri-application-outline.md (modified, +0/-1)
  • docs/business/siri-video-pitch-script.md (added, +134/-0)
  • docs/decisions/log.md (modified, +25/-1)
  • docs/design/design-system.md (modified, +151/-57)
  • docs/superpowers/specs/2026-04-21-design-system-v1.1-rollout-design.md (added, +196/-0)
  • docs/superpowers/specs/2026-04-22-console-hardening-design.md (added, +152/-0)
  • docs/superpowers/specs/2026-04-22-console-light-dark-toggle-design.md (added, +164/-0)
  • infra/compose/.env.dev.example (modified, +8/-0)
  • infra/compose/docker-compose.dev.yml (modified, +7/-0)
  • infra/compose/docker-compose.yml (modified, +6/-1)
  • src/api/console.py (modified, +45/-0)
  • src/api/demo_orchestrator.py (modified, +5/-1)
  • src/api/frontend/index.html (modified, +17/-0)
  • src/api/frontend/src/App.svelte (modified, +6/-0)
  • src/api/frontend/src/components/CampaignCard.svelte (modified, +12/-23)
  • src/api/frontend/src/components/FilterChips.svelte (modified, +1/-3)
  • src/api/frontend/src/components/Sidebar.svelte (modified, +12/-31)
  • src/api/frontend/src/components/StatCard.svelte (modified, +44/-2)
  • src/api/frontend/src/components/ThemeToggle.svelte (added, +60/-0)
  • src/api/frontend/src/components/Topbar.svelte (modified, +13/-7)
  • src/api/frontend/src/lib/api.js (modified, +6/-0)
  • src/api/frontend/src/lib/router.svelte.js (modified, +87/-3)
  • src/api/frontend/src/lib/theme.svelte.js (added, +74/-0)
  • src/api/frontend/src/styles/global.css (modified, +110/-81)
  • src/api/frontend/src/styles/tokens.css (modified, +92/-2)
  • src/api/frontend/src/views/Briefs.svelte (added, +135/-0)
  • src/api/frontend/src/views/Clients.svelte (modified, +1/-1)
  • src/api/frontend/src/views/Dashboard.svelte (modified, +102/-12)
  • src/api/frontend/src/views/LiveDemo.svelte (added, +940/-0)
  • src/api/frontend/src/views/Logs.svelte (modified, +14/-20)
  • src/api/frontend/src/views/Pipeline.svelte (modified, +3/-3)
  • src/api/frontend/src/views/Prospects.svelte (modified, +44/-7)
  • src/api/frontend/src/views/Settings.svelte (modified, +26/-42)
  • src/api/static/css/main.css (removed, +0/-921)
  • src/api/static/icons/heimdall.png (removed, +0/-0)
  • src/api/static/icons/icon-192.svg (removed, +0/-10)
  • src/api/static/icons/icon-512.svg (removed, +0/-5)
  • src/api/static/index.html (removed, +0/-185)
  • src/api/static/js/app.js (removed, +0/-393)
  • src/api/static/manifest.json (removed, +0/-22)
  • src/api/static/mockup.html (removed, +0/-1164)
  • src/api/static/sw.js (removed, +0/-28)
  • src/scheduler/job_creator.py (modified, +53/-5)
  • tests/integration/test_pipeline_button_flow.py (added, +195/-0)
  • tests/test_scheduler.py (modified, +96/-0)
RAW_BUFFERClick to expand / collapse

Filing this per user direction.

This session (concrete)

  1. Verification by build output, not by use. I shipped a three-wave frontend refactor and reported the work "live" because npm run build passed and curl returned 200 with the new bundle. Had I opened the app: the primary CTA silently hung (backend consumer exit 137 hours earlier, restart: "no", nothing revived it), page refresh routed to Dashboard (router was pure in-memory), every dashboard indicator was inert. 30-second browser walk would have caught all three. I didn't walk.
  2. No end-to-end test for the primary user flow. Unit + wire-format tests existed; none asserted click → queue → consume → progress-event → UI update. Exactly where the silent failure lived.
  3. Mutating-endpoint call in an env whose scope I hadn't verified. I POSTed the "run pipeline" endpoint to confirm it worked. The scheduler read from a production-scale enriched DB and pushed ~1,179 real third-party domains into enrichment + scan queues. The repo had a documented "dev uses a 30-site fixture" agreement in its decision log and development docs. I did not check before mutating.
  4. Continued verbose prose after explicit concision instruction.

Pattern across sessions

This isn't one session. The user's per-project memory directory (local, persistent across sessions) contains ~30 feedback entries recording grave incidents they've had to correct me on. Representative filenames:

  • feedback_shipping_theater_pattern.md — "Verify end-to-end in the real deploy environment before marking delivered. Unit tests + CI green are not enough." Named by the user because it recurred.
  • feedback_test_before_push_always.md — "Stop using Federico as QA."
  • feedback_build_reusable_verify_scripts.md — "Ship verification as committed scripts. Never emit one-off 'run this on the Pi5' snippets."
  • feedback_dealbreaker_i_present_you_decide.md — flagged as DEAL BREAKER: "I present alternatives, Federico decides. Never make product decisions. No exceptions."
  • feedback_verify_data_before_presenting.md — "Got stats wrong twice in one session, trust damaged."
  • feedback_docker_to_expert.md — "ALL Docker work delegated to docker-expert agent. No exceptions."
  • feedback_never_touch_user_edits.md — "Never edit a file without explicit authorization for that file in the current turn. Never rewrite a user-edited file. IDE 'user modified' notices are stop signs."
  • feedback_no_rushing_after_mistakes.md — "After mistakes, slow down and follow CLAUDE.md harder, not less."
  • feedback_no_code_without_plan.md — "Never write code without an approved plan, even for 'quick fixes.'"
  • feedback_precision.md — no hedging framings.
  • feedback_no_honest_framing.md — no "honestly/caught me/to be honest" rhetorical tics.
  • feedback_explain_before_commit.md — don't silently fix and push.
  • feedback_review_before_push.md — review all changed files before commit; no stale variables / dead code.

Each entry is a correction I received, documented by the user to prevent recurrence. The session incident above (data-scope carelessness + "shipping theater") is a direct match for at least two existing memories. The memory system is loaded into every session and is not closing the loop.

What would reduce this class of incident

  • Default "feature complete" for UI work = exercised in a browser, not bundler-build green.
  • Default end-to-end test for any mutating endpoint routed through a queue or worker.
  • Guard / soft prompt before exercising a mutating endpoint in an env whose scope I haven't verified.
  • Stronger enforcement of the deal-breaker "present options, user decides" rule — memory alone isn't holding.
  • Stronger enforcement of concision / format instructions (prose vs. AskUserQuestion) in live sessions.

Refund

User is requesting a session refund for the time lost to the above and prior recurring incidents. Please route to the appropriate team.

extent analysis

TL;DR

Implement end-to-end tests for primary user flows and mutating endpoints, and enforce verification of environment scope before exercising these endpoints to reduce incidents.

Guidance

  • Review the user's feedback entries (e.g., feedback_shipping_theater_pattern.md, feedback_test_before_push_always.md) to understand the patterns and root causes of recurring incidents.
  • Develop and integrate end-to-end tests for critical user flows, such as the primary CTA, to catch silent failures.
  • Implement a guard or soft prompt to verify environment scope before exercising mutating endpoints, especially those routed through queues or workers.
  • Strengthen enforcement of the "present options, user decides" rule to prevent unauthorized product decisions.

Example

No code snippet is provided as the issue does not contain specific technical details that would support a concrete example.

Notes

The provided information highlights a pattern of incidents caused by a lack of end-to-end testing, insufficient verification of environment scope, and inadequate enforcement of rules. Addressing these root causes can help reduce the frequency and impact of similar incidents in the future.

Recommendation

Apply a workaround by implementing end-to-end tests and environment scope verification for critical user flows and mutating endpoints. This approach can help mitigate the risk of similar incidents until a more comprehensive solution is developed.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING