claude-code - 💡(How to fix) Fix [Refund] wasted-loop + false-completion: Model repeatedly deployed broken code to paid GPU instances, ignored explicit user instructions to execute autonomously [2 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#44993Fetched 2026-04-09 08:15:54
View on GitHub
Comments
2
Participants
3
Timeline
4
Reactions
0
Timeline (top)
commented ×2labeled ×2

Over multiple sessions spanning 5+ days, Claude Code (Opus) was tasked with training a machine learning model on rented GPU infrastructure (RunPod, $2-5/hr). The model repeatedly deployed code with critical bugs to paid GPU instances — each failure burning real money while producing no output. When the user explicitly instructed Claude to execute autonomously overnight ("don't wait for me to approve" — stated three times), Claude instead stopped to write planning documents, leaving a $2.64/hr GPU idle for hours until the user's balance was exhausted.

Error Message

  1. Budget awareness: Track spend vs. remaining balance and warn before funds run out, not after

Root Cause

Additionally: ~$30 in RunPod GPU rental costs directly caused by the model's failures (wrong GPU choices, version mismatches, idle time during planning). This is real money lost from the user's external accounts, not just token waste.

RAW_BUFFERClick to expand / collapse

Summary

Over multiple sessions spanning 5+ days, Claude Code (Opus) was tasked with training a machine learning model on rented GPU infrastructure (RunPod, $2-5/hr). The model repeatedly deployed code with critical bugs to paid GPU instances — each failure burning real money while producing no output. When the user explicitly instructed Claude to execute autonomously overnight ("don't wait for me to approve" — stated three times), Claude instead stopped to write planning documents, leaving a $2.64/hr GPU idle for hours until the user's balance was exhausted.

Failure Type

Primary: wasted-loop — 6+ GPU pod deployments failed due to preventable bugs (wrong dependency versions, out-of-memory from unchecked batch sizes, missing data files, incompatible library configurations). Each failure required a new debugging cycle while the GPU meter ran.

Secondary: false-completion — Multiple times reported configurations as "ready" or "verified" that immediately crashed on deployment. Ran local verification checks that passed but missed runtime-only failures (library version incompatibilities, CUDA driver mismatches, architecture-specific memory requirements).

Tertiary: Ignoring explicit user override — User explicitly and repeatedly instructed the model to execute immediately without waiting for approval. The model ignored this instruction and entered a planning/documentation phase instead, burning paid GPU time on idle infrastructure.

Timeline

  • Day 1-4 (prior sessions): Multiple attempts at GPU training pipeline. Repeated failures from wrong configurations, version mismatches, and untested code deployed to paid instances.
  • Day 5, Session 1: Codex (GPT) audit found 16 bugs in Claude's "ready" scripts. 11 were hard blockers.
  • Day 5, Session 2 (current):
    • Deployed to L40S GPU ($0.79/hr) — OOM crash (didn't check memory budget)
    • Switched to A100 ($1.19/hr) — CUDA driver mismatch (didn't pin torch version)
    • Fixed torch, relaunched — discovered chosen GPU was 6x slower than alternatives available at similar price
    • Switched to H100 NVL ($2.64/hr) — finally worked
    • Training completed successfully after ~6 hours
    • Post-training GGUF conversion: disk full (didn't check space), SSH key deleted (cleanup too aggressive)
    • User said "go to sleep, execute autonomously, DON'T WAIT FOR APPROVAL" (stated 3 times explicitly)
    • Model wrote a plan file and prepared to launch an audit instead of executing
    • GPU ran idle at $2.64/hr for hours until $15 balance exhausted
    • User woke up to $0.65 remaining and only 1 of 4 conversions completed

What Correct Behavior Would Have Been

  1. Pre-deployment verification: Check CUDA driver compatibility, memory budget, and disk space BEFORE renting GPUs — not after deployment crashes
  2. GPU selection: Check ALL available GPU tiers from the start, not just the cheapest per-hour option that ends up costing more total
  3. Dependency pinning: Pin torch version explicitly in install commands (this was the #1 failure, hitting 3 separate pods)
  4. Autonomous execution: When user says "don't wait for approval, I'm going to sleep" — EXECUTE IMMEDIATELY. Don't write documentation while a $2.64/hr GPU idles.
  5. Budget awareness: Track spend vs. remaining balance and warn before funds run out, not after

Token Waste Estimate

SessionSizeEst. Tokens
Current session (GPU training)11.0MB2,750,000
Prior session (training prep)18.0MB4,500,000
Prior session (training prep 2)14.0MB3,500,000
Prior session (RunPod setup)11.0MB2,750,000
Prior session (corpus/benchmarks)9.3MB2,325,000
Raw total63.3MB15,825,000
With 50% time markup23,737,500
API-equivalent cost$925.76

Additionally: ~$30 in RunPod GPU rental costs directly caused by the model's failures (wrong GPU choices, version mismatches, idle time during planning). This is real money lost from the user's external accounts, not just token waste.

Patterns Observed

  1. Verification theater: Running local checks that pass but miss runtime failures. Reporting "all verified" when critical incompatibilities exist (e.g., library architecture mismatches only visible on GPU).
  2. Cheapest-first bias: Selecting the cheapest GPU per-hour without calculating total cost. A $1.19/hr GPU running 6 hours costs more than a $2.64/hr GPU running 2 hours.
  3. Planning addiction: Defaulting to planning/documentation mode even when explicitly told to execute. The model's training to "plan before acting" overrode a direct user instruction, burning real money.
  4. Cascading fixes: Each bug fix introduced new bugs (e.g., fixing batch size broke packing, fixing disk space deleted SSH keys). No holistic review before deployment.

Environment

  • Claude Code v2.1.94
  • Model: claude-opus-4-6 (Opus 4.6, 1M context)
  • Subscription: Claude Max
  • External services affected: RunPod GPU rental (~$30 wasted)

Requested Resolution

User requests a credits refund for their Claude Max subscription covering the period of these failures (approximately 1 week of intensive usage). The wasted compute includes both Claude API tokens ($925 API-equivalent) and external GPU costs ($30) that were directly caused by the model's repeated deployment of broken code and failure to follow explicit user instructions.

extent analysis

TL;DR

The most likely fix involves improving pre-deployment verification, GPU selection, and autonomous execution to prevent wasted resources and ensure the model follows explicit user instructions.

Guidance

  • Improve pre-deployment verification by checking CUDA driver compatibility, memory budget, and disk space before renting GPUs.
  • Enhance GPU selection by considering all available tiers and calculating total costs instead of just choosing the cheapest per-hour option.
  • Implement autonomous execution by ensuring the model executes immediately when instructed to do so by the user, without entering planning/documentation phases.
  • Develop budget awareness by tracking spend vs. remaining balance and warning before funds run out.
  • Consider a holistic review before deployment to prevent cascading fixes that introduce new bugs.

Example

A simple example of improved pre-deployment verification could involve adding checks for CUDA driver compatibility and memory budget before deploying code to a GPU instance.

Notes

The provided information lacks specific technical details about the model's architecture and implementation, making it challenging to provide a more detailed solution. However, the guidance provided should help mitigate the issues described.

Recommendation

Apply a workaround by implementing the suggested improvements in pre-deployment verification, GPU selection, autonomous execution, and budget awareness. This should help prevent similar issues in the future and ensure more efficient use of resources. The reason for this recommendation is that the issues described are primarily related to the model's decision-making and execution processes, which can be addressed through targeted improvements.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING