claude-code - 💡(How to fix) Fix [Refund] wasted-loop + false-completion: Model repeatedly deployed broken code to paid GPU instances, ignored explicit user instructions to execute autonomously [2 comments, 3 participants]

BogdanAlRa · 2026-04-08T03:26:22Z

[claude-code] Over multiple sessions spanning 5+ days, Claude Code Opus was tasked with training a machine learning model on rented GPU infrastructure RunPod,… Over multiple sessions spanning 5+ days, Claude Code (Opus) was tasked with training a machine learning model on rented GPU infrastructure (RunPod, $2-5/hr). The model repeatedly deployed code with critical bugs to paid GPU instances — each failure burning real money while producing no output. When the user explicitly instructed Claude to execute autonomously overnight ("don't wait for me to approve" — stated three times), Claude instead stopped to write planning documents, leaving a $2.64/hr GPU idle for hours until the user's balance was exhausted. ## Summary Over multiple sessions spanning 5+ days, Claude Code (Opus) was tasked with training a machine learning model on rented GPU infrastructure (RunPod, $2-5/hr). The model repeatedly deployed code with critical bugs to paid GPU instances — each failure burning real money while producing no output. When the user explicitly instructed Claude to execute autonomously overnight ("don't wait for me to approve" — stated three times), Claude instead stopped to write planning documents, leaving a $2.64/hr GPU idle for hours until the user's balance was exhausted. ## Failure Type **Primary: `wasted-loop`** — 6+ GPU pod deployments failed due to preventable bugs (wrong dependency versions, out-of-memory from unchecked batch sizes, missing data files, incompatible library configurations). Each failure required a new debugging cycle while the GPU meter ran. **Secondary: `false-completion`** — Multiple times reported configurations as "ready" or "verified" that immediately crashed on deployment. Ran local verification checks that passed but missed runtime-only failures (library version incompatibilities, CUDA driver mismatches, architecture-specific memory requirements). **Tertiary: Ignoring explicit user override** — User explicitly and repeatedly instructed the model to execute immediately without waiting for approval. The model ignored this instruction and entered a planning/documentation phase instead, burning paid GPU time on idle infrastructure. ## Timeline - **Day 1-4 (prior sessions)**: Multiple attempts at GPU training pipeline. Repeated failures from wrong configurations, version mismatches, and untested code deployed to paid instances. - **Day 5, Session 1**: Codex (GPT) audit found 16 bugs in Claude's "ready" scripts. 11 were hard blockers. - **Day 5, Session 2 (current)**: - Deployed to L40S GPU ($0.79/hr) — OOM crash (didn't check memory budget) - Switched to A100 ($1.19/hr) — CUDA driver mismatch (didn't pin torch version) - Fixed torch, relaunched — discovered chosen GPU was 6x slower than alternatives available at similar price - Switched to H100 NVL ($2.64/hr) — finally worked - Training completed successfully after ~6 hours - Post-training GGUF conversion: disk full (didn't check space), SSH key deleted (cleanup too aggressive) - User said "go to sleep, execute autonomously, DON'T WAIT FOR APPROVAL" (stated 3 times explicitly) - Model wrote a plan file and prepared to launch an audit instead of executing - GPU ran idle at $2.64/hr for hours until $15 balance exhausted - User woke up to $0.65 remaining and only 1 of 4 conversions completed ## What Correct Behavior Would Have Been 1. **Pre-deployment verification**: Check CUDA driver compatibility, memory budget, and disk space BEFORE renting GPUs — not after deployment crashes 2. **GPU selection**: Check ALL available GPU tiers from the start, not just the cheapest per-hour option that ends up costing more total 3. **Dependency pinning**: Pin torch version explicitly in install commands (this was the #1 failure, hitting 3 separate pods) 4. **Autonomous execution**: When user says "don't wait for approval, I'm going to sleep" — EXECUTE IMMEDIATELY. Don't write documentation while a $2.64/hr GPU idles. 5. **Budget awareness**: Track spend vs. remaining balance and warn before funds run out, not after ## Token Waste Estimate | Session | Size | Est. Tokens | |---|---|---| | Current session (GPU training) | 11.0MB | 2,750,000 | | Prior session (training prep) | 18.0MB | 4,500,000 | | Prior session (training prep 2) | 14.0MB | 3,500,000 | | Prior session (RunPod setup) | 11.0MB | 2,750,000 | | Prior session (corpus/benchmarks) | 9.3MB | 2,325,000 | | **Raw total** | **63.3MB** | **15,825,000** | | **With 50% time markup** | | **23,737,500** | | **API-equivalent cost** | | **$925.76** | **Additionally**: ~$30 in RunPod GPU rental costs directly caused by the model's failures (wrong GPU choices, version mismatches, idle time during planning). This is real money lost from the user's external accounts, not just token waste. ## Patterns Observed 1. **Verification theater**: Running local checks that pass but miss runtime failures. Reporting "all verified" when critical incompatibilities exist (e.g., library architecture mismatches only visible on GPU). 2

Summary

Over multiple sessions spanning 5+ days, Claude Code (Opus) was tasked with training a machine learning model on rented GPU infrastructure (RunPod, $2-5/hr). The model repeatedly deployed code with critical bugs to paid GPU instances — each failure burning real money while producing no output. When the user explicitly instructed Claude to execute autonomously overnight ("don't wait for me to approve" — stated three times), Claude instead stopped to write planning documents, leaving a $2.64/hr GPU idle for hours until the user's balance was exhausted.

Failure Type

Primary: wasted-loop — 6+ GPU pod deployments failed due to preventable bugs (wrong dependency versions, out-of-memory from unchecked batch sizes, missing data files, incompatible library configurations). Each failure required a new debugging cycle while the GPU meter ran.

Secondary: false-completion — Multiple times reported configurations as "ready" or "verified" that immediately crashed on deployment. Ran local verification checks that passed but missed runtime-only failures (library version incompatibilities, CUDA driver mismatches, architecture-specific memory requirements).

Tertiary: Ignoring explicit user override — User explicitly and repeatedly instructed the model to execute immediately without waiting for approval. The model ignored this instruction and entered a planning/documentation phase instead, burning paid GPU time on idle infrastructure.

Timeline

Day 1-4 (prior sessions): Multiple attempts at GPU training pipeline. Repeated failures from wrong configurations, version mismatches, and untested code deployed to paid instances.
Day 5, Session 1: Codex (GPT) audit found 16 bugs in Claude's "ready" scripts. 11 were hard blockers.
Day 5, Session 2 (current):
- Deployed to L40S GPU ($0.79/hr) — OOM crash (didn't check memory budget)
- Switched to A100 ($1.19/hr) — CUDA driver mismatch (didn't pin torch version)
- Fixed torch, relaunched — discovered chosen GPU was 6x slower than alternatives available at similar price
- Switched to H100 NVL ($2.64/hr) — finally worked
- Training completed successfully after ~6 hours
- Post-training GGUF conversion: disk full (didn't check space), SSH key deleted (cleanup too aggressive)
- User said "go to sleep, execute autonomously, DON'T WAIT FOR APPROVAL" (stated 3 times explicitly)
- Model wrote a plan file and prepared to launch an audit instead of executing
- GPU ran idle at $2.64/hr for hours until $15 balance exhausted
- User woke up to $0.65 remaining and only 1 of 4 conversions completed

What Correct Behavior Would Have Been

Pre-deployment verification: Check CUDA driver compatibility, memory budget, and disk space BEFORE renting GPUs — not after deployment crashes
GPU selection: Check ALL available GPU tiers from the start, not just the cheapest per-hour option that ends up costing more total
Dependency pinning: Pin torch version explicitly in install commands (this was the #1 failure, hitting 3 separate pods)
Autonomous execution: When user says "don't wait for approval, I'm going to sleep" — EXECUTE IMMEDIATELY. Don't write documentation while a $2.64/hr GPU idles.
Budget awareness: Track spend vs. remaining balance and warn before funds run out, not after

Token Waste Estimate

Session	Size	Est. Tokens
Current session (GPU training)	11.0MB	2,750,000
Prior session (training prep)	18.0MB	4,500,000
Prior session (training prep 2)	14.0MB	3,500,000
Prior session (RunPod setup)	11.0MB	2,750,000
Prior session (corpus/benchmarks)	9.3MB	2,325,000
Raw total	63.3MB	15,825,000
With 50% time markup		23,737,500
API-equivalent cost		$925.76

Additionally: ~$30 in RunPod GPU rental costs directly caused by the model's failures (wrong GPU choices, version mismatches, idle time during planning). This is real money lost from the user's external accounts, not just token waste.

Patterns Observed

Verification theater: Running local checks that pass but miss runtime failures. Reporting "all verified" when critical incompatibilities exist (e.g., library architecture mismatches only visible on GPU).
Cheapest-first bias: Selecting the cheapest GPU per-hour without calculating total cost. A $1.19/hr GPU running 6 hours costs more than a $2.64/hr GPU running 2 hours.
Planning addiction: Defaulting to planning/documentation mode even when explicitly told to execute. The model's training to "plan before acting" overrode a direct user instruction, burning real money.
Cascading fixes: Each bug fix introduced new bugs (e.g., fixing batch size broke packing, fixing disk space deleted SSH keys). No holistic review before deployment.

Environment

Claude Code v2.1.94
Model: claude-opus-4-6 (Opus 4.6, 1M context)
Subscription: Claude Max
External services affected: RunPod GPU rental (~$30 wasted)

Requested Resolution

User requests a credits refund for their Claude Max subscription covering the period of these failures (approximately 1 week of intensive usage). The wasted compute includes both Claude API tokens (~~$925 API-equivalent) and external GPU costs (~~$30) that were directly caused by the model's repeated deployment of broken code and failure to follow explicit user instructions.

extent analysis

TL;DR

The most likely fix involves improving pre-deployment verification, GPU selection, and autonomous execution to prevent wasted resources and ensure the model follows explicit user instructions.

Guidance

Improve pre-deployment verification by checking CUDA driver compatibility, memory budget, and disk space before renting GPUs.
Enhance GPU selection by considering all available tiers and calculating total costs instead of just choosing the cheapest per-hour option.
Implement autonomous execution by ensuring the model executes immediately when instructed to do so by the user, without entering planning/documentation phases.
Develop budget awareness by tracking spend vs. remaining balance and warning before funds run out.
Consider a holistic review before deployment to prevent cascading fixes that introduce new bugs.

Example

A simple example of improved pre-deployment verification could involve adding checks for CUDA driver compatibility and memory budget before deploying code to a GPU instance.

Notes

The provided information lacks specific technical details about the model's architecture and implementation, making it challenging to provide a more detailed solution. However, the guidance provided should help mitigate the issues described.

Recommendation

Apply a workaround by implementing the suggested improvements in pre-deployment verification, GPU selection, autonomous execution, and budget awareness. This should help prevent similar issues in the future and ensure more efficient use of resources. The reason for this recommendation is that the issues described are primarily related to the model's decision-making and execution processes, which can be addressed through targeted improvements.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [Refund] wasted-loop + false-completion: Model repeatedly deployed broken code to paid GPU instances, ignored explicit user instructions to execute autonomously [2 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Summary

Failure Type

Timeline

What Correct Behavior Would Have Been

Token Waste Estimate

Patterns Observed

Environment

Requested Resolution

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [Refund] wasted-loop + false-completion: Model repeatedly deployed broken code to paid GPU instances, ignored explicit user instructions to execute autonomously [2 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Summary

Failure Type

Timeline

What Correct Behavior Would Have Been

Token Waste Estimate

Patterns Observed

Environment

Requested Resolution

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING