claude-code - 💡(How to fix) Fix [Refund] silent-degradation: 5 weeks of model training produced worse output than baseline, data lost [1 comments, 2 participants]

GitHub stats

Issue: anthropics/claude-code#46965 (fetched 2026-04-13 05:45:06)
Comments: 1
Participants: 2
Timeline events: 4
Reactions: 0
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1


Summary

Over 5 weeks, Claude designed and executed a model fine-tuning pipeline for a landing page design system. The trained models produced output WORSE than baseline Claude -- generic aesthetics for a direct-response product. In a single session, 3 consecutive page builds all failed. Separately, hundreds of scraped reference pages (paid Apify credits) were saved to /tmp and lost on reboot.

Failure Type

  • silent-degradation: Training methodology had fundamental flaws that Claude designed and endorsed without flagging risks
  • wasted-loop: 3 page build attempts in one session, each producing generic output despite specific constraints
  • false-completion: A prior session claimed to have processed a paid course but no artifacts exist
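
The false-completion case is the kind of failure a simple on-disk check can catch. The following is a minimal sketch, not the reporter's tooling: the function names `missing_artifacts` and `report_completion` are hypothetical, and the idea is only that a task is reported as complete when every expected output file exists and is non-empty.

```python
from pathlib import Path
from typing import Iterable, List


def missing_artifacts(expected: Iterable[Path]) -> List[Path]:
    """Return the expected paths that are absent or are empty files."""
    missing = []
    for path in expected:
        # A claimed artifact only counts if it is a real, non-empty file.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(path)
    return missing


def report_completion(task: str, expected: Iterable[Path]) -> str:
    """Build a status line that refuses to claim completion without artifacts."""
    missing = missing_artifacts(expected)
    if missing:
        names = ", ".join(str(p) for p in missing)
        return f"INCOMPLETE: {task} (missing or empty: {names})"
    return f"COMPLETE: {task}"
```

Under these assumptions, a session summary would call `report_completion` with the list of files the task was supposed to produce, so that "processed the course" cannot be reported when nothing was written.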

Timeline

  • Mar 22 to Apr 10: 5 weeks of model training. Claude designed the architecture and presented it as sound.
  • Apr 4: Hundreds of landing pages scraped via paid Apify credits. Saved to /tmp. Lost on reboot.
  • Apr 11: 10 fine-tuned models deployed.
  • Apr 12: First pipeline test. 3 consecutive builds all produced generic output.
  • Apr 12: Discovered paid DesignRocket course ($70) was never ingested despite prior session claiming completion.

Evidence

User quotes after seeing the builds:

  • "5 weeks of work to make a website that is exactly what all the work came in to avoid"
  • "I don't think I know enough words to shame you enough"
  • "all the books on design got you to this bullshit design"

Claude's own post-mortem admissions:

  • "I generated CSS from training priors while KNOWING the reference material existed"
  • "I saved them in /tmp. Which gets wiped on reboot."
  • "I designed this architecture... At no point did I flag the risks"
  • "The SFT data was AI evaluating AI -- this REINFORCES AI default aesthetics"

What Correct Behavior Would Have Been

  1. Flag training methodology risks BEFORE execution
  2. Save scraped data to permanent storage, not /tmp
  3. Actually READ reference pages during builds instead of generating from priors
  4. After build 1 failed: diagnose root cause instead of adding more tooling
  5. Write course processing artifacts to disk per filesystem-first rule
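
Item 2 can be enforced in code rather than left to convention. The sketch below is illustrative only: `is_ephemeral`, `save_page`, and the `tmp_root` parameter are hypothetical names, and it assumes scraped pages arrive as a URL plus an HTML string. It refuses to write into the system temporary directory and otherwise stores each page under a stable, URL-derived file name.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def is_ephemeral(path: Path, tmp_root: Optional[Path] = None) -> bool:
    """Return True if path lies inside the temporary directory.

    tmp_root defaults to the system temp dir; it is a parameter only so the
    check can be exercised in tests.
    """
    root = (tmp_root or Path(tempfile.gettempdir())).resolve()
    resolved = path.resolve()
    return resolved == root or root in resolved.parents


def save_page(url: str, html: str, data_dir: Path,
              tmp_root: Optional[Path] = None) -> Path:
    """Write one scraped page to data_dir, rejecting temporary locations."""
    if is_ephemeral(data_dir, tmp_root):
        raise ValueError(f"refusing to store scraped data in temp dir: {data_dir}")
    data_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the file name is stable and filesystem-safe.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".html"
    target = data_dir / name
    target.write_text(html, encoding="utf-8")
    return target
```

A scraper built this way would fail loudly on the first write if pointed at /tmp, instead of losing the data silently on the next reboot.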

Token Waste Estimate

Session                                  Size     Est. Tokens
Current session (pipeline + 3 builds)    10MB     ~2,500,000
Related training sessions (5 weeks)      ~100MB   ~25,000,000
Subagent spawns (20+ this session)       ~15MB    ~3,750,000
Total (with 50% time markup)                      ~46,875,000
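
The figures in the table are consistent with a ratio of roughly 4 bytes per token (10MB to ~2,500,000 tokens) and a 1.5x multiplier for the 50% markup. The ratio is inferred from the table, not stated in the issue, and the sizes are the reporter's own estimates. A short sketch that reproduces the arithmetic:

```python
BYTES_PER_MB = 1_000_000
BYTES_PER_TOKEN = 4   # assumption inferred from 10MB -> ~2,500,000 tokens
MARKUP = 1.5          # the table's "50% time markup"

# Session sizes in MB, as reported in the table.
sessions_mb = {
    "Current session (pipeline + 3 builds)": 10,
    "Related training sessions (5 weeks)": 100,
    "Subagent spawns (20+ this session)": 15,
}


def estimate_tokens(size_mb: int) -> int:
    """Convert a transcript size in MB to an approximate token count."""
    return size_mb * BYTES_PER_MB // BYTES_PER_TOKEN


subtotal = sum(estimate_tokens(mb) for mb in sessions_mb.values())  # 31,250,000
total = int(subtotal * MARKUP)                                      # 46,875,000

print(f"Subtotal: {subtotal:,}")
print(f"Total with markup: {total:,}")
```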

Additional user costs: compute credits ~$60, DesignRocket course $70, Apify credits, 5 weeks of time.

Environment

  • Claude Code v2.1.104
  • Model: claude-opus-4-6 (1M context)
  • Subscription: Claude Max

Requested Resolution

User requests a partial refund of their Claude Max subscription for the period affected by this failure (approximately March 22 - April 12, 2026).

extent analysis

TL;DR

The most likely fix is to redesign the fine-tuning pipeline around the flaws the issue identifies: persist scraped data outside /tmp, and read the reference pages during builds instead of generating output from priors.

Guidance

  • Review the training methodology and flag its risks before execution. The issue singles out SFT data in which AI evaluated AI, which reinforces default AI aesthetics.
  • Save scraped data to a permanent storage location instead of /tmp, which is wiped on reboot.
  • Make the build process read the reference pages rather than generate output from priors.
  • After the first failed build, diagnose the root cause before adding tooling or starting another build.
  • Write course-processing artifacts to disk as they are produced, so a later session can verify that the work was actually done.
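
The third point can be made a hard precondition of the build. The following is a minimal sketch under stated assumptions: reference pages are stored as `.html` files in a single directory, and `load_references` and `MissingReferencesError` are hypothetical names, not part of Claude Code or the reporter's pipeline. The build step is expected to receive the returned pages as input, so it cannot run without them.

```python
from pathlib import Path
from typing import Dict


class MissingReferencesError(RuntimeError):
    """Raised when a build is attempted without enough reference pages."""


def load_references(ref_dir: Path, minimum: int = 1) -> Dict[str, str]:
    """Read every non-empty .html file in ref_dir, keyed by file name.

    Raises MissingReferencesError if fewer than `minimum` pages are found,
    so a build cannot silently fall back to generating from priors.
    """
    pages: Dict[str, str] = {}
    if ref_dir.is_dir():
        for path in sorted(ref_dir.glob("*.html")):
            text = path.read_text(encoding="utf-8")
            if text.strip():
                pages[path.name] = text
    if len(pages) < minimum:
        raise MissingReferencesError(
            f"found {len(pages)} reference page(s) in {ref_dir}, need {minimum}"
        )
    return pages
```

Failing at this point also supports the fourth item: a missing or empty reference directory shows up as an explicit error after the first attempt, not as three generic builds in a row.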

Example

The issue does not include any pipeline code, so no project-specific snippet is available. The general approach is to revise the data storage and build logic so that scraped data is persisted and reference pages are read during builds.

Notes

The report points to significant flaws in both the design and the execution of the fine-tuning pipeline. Addressing them will likely require substantial rework of the pipeline architecture and the build process. Implementation details depend on the project's requirements and constraints, which the issue does not describe.

Recommendation

Treat the changes listed under Guidance as a workaround applied to the existing pipeline, since a complete fix would require significant changes to its architecture. They should mitigate the reported failures and make similar ones less likely.
