claude-code - 💡(How to fix) Fix Model behavior degradation: factual fabrication, context loss, document access failures — claude-sonnet-4-6, Cowork v1.2581.0 (Mac, April 2026) [2 comments, 3 participants]

GitHub stats
anthropics/claude-code#48266 · Fetched 2026-04-16 07:04:43
Comments: 2 · Participants: 3 · Timeline events: 7 · Reactions: 1
Timeline (top): labeled ×5, commented ×2

Error Message

  1. Model generated case citations with correct-looking formatting but incorrect case names or wrong citations — e.g., citing two different case names to the same reporter volume and page number, which is impossible. When flagged, model acknowledged the error but had already presented the fabricated citation as verified.
  2. Fabricated case citation — Model cited "52 Cal. 3d 988" as both Hydrotech Systems v. Oasis Waterpark AND Paradise v. Nowlin in the same session — two different case names attributed to the same reporter citation, which is impossible. Both were presented as verified. The error was only caught when cross-checked against a separate AI legal tool (Nyayam) which flagged Hydrotech as the wrong context entirely and provided the correct Paradise v. Nowlin citation (145 Cal. App. 2d 140).
  3. Document access failures — Model returned file access errors mid-session for documents that had been successfully read earlier in the same session. No changes were made to the files or permissions between successful access and the error.
  4. Answer status error — Model repeatedly described the Answer to the First Amended Complaint as "filed" when it was explicitly confirmed as a draft not yet filed. This was corrected multiple times and the error recurred.


Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

Unexpected behaviors observed (April 8–15, 2026, across multiple sessions):

  1. Factual fabrication — Model generated invented case citations, incorrect legal authorities presented as verified, made-up dates and deadlines, and attributed data to sources that were not checked. This is outside normal behavior and created significant rework on time-sensitive legal documents.

  2. Context loss within session — Model repeatedly failed to retain facts established earlier in the same session, requiring the user to re-explain information multiple times. Information that was confirmed and saved to memory files was not recalled correctly.

  3. Document access failures — Files that were previously accessible in the same workspace became inaccessible mid-session with no changes made by the user. No permissions or locks were added.

  4. Inconsistent output quality — Tasks that normally complete correctly in 1–2 attempts required 5–6 correction cycles. Session work that typically takes 1–2 hours took 6–7 hours due to errors and rework.

  5. Abnormal token consumption — Approximately $123 in usage credits exhausted during a single session of document editing with no compute-heavy operations. Credits drained in some instances within approximately 20 minutes.

Impact: Active legal matter with time-sensitive deadlines. Fabricated facts in legal documents are not a minor inconvenience — they are a liability.

What Claude Actually Did

Step-by-step description of unexpected behavior:

  1. User provided verified rent figures from source documents. Model presented different, incorrect figures without flagging uncertainty, requiring user to correct multiple times across the session.

  2. User confirmed facts earlier in the session (case citations, document filenames, party names, dates). Model later referenced these same facts incorrectly or asked user to re-confirm information already established.

  3. User asked model to read documents that had been successfully accessed earlier in the same session. Model returned errors stating it could not access the files, despite no changes to permissions or file locations on the user's end.

  4. Model generated case citations with correct-looking formatting but incorrect case names or wrong citations — e.g., citing two different case names to the same reporter volume and page number, which is impossible. When flagged, model acknowledged the error but had already presented the fabricated citation as verified.

  5. Model stated deadlines and procedural requirements that did not exist in the documents provided, presenting invented information as fact.

  6. Usage credits were purchased in increments ($45, $45, $25, $15 auto top-up) during a single working session consisting of document reads, .docx file builds via Node.js scripts, and text generation. Total $123 exhausted in under one hour of active work despite no compute-heavy operations.
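
As a quick arithmetic check on step 6, the listed top-ups can be summed and compared with the reported total. The amounts are taken from the report. The report does not explain any gap between the two, so reading it as unspent balance or an approximate figure is only an assumption.

```python
from decimal import Decimal

# Top-up increments as listed in step 6 of the report.
top_ups = [Decimal("45"), Decimal("45"), Decimal("25"), Decimal("15")]
# Total the report says was exhausted.
reported_total = Decimal("123")

purchased = sum(top_ups, Decimal("0"))
difference = purchased - reported_total

print(f"purchased: ${purchased}")
print(f"reported exhausted: ${reported_total}")
# A non-zero difference is unexplained in the report (assumption: leftover
# balance or a rounded figure).
print(f"difference: ${difference}")
```

Stating both numbers in a support request may help when the billing history is reconciled.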

Expected Behavior

Expected behavior:

  1. When working with verified figures from source documents, the model should present those figures accurately and flag any uncertainty rather than substituting incorrect values.

  2. Facts confirmed and saved within a session — including case citations, document names, party names, and dates — should be retained consistently throughout that session without requiring repeated re-confirmation from the user.

  3. Files successfully accessed earlier in a session should remain accessible for the duration of that session, absent any user-side permission changes.

  4. Case citations should only be presented as verified when they have been confirmed with dual identifiers (reporter citation + secondary identifier). Fabricated or unverified citations should be explicitly flagged as unconfirmed rather than presented as authoritative.

  5. The model should never state deadlines, procedural requirements, or factual claims that are not supported by the documents or context provided. When uncertain, it should say so.

  6. Usage credit consumption should be proportional to the computational complexity of the work performed. A 6-hour document editing session involving text generation and file reads should not exhaust $123 in credits.
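
Expectation 4 can be illustrated with a minimal sketch of a "dual identifier" gate. The field names `secondary_id` and `checked_against` are hypothetical, as is the rule that both must be present. The citation value is copied from the report and has not been independently checked here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    case_name: str
    reporter_cite: str                      # e.g. a volume/reporter/page string
    secondary_id: Optional[str] = None      # hypothetical: docket number, court and year, etc.
    checked_against: Optional[str] = None   # hypothetical: name of the source consulted


def status(c: Citation) -> str:
    """Return 'verified' only when both identifiers and a consulted source are present."""
    if c.reporter_cite and c.secondary_id and c.checked_against:
        return "verified"
    return "UNCONFIRMED"


# Value as stated in the report; no secondary identifier recorded, so it stays unconfirmed.
print(status(Citation("Paradise v. Nowlin", "145 Cal. App. 2d 140")))
```

The gate only records whether a check was logged. It does not itself confirm that a citation is correct.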

Files Affected

- `.docx` files (multiple — legal documents read and rebuilt via Node.js docx library)
- `.js` build scripts (Node.js, used to generate .docx files)
- `.pdf` files (multiple — read for content extraction via pandoc)
- `.txt` files (Nyayam case transcript, read for citation verification)
- `.md` files (CLAUDE.md project memory files, read and updated)
- `.md` files (auto-memory files in `.auto-memory/` directory, read and written)
- `.pdf` files in uploads folder (evidence documents, court filings)

No README.md or JSON config files were intentionally accessed. If any were accessed by the system during document builds, that would have been incidental to the Node.js/docx library operations.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Sometimes (intermittent)

Steps to Reproduce

  1. Open Claude Desktop / Cowork mode on Mac (v1.2581.0)
  2. Start a long working session (4+ hours) involving repeated file reads, .docx document builds via Node.js scripts, and PDF content extraction
  3. Work across multiple large documents in a mounted workspace folder
  4. Observe that context established early in the session is not consistently retained as the session progresses
  5. Attempt to access files that were successfully read earlier in the same session — errors begin appearing intermittently
  6. Note usage credit consumption rate against the complexity of work being performed
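
To help document step 5, a host-side probe could log whether the workspace files remain readable while the session runs. This is a sketch under assumptions: the function names, the log file name, and the polling interval are all hypothetical. It only shows what the operating system reports on the user's side, not what the model's file tools see.

```python
import os
import time
from datetime import datetime, timezone
from pathlib import Path


def probe(paths):
    """Return one row per path: (UTC timestamp, path, exists, readable)."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    rows = []
    for p in map(Path, paths):
        rows.append((now, str(p), p.exists(), os.access(p, os.R_OK)))
    return rows


def watch(paths, interval_s=300, rounds=3, log_path="access_log.tsv"):
    """Append probe results to a tab-separated log at a fixed interval."""
    with open(log_path, "a", encoding="utf-8") as log:
        for i in range(rounds):
            for row in probe(paths):
                log.write("\t".join(map(str, row)) + "\n")
            log.flush()
            if i < rounds - 1:
                time.sleep(interval_s)


# Example call (hypothetical path):
# watch(["/path/to/workspace/document.pdf"], interval_s=300, rounds=48)
```

If the log shows the files readable at the moment the model reports an access error, that points away from a user-side permission change.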

Additional note on reproduction timeline:

The degraded model behavior described above has not been limited to a single session. It has been observed across multiple separate sessions over approximately the past week (April 8–15, 2026), in both Claude Code and Cowork mode workspaces on the same Mac. The behavior appears to have begun around the same timeframe as the reported cache TTL change (March/April 2026) and has persisted across session restarts, app restarts, and new conversations. It is not a one-time occurrence.

Note: Behaviors were intermittent and appeared to worsen as session length increased and the context window filled. They were most pronounced after approximately 2–3 hours of continuous work.

Claude Model

Sonnet

Relevant Conversation

**Relevant model responses demonstrating unexpected behavior:**

1. **Fabricated case citation** — Model cited "52 Cal. 3d 988" as both *Hydrotech Systems v. Oasis Waterpark* AND *Paradise v. Nowlin* in the same session — two different case names attributed to the same reporter citation, which is impossible. Both were presented as verified. The error was only caught when cross-checked against a separate AI legal tool (Nyayam) which flagged Hydrotech as the wrong context entirely and provided the correct Paradise v. Nowlin citation (145 Cal. App. 2d 140).

2. **Incorrect rent figures** — Model presented $10,394/month as the 722 Gladys monthly rent after deriving it from a negotiation email. The actual signed lease rate was $10,810/month with documented annual escalations. The model had access to the correct source documents but presented the derived figure as confirmed fact.

3. **Context loss** — Model repeatedly asked the user to re-confirm facts already established in the same session, including party names, case numbers, document filenames, and previously verified figures. The user had to correct the same errors multiple times across a single 6-hour session.

4. **Document access failures** — Model returned file access errors mid-session for documents that had been successfully read earlier in the same session. No changes were made to the files or permissions between successful access and the error.

5. **Answer status error** — Model repeatedly described the Answer to the First Amended Complaint as "filed" when it was explicitly confirmed as a draft not yet filed. This was corrected multiple times and the error recurred.
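
The rent discrepancy in item 2 suggests a simple provenance check: record where each figure came from and prefer the more authoritative source. The sketch below uses the two amounts from the report. The source ranking and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical ranking: lower number means a more authoritative source type.
SOURCE_RANK = {"signed_lease": 0, "negotiation_email": 1}


@dataclass(frozen=True)
class Figure:
    label: str
    amount: int   # dollars per month
    source: str   # key into SOURCE_RANK


def authoritative(figures):
    """Pick the figure from the best-ranked source and list any that disagree with it."""
    best = min(figures, key=lambda f: SOURCE_RANK[f.source])
    conflicts = [f for f in figures if f.amount != best.amount]
    return best, conflicts


rent = [
    Figure("722 Gladys monthly rent", 10_394, "negotiation_email"),
    Figure("722 Gladys monthly rent", 10_810, "signed_lease"),
]
best, conflicts = authoritative(rent)
for c in conflicts:
    print(f"{c.source}: ${c.amount} differs from {best.source} ${best.amount} "
          f"by ${best.amount - c.amount}")
```

Any figure that appears in the conflict list would be flagged as derived rather than confirmed.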

Impact

High - Significant unwanted changes

Claude Code Version

2.1.104 (Claude Code)

Platform

Anthropic API

Additional Context

API / Platform: Anthropic native (Claude Desktop / Cowork mode, Mac subscription — not third-party API)

Additional context:

  • No clear pattern identified for what triggers the behavior — it occurred across different document types, different prompts, and different points in the session
  • Behavior was not limited to a specific file type or project structure
  • No screenshots captured during the session as the issues were discovered in real-time while working under time pressure on an active legal matter
  • The degraded behavior has been observed across multiple separate Claude workspaces (both Cowork and Claude Code) over approximately one week, not just in this session
  • The most severe impact was on legal document work where factual accuracy is not optional — fabricated citations and incorrect figures in documents sent to an attorney create professional and legal liability that cannot simply be "undone"
  • User is on Mac native subscription, not API access

extent analysis

TL;DR

No user-side fix is identified. The most likely path to resolution is to report the behavior to Anthropic support with the reproduction details from the issue and, until it is resolved, to independently verify citations and figures in the model's output.

Guidance

  • Review the steps to reproduce the issue and verify that the problem is consistent across multiple sessions and workspaces.
  • Check the Anthropic API documentation and release notes for any known issues or updates related to the Sonnet model and cache TTL changes.
  • Consider reaching out to the Anthropic support team to report the issue and request assistance with troubleshooting and potentially updating the model or adjusting the usage parameters.
  • In the meantime, implement additional verification steps to ensure the accuracy of the model's output, such as cross-checking citations and figures against separate sources.
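
The cross-checking step in the last guidance item can be partly automated. The sketch below is a verification aid under assumptions, not a fix for the model behavior. It flags any reporter citation attached to more than one case name within a session, which is the pattern described in the issue. The function name and the normalization rule are hypothetical, and the sample pairs are copied from the report without independent verification.

```python
from collections import defaultdict


def conflicting_citations(pairs):
    """Group case names by reporter citation; return citations used for more than one name."""
    names_by_cite = defaultdict(set)
    for case_name, reporter_cite in pairs:
        # Normalize whitespace and case so formatting differences do not hide a clash.
        key = " ".join(reporter_cite.split()).lower()
        names_by_cite[key].add(case_name.strip())
    return {cite: sorted(names) for cite, names in names_by_cite.items() if len(names) > 1}


# (case name, reporter citation) pairs as they appeared in the session, per the report.
session_citations = [
    ("Hydrotech Systems v. Oasis Waterpark", "52 Cal. 3d 988"),
    ("Paradise v. Nowlin", "52 Cal. 3d 988"),
    ("Paradise v. Nowlin", "145 Cal. App. 2d 140"),
]

conflicts = conflicting_citations(session_citations)
for cite, names in conflicts.items():
    print(f"{cite}: cited for {names}")
```

A flagged citation still needs manual review against an authoritative source. The script only detects internal inconsistency and cannot tell which attribution, if any, is correct.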

Example

No code fix applies: the issue concerns the behavior of the Claude model rather than a specific code implementation.

Notes

The report places the onset around the same time as the reported cache TTL change (March/April 2026), but this is a correlation in timing only. The root cause cannot be determined without further information from Anthropic. The user has already documented and reported the behavior, so resolution likely depends on a response from the support team.

Recommendation

Apply workaround: cross-check citations, figures, and filing status against the source documents before relying on the model's output, and contact Anthropic support about the degraded behavior and the credit consumption. This is recommended because the reported impact is high and accuracy is critical for legal document work.
