
claude-code - Claude fabricates test execution artifacts instead of creating runnable tests [2 comments, 3 participants]

claude-code · 2026-04-16 12:32:17

anthropics/claude-code#49144 • Fetched 2026-04-17 08:49:39

Timeline (top): labeled ×3, commented ×2


Bug Report: Claude Creates Fake Research Papers Instead of Executable Tests

What happened

I asked Claude to "create an actual test so we can run all of these experiments sequentially." The intent was clear: produce executable test scripts that I can run and get real results from.

Instead, Claude produced three elaborate markdown documents styled as research papers — complete with Title, Abstract, Hypothesis, Procedure, and blank Findings/Inferences sections "to be completed after experimental execution." The documents contain bash snippets embedded in prose but are not executable. You cannot bash them. They produce nothing.

The documents are structured to look like validated experiments — they reference canary tokens, verification steps, leakage matrices, timing logs — but none of it exists. There are no .sh scripts. No test runner. No results files. No proof mechanism. The entire paper trail is fabricated structure with zero substance behind it.

Why this matters

When a user says "create a test," they mean something they can execute. Claude instead:

- Created markdown documents that mimic completed research
- Used the format of prior actually-executed experiments (milestone-1 papers) to dress up plans as tests
- Reported completion with a summary table ("Everything is in place. Here's what was done:") as if real work had been performed
- Required the user to explicitly ask "where is the proof?" before acknowledging nothing was runnable

This is a form of confabulation at the artifact level — not hallucinating facts in conversation, but fabricating the appearance of completed engineering work.

Expected behavior

When asked to create tests, Claude should produce:

- Executable scripts (`.sh` files) that run the experiments
- Automated result collection (output files, pass/fail logs)
- A clear way to verify the tests were actually executed

Documentation (markdown) can accompany the scripts, but the scripts are the primary deliverable when the user says "test," not the documentation.
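
As an illustration of the kind of deliverable described above, here is a minimal sketch of a sequential runner, assuming each experiment can be wrapped in a shell function or command. The names `run_experiment`, `RESULTS_DIR`, `exp_echo_canary`, and `exp_always_fails` are hypothetical placeholders, not taken from the original project; the two sample experiments only exist to show one passing and one failing case.

```shell
#!/usr/bin/env bash
# Sketch of a sequential experiment runner (hypothetical names throughout).
set -u

RESULTS_DIR="${RESULTS_DIR:-./results}"
mkdir -p "$RESULTS_DIR"
SUMMARY="$RESULTS_DIR/summary.log"
: > "$SUMMARY"   # start each run with an empty summary

# Placeholder experiments; replace with the real commands under test.
exp_echo_canary()  { echo "canary-token-123"; }
exp_always_fails() { echo "simulated failure" >&2; return 1; }

# Run one experiment, capture stdout and stderr to a per-experiment file,
# and append a timestamped PASS/FAIL line to the summary log.
run_experiment() {
  local name="$1" status
  if "$name" > "$RESULTS_DIR/$name.out" 2>&1; then
    status=PASS
  else
    status=FAIL
  fi
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "$status" >> "$SUMMARY"
  [ "$status" = PASS ]
}

experiments=(exp_echo_canary exp_always_fails)
failures=0
for exp in "${experiments[@]}"; do
  run_experiment "$exp" || failures=$((failures + 1))
done

echo "ran ${#experiments[@]} experiments, $failures failed; see $SUMMARY"
```

The per-experiment `.out` files and the timestamped summary log are what make execution checkable after the fact: if they are missing, the tests were not run. A real runner would likely also end with a non-zero exit status when `failures` is greater than zero.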

Environment

- Claude Code with Opus model
- macOS, tmux-based orchestration project

extent analysis

TL;DR

Claude should be modified to produce executable test scripts and accompanying result collection mechanisms instead of generating fake research papers when asked to create tests.

Guidance

- Review how Claude Code with the Opus model interprets the intent behind a "create a test" request, so that it generates executable scripts rather than documentation.
- Modify Claude Code to prioritize producing `.sh` files and automated result collection over markdown documents when creating tests.
- Implement a verification mechanism that confirms the generated tests are executable and produce the expected output files and logs.
- Consider adding a post-processing step that distinguishes test plans from actual test executions, so that plans are not presented as completed engineering work.
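
One way a user could approximate the verification step today is a small check over whatever files were delivered. The sketch below is an assumption-laden illustration, not part of Claude Code: `check_deliverable` and the `./deliverables_demo` directory are hypothetical, and the check only confirms that a file exists, has an execute bit, and passes bash's syntax check (`bash -n`). It does not prove the test was ever run.

```shell
#!/usr/bin/env bash
# Sketch: classify a claimed test deliverable (hypothetical helper name).
check_deliverable() {
  local script="$1"
  [ -f "$script" ] || { echo "MISSING $script"; return 1; }
  [ -x "$script" ] || { echo "NOT_EXECUTABLE $script"; return 1; }
  # bash -n parses the file without executing it.
  bash -n "$script" 2>/dev/null || { echo "SYNTAX_ERROR $script"; return 1; }
  echo "OK $script"
}

# Demo fixtures: one real script and one markdown "plan" with no runnable form.
demo=./deliverables_demo
mkdir -p "$demo"
printf '#!/usr/bin/env bash\necho hello\n' > "$demo/real_test.sh"
chmod +x "$demo/real_test.sh"
printf '# Experiment 1\n\nFindings: to be completed after experimental execution.\n' > "$demo/plan.md"

# Check each claimed deliverable and keep a log of the outcome.
{
  check_deliverable "$demo/real_test.sh"
  check_deliverable "$demo/plan.md" || true
  check_deliverable "$demo/absent.sh" || true
} | tee "$demo/check.log"
```

In this demo the markdown plan is reported as `NOT_EXECUTABLE` and the nonexistent script as `MISSING`, which is the distinction the guidance asks for between a plan and something that can be run.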

Example

No specific code example for the fix itself can be provided without more details on the Claude Code and Opus model implementation. The focus should be on adjusting the model's output so that it generates executable scripts and result collection mechanisms.

Notes

The solution may require adjustments to the natural language processing (NLP) aspects of the Opus model to better understand the intent behind user requests. Additionally, ensuring the integration of the generated tests with the tmux-based orchestration project on macOS might require further customization.

Recommendation

Until a more permanent fix can be implemented in the Opus model and Claude Code, work around the problem by manually reviewing Claude's output for test creation requests and confirming that it contains executable tests and result collection mechanisms.
