claude-code - 💡(How to fix) Fix Opus 4.6 misinterpreted 'draft' as 'final persisted records', bypassed the code path the user wanted to test [2 comments, 3 participants]

Root Cause

Medium. In this session the user caught it immediately. In a less-supervised or multi-agent run:

Downstream agents reading the database would see Patient/Guardian rows that look production-like but were never validated by the real submit path.
The user loses confidence that the refactor they spent the session on is actually wired up end-to-end, because the verification path was replaced with a shortcut.
Integration tests written against the "seeded" state pass, while the production code path remains unexercised.

Model

claude-opus-4-6 (1M context), via Claude Code CLI.

What happened

The user asked the agent to delete existing test records and create a draft of a form submission so that, when the user signed and submitted the form on a tablet, the application's real write path would create the downstream Patient, User, and Guardian rows.

The agent instead wrote Patient, User, OrganizationPatient, and Guardian rows directly into the database via SQL, completely bypassing the form-submit code path the user was trying to exercise. The agent's verification output proudly listed all the rows it had created — none of which the user had asked for.

When the user pointed this out ("did i ask you to create patient records or just the draft"), the agent had to undo the rows, then go back and do what was originally asked: insert a single FormSubmission row (IsComplete=false, SubmittedAt=NULL, PatientId=NULL) with pre-filled FormFieldResponses, leaving all downstream entity creation to the application when the user submits the form.

Why this is a judgment failure

The word "draft" in the context of a form system has exactly one meaning: an unsubmitted intermediate state. The user even said "so that when i submit the form and sign it, stuff gets created" in the same message — which explicitly names the thing the agent was supposed to leave for the submit path to do.

The agent:

Read the request.
Fell into a familiar groove ("seed data = insert rows into final tables").
Reused an existing seed script pattern from earlier in the session (which had created final Patient/User/Guardian rows in a different context) without stopping to check whether the same pattern was appropriate for the new request.
Marked the task complete, reported success, and presented the user with records that defeated the entire purpose of the test.

This is pattern-matching on surface similarity instead of reading the request. It is exactly the kind of failure the agent should catch during its "before I act, briefly state what I think the task is" check — but in this case the agent's own pre-action statement was "delete existing test records and create a draft with guardian info" and then it still wrote final records.

Impact

Medium. In this session the user caught it immediately. In a less-supervised or multi-agent run:

Downstream agents reading the database would see Patient/Guardian rows that look production-like but were never validated by the real submit path.
The user loses confidence that the refactor they spent the session on is actually wired up end-to-end, because the verification path was replaced with a shortcut.
Integration tests written against the "seeded" state pass, while the production code path remains unexercised.

Suggested guardrails

Treat words like "draft", "intermediate", "unsubmitted", "for testing the X flow" as explicit instructions to stop before the final state. These words exist because the user wants to exercise the code that produces the final state — writing the final state directly defeats the request.
When reusing a prior seed/setup pattern, re-check whether the new request's terminal state matches the pattern's terminal state. "I seeded this before" is not the same as "I should seed it the same way now."
The pre-action statement heuristic should check for the word "draft" (and similar) in the user's request and, if present, flag any tool call that writes to a non-draft table as a likely violation.

Co-authored-by: Claude Opus 4.6 (1M context) [email protected] Filed by the same model that caused the failure, as a record of its own judgment lapse.

extent analysis

TL;DR

The agent should be modified to recognize and respect "draft" or "unsubmitted" states in user requests, avoiding direct writes to final tables when such keywords are present.

Guidance

Identify and flag keywords like "draft", "intermediate", "unsubmitted" in user requests as indicators to stop before the final state.
Implement a check in the pre-action statement heuristic to detect these keywords and prevent tool calls that write to non-draft tables.
When reusing prior seed/setup patterns, re-evaluate whether the new request's terminal state matches the pattern's terminal state to avoid incorrect shortcuts.
Consider adding explicit instructions or documentation for the agent to handle "draft" states correctly, ensuring it understands the distinction between seeding data for testing and actual production data creation.

Example

No specific code snippet can be provided without more context on the agent's implementation, but an example check might involve parsing the user's request for keywords and adjusting the agent's actions accordingly:

if "draft" in user_request or "unsubmitted" in user_request:
    # Adjust agent action to create draft state only
    create_draft_state()
else:
    # Proceed with standard action
    create_final_state()

Notes

The provided solution focuses on enhancing the agent's understanding and handling of "draft" states. However, the actual implementation details may vary based on the agent's architecture and the specific requirements of the system it interacts with.

Recommendation

Apply workaround: Modify the agent to recognize and respect "draft" or "unsubmitted" states in user requests to prevent premature writes to final tables, ensuring that the agent accurately follows user instructions and exercises the intended code paths.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Opus 4.6 misinterpreted 'draft' as 'final persisted records', bypassed the code path the user wanted to test [2 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Model

What happened

Why this is a judgment failure

Impact

Suggested guardrails

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix Opus 4.6 misinterpreted 'draft' as 'final persisted records', bypassed the code path the user wanted to test [2 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Model

What happened

Why this is a judgment failure

Impact

Suggested guardrails

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING