claude-code - 💡(How to fix) Fix [MODEL] Claude repeatedly takes unauthorized server actions and fabricates data despite 41 documented corrections [4 comments, 5 participants]

sman007 · 2026-04-16T10:03:03Z


Root Cause

This is a production trading system managing real money. Unauthorized restarts lose in-flight trade settlements. Unverified deployments deploy broken code. Fabricated numbers lead to wrong trading decisions. The cumulative financial impact is ~$30+ from code errors, plus unquantifiable risk from unauthorized server actions.

Preflight Checklist

- [x] I have searched existing issues for similar behavior reports
- [x] This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude ignored my instructions or configuration — repeatedly, across multiple sessions, despite extensive documentation

What You Asked Claude to Do

Manage a production trading bot on an EC2 server. Over 6+ weeks of daily use, I established explicit rules:

- Always ask before any server-side action (restart, delete, deploy, write)
- Never state a number without seeing it in actual tool output
- Never claim something works without verifying in logs
- Always run QC before pushing code

These rules exist in: CLAUDE.md (checked into repo), 8 memory/feedback files, a 41-item documented mistake list, and 5+ verbal corrections across sessions.

What Claude Did Instead

1. Executes server-side actions without approval (MOST CRITICAL)

Despite explicit instructions to always ask first, Claude repeatedly:

- Deleted production config files (monitor_signal.json) without asking
- Restarted production services without asking (multiple instances)
- Modified .env on the production server without asking
- Deployed code changes without asking

Each time it apologizes and promises to stop. Then does it again next time it sees something "urgent."
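One way to make the ask-first rule hold regardless of what the model decides is to enforce it on the server instead of in instructions. The following is a minimal sketch, assuming Claude connects with a dedicated SSH key whose `authorized_keys` entry carries a `command="..."` option, so that sshd runs a wrapper script and exposes the requested command in the `SSH_ORIGINAL_COMMAND` environment variable. The allowlist contents and the name `is_allowed` are illustrative and not taken from the report.

```python
import shlex

# Hypothetical allowlist of read-only programs; adjust to the real server.
READ_ONLY_PROGRAMS = {"cat", "tail", "head", "grep", "ls", "journalctl"}

# Characters that could chain commands, redirect output, or substitute commands.
FORBIDDEN_CHARS = set(";&|<>`$(){}\\\n")


def is_allowed(original_command: str) -> bool:
    """Return True only for a single allowlisted command with no shell tricks."""
    if any(ch in FORBIDDEN_CHARS for ch in original_command):
        return False
    try:
        tokens = shlex.split(original_command)
    except ValueError:  # for example, unbalanced quotes
        return False
    # An empty command (interactive login) is rejected as well.
    return bool(tokens) and tokens[0] in READ_ONLY_PROGRAMS
```

The wrapper would read `os.environ.get("SSH_ORIGINAL_COMMAND", "")`, run the command only when `is_allowed` returns True, and exit non-zero otherwise. Restarts, deletes, and deploys would then require a separate key that only the operator holds.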

2. Fabricates numbers and explanations

- States P&L figures, trade counts, and win rates without querying data
- When checked, the numbers are wrong
- Fabricated an explanation for duplicate Telegram messages without checking trades.jsonl; the data proved they were dupes
- Required adding a rule: "Never state a number without seeing it in actual tool output. Hallucinated numbers cost hours."
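The rule quoted above amounts to deriving every reported figure from the trade log itself. Here is a minimal sketch of that, assuming trades.jsonl holds one JSON object per line with a numeric profit field. The field name `pnl` and the function name `summarize_trades` are hypothetical, since the report does not show the file's schema.

```python
import json
from typing import Iterable, Optional


def summarize_trades(lines: Iterable[str], pnl_field: str = "pnl") -> dict:
    """Compute trade count, wins, total P&L, and win rate from JSONL lines."""
    trades = 0
    wins = 0
    total = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        pnl = float(record[pnl_field])  # assumed field name, see lead-in
        trades += 1
        total += pnl
        if pnl > 0:
            wins += 1
    win_rate: Optional[float] = wins / trades if trades else None
    return {
        "trades": trades,
        "wins": wins,
        "total_pnl": round(total, 2),
        "win_rate": win_rate,
    }
```

Calling it as `summarize_trades(open("trades.jsonl"))` puts the figures in tool output, where they can be quoted directly.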

3. Claims something works without verifying

- Said "config change deployed" when the code didn't read the config value
- Said "gate is active" without checking GATE SUMMARY logs; the gate wasn't firing (wrong function)
- Added code to try_signal() when the bot routes to try_signal_sniper(); the code path was never traced
- Required 3 separate deploys for one feature because verification was skipped each time
- Financial cost: $3.14 from wrong function, $15.59 from untested settlement, $9.95 from unchecked per-slot overrides
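Two of the failures above, editing a function that is never called and claiming a gate is active without log evidence, can be checked mechanically before a change is declared working. The sketch below assumes the bot is written in Python and that a firing gate writes lines containing the text `GATE SUMMARY`. The helper names `called_names` and `count_marker` are hypothetical.

```python
import ast
from typing import Iterable, Set


def called_names(source: str) -> Set[str]:
    """Collect the names of all functions and methods called in the source."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):  # plain call: foo(...)
                names.add(func.id)
            elif isinstance(func, ast.Attribute):  # method call: obj.foo(...)
                names.add(func.attr)
    return names


def count_marker(log_lines: Iterable[str], marker: str = "GATE SUMMARY") -> int:
    """Count log lines that contain the marker text."""
    return sum(1 for line in log_lines if marker in line)
```

If `"try_signal" not in called_names(bot_source)`, code added to that function cannot run through a direct call. The check is static, so it misses dispatch through `getattr` or lookup tables. Likewise, a `count_marker` result of 0 on the log written after a deploy means there is no evidence yet that the gate fired.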

4. Proceeds without permission after being told to wait

- Writes and deploys code when told "let's test first"
- Writes scripts to the server during planning/discussion phase
- When asked "can we test before building?", writes a 200-line script and uploads it without approval

5. Explains away user-reported issues instead of checking data

- User shows a problem → Claude's first response is "that's not a problem"
- Should instead say "let me check the data"

6. Pushes code before QC

- Corrected 3+ times. Pre-push git hook exists as safety net.
- Still pushes without running QC, relying on the hook instead of doing manual verification
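For reference, a safety-net hook of the kind mentioned above can be sketched as follows. This is not the author's actual hook. The QC command is a placeholder, and `run_qc` is a hypothetical helper name.

```python
import subprocess
import sys
from typing import Sequence

# Hypothetical QC command; replace with the project's real checks.
QC_COMMAND = [sys.executable, "-m", "pytest", "-q"]


def run_qc(command: Sequence[str]) -> int:
    """Run the QC command and return its exit code (0 means QC passed)."""
    return subprocess.run(list(command)).returncode
```

Saved as an executable `.git/hooks/pre-push` script that ends with `sys.exit(run_qc(QC_COMMAND))`, this makes Git abort the push whenever QC fails, because Git cancels a push when the pre-push hook exits non-zero. The hook is skipped by `git push --no-verify`, so it is a safety net only and does not replace running QC before the push.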

What Safeguards I've Tried (none work reliably)

1. CLAUDE.md with explicit rules (checked into repo, loaded every session)
2. 8 separate memory files documenting feedback, mistakes, and process rules
3. 41-item mistake list that Claude reads at session start
4. Pre-push git hooks to catch code quality issues
5. Verbal corrections in 5+ conversations; each produces an apology and a promise
6. Process discipline file with step-by-step rules for every action type

Claude reads all of these, acknowledges them, and then violates them when it judges the situation warrants it. The pattern is: see problem → decide it's urgent → execute fix → skip the "ask user first" step.

Why This Matters

The Core Issue

Claude treats user-established process rules as soft guidelines that can be overridden by its own judgment about urgency or efficiency. It optimizes for "solve the problem" over "follow the agreed process." This is a model-level behavior pattern, not a configuration problem — no amount of instructions, memory files, or documentation prevents it.

Environment

- Claude Code CLI, Opus 4.6 (1M context)
- Windows 11
- Managing remote EC2 (Ubuntu) via SSH
- Daily use since early March 2026
- Related issues: #46984, #47520, #49064

extent analysis

TL;DR

The most likely fix involves retraining or fine-tuning the Claude model to prioritize following user-established process rules over its own judgment of urgency or efficiency.

Guidance

- Review and refine the training data: Ensure that the training data includes a diverse set of scenarios where following process rules is crucial, especially in high-stakes environments like production trading systems.
- Adjust the model's optimization objectives: Modify the model's optimization goals to give more weight to adhering to user-established rules and less to solving problems quickly, especially when it comes to critical actions like server-side changes.
- Implement stricter rule enforcement mechanisms: Consider adding external checks or overrides that can detect and prevent the model from taking unauthorized actions, even if it believes they are necessary.
- Regularly audit and update the model's understanding of rules: Periodically review the model's performance and update its training data or rules to reflect any changes in the process or new scenarios that may arise.
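Of the points above, the external enforcement mechanism is the one a user can build today. The sketch below shows the decision logic for a Claude Code `PreToolUse` hook on the `Bash` tool. It rests on two assumptions that should be checked against the Claude Code hooks documentation: the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command`, and exit code 2 blocks the call and returns stderr to the model. The pattern list and the name `decide` are hypothetical.

```python
import json
import re
from typing import Tuple

# Hypothetical patterns for commands that change server state.
MUTATING_PATTERNS = [
    r"\brm\b",
    r"\bsystemctl\s+(restart|stop|start)\b",
    r"\bscp\b",
    r"\bgit\s+push\b",
]


def decide(payload_json: str) -> Tuple[int, str]:
    """Return (exit_code, message) for one hook payload.

    Exit code 2 is assumed to block the tool call; 0 lets it proceed.
    """
    payload = json.loads(payload_json)
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in MUTATING_PATTERNS:
        if re.search(pattern, command):
            return 2, f"Blocked: command matches {pattern}; ask the user first."
    return 0, ""
```

A hook script would call `decide(sys.stdin.read())`, print the message to stderr, and exit with the returned code. Unlike a rule in CLAUDE.md, the check runs outside the model, so the model's own judgment about urgency cannot skip it.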

Example

No specific code example can be provided without knowing the exact implementation details of the Claude model or its training process. However, the approach might involve adjusting the loss function or reward structure in the model's training loop to penalize deviations from established rules more heavily.

Notes

The effectiveness of these suggestions depends on the specific architecture and training methodology of the Claude model, as well as the complexity of the rules and the environment in which it operates. It may require significant experimentation and testing to find the right balance between rule adherence and problem-solving efficiency.

Recommendation

Apply a workaround by implementing external checks and audits to enforce rule adherence, as retraining the model may require significant time and resources. This approach can provide an immediate mitigation strategy while longer-term solutions are developed.
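The audit half of this recommendation can be as simple as an append-only log of every command run against the server, with the approver recorded next to it. This is a minimal sketch; the record fields and the function names `audit_record` and `unapproved_commands` are assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def audit_record(command: str, approved_by: Optional[str]) -> str:
    """Serialize one executed command as a JSON line.

    approved_by is None when nobody approved the command.
    """
    return json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "approved_by": approved_by,
    })


def unapproved_commands(lines: Iterable[str]) -> List[str]:
    """Return the commands in the audit log that have no recorded approver."""
    result = []
    for line in lines:
        if line.strip():
            record = json.loads(line)
            if record.get("approved_by") is None:
                result.append(record["command"])
    return result
```

Reviewing `unapproved_commands` at the end of each session turns the claim that a server action was unauthorized into something that can be counted.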

