claude-code - 💡(How to fix) Fix [BUG] model anchors on dangerouslyDisableSandbox: true and keeps applying it to unrelated read-only commands [1 comments, 2 participants]

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

Summary

Once Claude legitimately uses dangerouslyDisableSandbox: true for a single command (e.g. a git operation that needs to write to ~/.git-credentials), it tends to keep setting the flag on subsequent,
unrelated, plainly read-only Bash calls (wc, tail, cat, python3 -c-wrapped inspection). Each of those triggers a permission prompt — including in sessions where the user has explicitly opted into the sandbox via "sandbox": { "enabled": true, "autoAllowBashIfSandboxed": true } precisely to avoid prompts.

End state: the user opts into sandboxing to silence prompts; the model bypasses the sandbox; the prompts return.

Reproduction

In the same session, in order:

1. Legitimate use: the model runs git fetch origin (the sandbox blocks the credential-store lock write), then retries with dangerouslyDisableSandbox: true. Works.
2. Anchored misuse: within the next several minutes, the model runs wc -l "$TMPDIR/l2_timings.csv" && tail -5 "$TMPDIR/l2_timings.csv" && tail -3 "$TMPDIR/l2_run.log" with dangerouslyDisableSandbox: true set, even though neither wc nor tail writes anywhere or contacts a non-allowed host. User is prompted; user is annoyed.
3. After explicit user correction ("never bypass sandbox for reads"), the model retries without the flag, but the new command happens to embed python3 -c "..." inside a compound, which is correctly not auto-allowed, so the user is prompted again. Two layers of friction stacked.

Why this matters

The system prompt clearly says:

▎ You should always default to running commands within the sandbox. Do NOT attempt to set dangerouslyDisableSandbox: true unless the user explicitly asks to bypass sandbox or a specific command just
▎ failed and you see evidence of sandbox restrictions causing the failure.

This is the right policy — but model behavior drifts from it under anchoring pressure. Memory-based corrections from the user help next session, but the failure mode recurs intra-session whenever an
early command needs the bypass.
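
As a rough illustration of the quoted policy, the check "a specific command just failed and you see evidence of sandbox restrictions" can be read as a per-call predicate over the previous call's output. The sketch below is an assumption-laden model, not harness code: `BashCall`, `sandboxErrorInOutput`, `BypassTracker`, and the threshold are hypothetical names introduced here, the "user explicitly asks" branch of the policy is not modeled, and the warning text is a simplified variant of the one proposed later in this report.

```typescript
// Hypothetical record of one Bash tool call, for illustration only.
interface BashCall {
  dangerouslyDisableSandbox: boolean;
  // Assumed to be derived elsewhere: did this call's output show a sandbox-related error?
  sandboxErrorInOutput: boolean;
}

// Counts consecutive bypass calls that were NOT preceded by a sandbox-related error.
class BypassTracker {
  private streak = 0;
  private previousHadSandboxError = false;
  private readonly threshold: number;

  constructor(threshold: number = 3) {
    this.threshold = threshold;
  }

  // Records a call and returns a warning once the unjustified streak reaches the threshold.
  record(call: BashCall): string | null {
    const justified = this.previousHadSandboxError;
    if (call.dangerouslyDisableSandbox && !justified) {
      this.streak += 1;
    } else {
      this.streak = 0;
    }
    this.previousHadSandboxError = call.sandboxErrorInOutput;

    if (this.streak >= this.threshold) {
      return (
        `Note: previous ${this.streak} calls used dangerouslyDisableSandbox ` +
        `with no preceding sandbox-related error. Consider running without the bypass.`
      );
    }
    return null;
  }
}
```

In this model, the git fetch retry from the reproduction is justified (the prior call showed a sandbox error), while the wc/tail calls that follow it are not and would accumulate toward the warning.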

The combination "autoAllowBashIfSandboxed: true user setting + model-applied bypass" defeats the whole point of opting into sandbox mode.

Proposed harness-side fixes

These don't depend on the model getting better; they make it harder for the model to be wrong.

1. Reject the bypass for self-evidently read-only commands. When dangerouslyDisableSandbox: true is set on a Bash invocation whose parse is known-read-only (covered by READONLY_COMMANDS, READONLY_NOARGS, READONLY_EXACT, COMMAND_ALLOWLIST in src/tools/BashTool/readOnlyValidation.ts, plus the git/gh/docker read-only subcommand tables), refuse with a clear tool-result error: "This command is read-only; sandbox bypass is unnecessary. Retrying without the flag should succeed." Forces a no-flag retry.
2. One-shot bypass semantics. Even when granted, the bypass auto-resets after the call. The model has to re-justify it each time. Removes the anchoring substrate entirely.
3. Surface anchoring detection. If the model has set the bypass on N consecutive calls without the previous one's output showing a sandbox-related error, log a warning back as a tool-result preamble: "Note: previous N calls used dangerouslyDisableSandbox; the last K of those produced no sandbox-related errors. Consider running without the bypass." Cheap, non-blocking, observable.
4. autoAllowBashIfSandboxed honoured even with bypass-refused calls. A bypass-refused call's read-only retry should also benefit from the existing auto-allow path. Otherwise users who opted into sandbox mode for prompt-silence still get prompts on the model's read-only retries.
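
A minimal sketch of the first proposal (refusing the bypass for known read-only commands) might look like the following. Everything here is an assumption for illustration: `ILLUSTRATIVE_READONLY` is a hypothetical stand-in for the real tables in readOnlyValidation.ts, whose contents are not shown in this report, and `splitCompound`, `isKnownReadOnly`, and `checkBypass` are invented names. The splitter is deliberately naive and does not understand quoting; a real check would presumably reuse the harness's own command parser.

```typescript
// Hypothetical stand-in for the real read-only tables (contents assumed).
const ILLUSTRATIVE_READONLY = new Set<string>(["wc", "tail", "cat", "head", "ls"]);

type BypassCheck = { ok: true } | { ok: false; error: string };

// Naive split on shell control operators; ignores quoting, so operators
// inside quotes produce fragments that simply fail the read-only check.
function splitCompound(command: string): string[] {
  return command
    .split(/&&|\|\||[;|&\n]/)
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

function isKnownReadOnly(command: string): boolean {
  // Redirections and command substitution may write or execute: not read-only.
  if (/[<>`]|\$\(/.test(command)) return false;
  const parts = splitCompound(command);
  return (
    parts.length > 0 &&
    parts.every((part) => ILLUSTRATIVE_READONLY.has(part.split(/\s+/)[0] ?? ""))
  );
}

// Refuses the bypass only when every sub-command is known read-only.
function checkBypass(command: string, dangerouslyDisableSandbox: boolean): BypassCheck {
  if (dangerouslyDisableSandbox && isKnownReadOnly(command)) {
    return {
      ok: false,
      error:
        "This command is read-only; sandbox bypass is unnecessary. " +
        "Retrying without the flag should succeed.",
    };
  }
  return { ok: true };
}
```

The check errs toward allowing the bypass whenever it cannot prove the command is read-only, so a legitimate bypass such as the git fetch retry is unaffected.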

Environment

Claude Code (CLI), session-level
Project .claude/settings.json includes: { "sandbox": { "enabled": true, "autoAllowBashIfSandboxed": true } }
Linux, sandbox config restricts writes outside an allowlist that includes /tmp/claude-<uid>/ but not ~/.git-credentials or .git/config.

Notes

The model-behavior side ("don't anchor") is hard to fix model-side because anchoring is statistical, not logical. Memory-based user corrections persist some of this across sessions but not within
them. Fix #1 above is the cheapest reliable mitigation.
Cross-reference: this is closely related to the broader question of how dangerouslyDisableSandbox should compose with autoAllowBashIfSandboxed. They feel like they should interact but currently the
model's bypass silently overrides the user's silence preference.
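
One way to picture how the two settings could compose is a single decision function evaluated per Bash call. This is only a sketch of the interaction described in this report, not the current behavior: `Decision`, `CallInfo`, `decide`, and the `autoAllowEligible` field are hypothetical, and the real auto-allow eligibility rules (which, per the reproduction, exclude compounds embedding python3 -c) are not specified here.

```typescript
type Decision = "auto-allow" | "prompt" | "refuse-bypass";

interface SandboxSettings {
  enabled: boolean;
  autoAllowBashIfSandboxed: boolean;
}

// Hypothetical per-call facts, assumed to be computed elsewhere.
interface CallInfo {
  bypassRequested: boolean;   // dangerouslyDisableSandbox: true on the call
  knownReadOnly: boolean;     // command matched the read-only tables
  autoAllowEligible: boolean; // command qualifies for the existing auto-allow path
}

function decide(settings: SandboxSettings, call: CallInfo): Decision {
  // Proposed: a bypass on a known read-only command is refused, not prompted.
  if (call.bypassRequested && call.knownReadOnly) return "refuse-bypass";
  // Any other bypass leaves the sandbox, so the user is asked.
  if (call.bypassRequested) return "prompt";
  // Sandboxed calls keep the user's prompt-silence preference when eligible.
  if (settings.enabled && settings.autoAllowBashIfSandboxed && call.autoAllowEligible) {
    return "auto-allow";
  }
  return "prompt";
}
```

Under this model, a refused bypass followed by a no-flag retry lands on the auto-allow path, so the user who opted into sandboxing sees no prompt for the read-only command.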

What Should Happen?

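The model should set dangerouslyDisableSandbox: true only for the specific command that failed because of sandbox restrictions (or when the user explicitly asks), and should run subsequent read-only commands inside the sandbox so that autoAllowBashIfSandboxed keeps them prompt-free.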

Error Messages/Logs

Steps to Reproduce

See above

Claude Model

None

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

2.1.123

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Terminal.app (macOS)

Additional Information

No response

extent analysis

TL;DR

Reject the bypass for self-evidently read-only commands to prevent unnecessary sandbox bypasses.

Guidance

Implement a check to refuse the dangerouslyDisableSandbox: true flag for read-only commands, forcing a retry without the flag.
Consider introducing one-shot bypass semantics to auto-reset the bypass after each call, requiring the model to re-justify it each time.
Surface anchoring detection by logging a warning when the model sets the bypass on consecutive calls without sandbox-related errors.
Ensure autoAllowBashIfSandboxed is honored even with bypass-refused calls to prevent unnecessary prompts.

Example

No code snippet is provided as it is not explicitly supported by the issue, but the readOnlyValidation.ts file in src/tools/BashTool can be modified to include the proposed checks.

Notes

The issue is closely related to the interaction between dangerouslyDisableSandbox and autoAllowBashIfSandboxed. The proposed fixes aim to mitigate the model's anchoring behavior without relying on model-side changes.

Recommendation

Apply the proposed harness-side fixes, starting with rejecting the bypass for self-evidently read-only commands, to prevent unnecessary sandbox bypasses and reduce user prompts.
