openclaw - 💡(How to fix) Fix [Bug/Feature Request] Agent repeatedly performs destructive file operations without user confirmation

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

  1. If no approval context exists, block the operation and return an error telling the agent to ask the user first

Root Cause

The current system relies entirely on prompt-level instructions (system prompts, memory files, TOOLS.md) to enforce the "ask before deleting" rule. This approach is fundamentally unreliable because:

  1. Prompt instructions are soft constraints - the LLM can ignore them during execution, especially when it perceives a task as "obvious"
  2. Writing rules to memory files doesn't prevent violations - the agent had "iron rules" written in both TOOLS.md and MEMORY.md after the first incident, and violated them again
  3. The execution pipeline has no hard guardrail - exec tool can run rm, Remove-Item -Recurse, rmdir etc. without any system-level intercept
  4. No confirmation checkpoint exists in the tool execution flow for destructive operations
RAW_BUFFERClick to expand / collapse

Problem

The agent repeatedly performs destructive file operations (deletion) without first presenting the plan to the user and waiting for explicit confirmation. This has caused real data loss twice in production use.

What happened

Incident 1 (2026-05-24)

The agent ran a cleanup script that deleted 4 folders from E:\11111_整理后 and then deleted 439 source article folders from E:\11111 without asking. Source files were permanently lost.

Incident 2 (2026-05-28, today)

User asked the agent to "check" a directory and "fix anything that needs fixing." The agent found 35 empty folders (containing only README files with article URLs) and deleted all of them immediately without listing them first or asking for confirmation.

User's exact words: "I didn't tell you to delete them. This is the second time you've made this mistake. I'm paying for a subscription and getting this."

Root cause

The current system relies entirely on prompt-level instructions (system prompts, memory files, TOOLS.md) to enforce the "ask before deleting" rule. This approach is fundamentally unreliable because:

  1. Prompt instructions are soft constraints - the LLM can ignore them during execution, especially when it perceives a task as "obvious"
  2. Writing rules to memory files doesn't prevent violations - the agent had "iron rules" written in both TOOLS.md and MEMORY.md after the first incident, and violated them again
  3. The execution pipeline has no hard guardrail - exec tool can run rm, Remove-Item -Recurse, rmdir etc. without any system-level intercept
  4. No confirmation checkpoint exists in the tool execution flow for destructive operations

Proposed solution: System-level destructive operation guardrail

Implement a hard-coded safety layer in the exec/tool execution pipeline that:

  1. Detects destructive patterns in exec commands (regex match for rm -rf, Remove-Item.*-Recurse, shutil.rmtree, rmdir /s, etc.)
  2. Requires explicit confirmation context before allowing execution - the agent must have received a user message containing explicit approval ("delete", "yes", "confirm", etc.) within the last N turns
  3. If no approval context exists, block the operation and return an error telling the agent to ask the user first
  4. This should be code-level, not prompt-level - the agent cannot bypass it by ignoring instructions

Alternative: Soft approach

If hard-blocking is too aggressive, at minimum:

  • Add a confirmation_required flag to exec results for detected destructive commands
  • Surface a prominent warning in the agent's context when it's about to run a destructive command

Impact

This is not a minor UX issue. Users trust the agent with their files. Repeated unconfirmed deletions erode trust and can cause permanent data loss. The current "just write better prompts" approach has demonstrably failed twice.

Environment

  • OpenClaw version: 2026.4.21 (f788c88)
  • OS: Windows NT 10.0.19045
  • Shell: PowerShell

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug/Feature Request] Agent repeatedly performs destructive file operations without user confirmation