openclaw: [Bug/Feature Request] Agent repeatedly performs destructive file operations without user confirmation

StepCodex · 2026-05-27T16:50:47Z





Problem

The agent repeatedly performs destructive file operations (deletion) without first presenting the plan to the user and waiting for explicit confirmation. This has caused real data loss twice in production use.

What happened

Incident 1 (2026-05-24)

The agent ran a cleanup script that deleted 4 folders from `E:\11111_整理后` and then deleted 439 source article folders from `E:\11111` without asking. Source files were permanently lost.

Incident 2 (2026-05-28, today)

The user asked the agent to "check" a directory and "fix anything that needs fixing." The agent found 35 folders it treated as empty (they contained only README files with article URLs) and deleted all of them immediately, without listing them first or asking for confirmation.

User's exact words: "I didn't tell you to delete them. This is the second time you've made this mistake. I'm paying for a subscription and getting this."

Root cause

The current system relies **entirely on prompt-level instructions** (system prompts, memory files, TOOLS.md) to enforce the "ask before deleting" rule. This approach is fundamentally unreliable because:

1. **Prompt instructions are soft constraints** - the LLM can ignore them during execution, especially when it perceives a task as "obvious"
2. **Writing rules to memory files doesn't prevent violations** - the agent had "iron rules" written in both TOOLS.md and MEMORY.md after the first incident, and violated them again
3. **The execution pipeline has no hard guardrail** - the `exec` tool can run `rm`, `Remove-Item -Recurse`, `rmdir`, etc. without any system-level intercept
4. **No confirmation checkpoint** exists in the tool execution flow for destructive operations

Proposed solution: System-level destructive operation guardrail

Implement a hard-coded safety layer in the exec/tool execution pipeline that:

1. **Detects destructive patterns** in exec commands (regex match for `rm -rf`, `Remove-Item.*-Recurse`, `shutil.rmtree`, `rmdir /s`, etc.)
2. **Requires explicit confirmation context** before allowing execution - the agent must have received a user message containing explicit approval ("delete", "yes", "confirm", etc.) within the last N turns
3. **If no approval context exists, block the operation** and return an error telling the agent to ask the user first
4. **This should be code-level, not prompt-level** - the agent cannot bypass it by ignoring instructions
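
The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OpenClaw's actual code: the names `is_destructive`, `has_recent_approval`, `guard_exec`, and `Message`, the pattern list, the approval keywords, and the default of N = 3 (counted here as the last three user messages) are all hypothetical. Regex matching is only a heuristic; it can miss deletions hidden inside scripts and can flag harmless commands.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern list built from the examples in this issue; not exhaustive.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-\w*[rf]",               # rm -rf, rm -r, rm -f
    r"\bRemove-Item\b.*-Recurse\b",   # PowerShell recursive delete
    r"\bshutil\.rmtree\b",            # Python recursive delete
    r"\brmdir\s+/s\b",                # cmd.exe recursive delete
]

# Hypothetical approval keywords, matched as whole words only.
APPROVAL_WORDS = re.compile(r"\b(delete|yes|confirm)\b", re.IGNORECASE)


@dataclass
class Message:
    role: str  # "user" or "assistant"
    text: str


def is_destructive(command: str) -> bool:
    """Return True if the command matches any known destructive pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)


def has_recent_approval(history: list[Message], last_n_turns: int = 3) -> bool:
    """Return True if one of the last N user messages contains an approval word."""
    recent_user = [m for m in history if m.role == "user"][-last_n_turns:]
    return any(APPROVAL_WORDS.search(m.text) for m in recent_user)


def guard_exec(command: str, history: list[Message]) -> dict:
    """Decide in code, before execution, whether the command may run."""
    if is_destructive(command) and not has_recent_approval(history):
        return {
            "allowed": False,
            "error": "Destructive command blocked: ask the user for explicit confirmation first.",
        }
    return {"allowed": True, "error": None}


# Usage: the request from Incident 2 contains no approval, so the delete is blocked.
history = [Message("user", "Check this directory and fix anything that needs fixing.")]
blocked = guard_exec(r"Remove-Item E:\11111 -Recurse -Force", history)

# After an explicit go-ahead from the user, the same command is allowed.
history.append(Message("user", "Yes, delete the folders you listed."))
approved = guard_exec(r"Remove-Item E:\11111 -Recurse -Force", history)
```

One caveat with plain keyword matching: a message such as "don't delete them" also contains "delete" and would count as approval in this sketch. A real guardrail would likely need a stricter confirmation signal, for example a dedicated confirmation prompt tied to the exact command.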

Alternative: Soft approach

If hard-blocking is too aggressive, at minimum:

- Add a `confirmation_required` flag to exec results for detected destructive commands
- Surface a prominent warning in the agent's context when it's about to run a destructive command
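
A rough sketch of this softer variant, again under assumptions: `annotate_exec`, `ExecResult`, and the warning wording are hypothetical, the detector is a placeholder regex drawn from the commands named in this issue, and the sketch annotates the command before it runs rather than blocking it.

```python
import re
from typing import Optional, TypedDict

# Placeholder detector for this sketch; an assumed list, not OpenClaw's own.
_DESTRUCTIVE = re.compile(
    r"\brm\s+-\w*[rf]|\bRemove-Item\b.*-Recurse\b|\bshutil\.rmtree\b|\brmdir\s+/s\b",
    re.IGNORECASE,
)


class ExecResult(TypedDict):
    command: str
    confirmation_required: bool
    warning: Optional[str]


def annotate_exec(command: str) -> ExecResult:
    """Attach a confirmation flag and a warning to a command; never blocks it."""
    flagged = _DESTRUCTIVE.search(command) is not None
    warning = (
        f"WARNING: '{command}' looks destructive. "
        "List the affected paths and ask the user before running it."
        if flagged
        else None
    )
    return {"command": command, "confirmation_required": flagged, "warning": warning}


# Usage: the flag and warning would be placed in the agent's context.
result = annotate_exec("rmdir /s /q E:\\11111")
```

Because nothing is blocked here, this variant still depends on the agent acting on the warning, which is the weakness the hard guardrail is meant to remove.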

Impact

This is not a minor UX issue. Users trust the agent with their files. Repeated unconfirmed deletions erode trust and can cause permanent data loss. The current "just write better prompts" approach has demonstrably failed twice.

Environment

- OpenClaw version: 2026.4.21 (f788c88)
- OS: Windows NT 10.0.19045
- Shell: PowerShell
