openclaw - 💡(How to fix) Fix Agent Undo Stacks for Self-Healing (OpenClaw #50372) [1 participants]

GitHub stats: openclaw/openclaw#52690 (fetched 2026-04-08 01:20:19) · Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 1

Agent Undo Stacks for Self-Healing (OpenClaw #50372)

Problem

Agent failures often leave partial state behind (files created, external services altered) that has to be cleaned up by hand. There is currently no automatic recovery.

Proposed Solution

Implement undo stacks that record state-changing operations and can revert on error.

Design

Operation Logging:

  • Before any state-changing action:
    • If reversible, record the inverse operation on the session's undo stack
    • Limit the stack depth to 10 to prevent memory bloat
  • Examples:
    • write_file(path, content) → record delete_file(path) on the stack (or, if path already existed, restore its backup)
    • mkdir(path) → record rmdir(path) on the stack
    • http_post(url, data) → record http_delete(url) if idempotent, else mark the operation as irreversible
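The file and directory examples above can be sketched in Python as wrappers that perform a change and record its inverse on a bounded per-session stack. This is a minimal sketch. The helper names (`new_undo_stack`, `logged_write_file`, `logged_mkdir`, `undo_all`) are hypothetical, and evicting the oldest entry once the depth limit of 10 is reached is an assumed policy. The sketch also departs slightly from "before any state-changing action": the data the inverse needs is captured before the write, but the entry is pushed only after the write succeeds. The http_post case is not covered.

```python
import os
from collections import deque
from typing import Callable, Deque, Tuple

# (description, inverse) pairs; descriptions make the undo log readable.
UndoEntry = Tuple[str, Callable[[], None]]


def new_undo_stack(max_depth: int = 10) -> Deque[UndoEntry]:
    # When full, deque(maxlen=...) discards the OLDEST entry on append
    # (an assumed eviction policy; the issue only caps the depth at 10).
    return deque(maxlen=max_depth)


def logged_write_file(stack: Deque[UndoEntry], path: str, content: str) -> None:
    """Write path, then record its inverse: restore old bytes, or delete a new file."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            previous = f.read()  # in-memory backup taken before the write

        def inverse() -> None:
            with open(path, "wb") as out:
                out.write(previous)

        description = f"restore {path}"
    else:
        def inverse() -> None:
            os.remove(path)

        description = f"delete {path}"
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    # Pushed only after the write succeeds, so a failed write leaves no stale entry.
    stack.append((description, inverse))


def logged_mkdir(stack: Deque[UndoEntry], path: str) -> None:
    os.mkdir(path)
    # os.rmdir raises OSError if the directory is no longer empty at undo time.
    stack.append((f"rmdir {path}", lambda: os.rmdir(path)))


def undo_all(stack: Deque[UndoEntry]) -> None:
    # LIFO: the most recent change is reverted first.
    while stack:
        _description, inverse = stack.pop()
        inverse()
```

Because the stack is LIFO, creating a directory, writing a file into it, and then overwriting that file unwinds in this order: restore the old contents, delete the file, remove the now-empty directory.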

Reversibility Criteria:

  • File writes: yes, provided any file being overwritten is backed up first
  • Directory creation: yes, if the directory is still empty
  • External POST/PUT: only if the previous state was stored or the resource can be deleted with DELETE
  • Sub-agent spawn: yes, by killing the sub-agent
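The criteria can be written as one predicate that is consulted before an inverse is pushed. The sketch below assumes each planned action is described by a small record. `Kind`, `PlannedOp`, its fields, and `is_reversible` are hypothetical names that do not come from the issue. It only encodes the four rules listed above.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Kind(Enum):
    FILE_WRITE = auto()
    MKDIR = auto()
    HTTP_POST_PUT = auto()
    SUBAGENT_SPAWN = auto()


@dataclass
class PlannedOp:
    kind: Kind
    target_existed: bool = False         # file write: is an existing file being overwritten?
    backup_taken: bool = False           # file write: was the old content saved first?
    dir_empty_at_undo: bool = True       # mkdir: rmdir only works on an empty directory
    previous_state_stored: bool = False  # POST/PUT: was the prior remote state saved?
    deletable: bool = False              # POST/PUT: can the resource be DELETEd?


def is_reversible(op: PlannedOp) -> bool:
    if op.kind is Kind.FILE_WRITE:
        # A new file can simply be deleted; an overwrite needs a backup.
        return not op.target_existed or op.backup_taken
    if op.kind is Kind.MKDIR:
        return op.dir_empty_at_undo
    if op.kind is Kind.HTTP_POST_PUT:
        return op.previous_state_stored or op.deletable
    if op.kind is Kind.SUBAGENT_SPAWN:
        return True  # undone by killing the sub-agent
    return False
```

When an operation fails this check, no inverse is recorded. Under this design it would instead be logged as irreversible.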

Error Handling:

  • On exception: pop and execute undo entries until the stack is empty or recovery succeeds
  • Log all undo actions
  • If stack exhausts without recovery, escalate to user
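The error-handling loop above can be sketched as follows. It is a minimal sketch that assumes the caller supplies `is_recovered`, a hypothetical health check, because the issue does not define what "recovery succeeds" means. `EscalateToUser` is a hypothetical stand-in for whatever escalation channel the harness actually uses.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("undo")


class EscalateToUser(Exception):
    """Raised when the undo stack is exhausted without the session recovering."""


def recover(stack: List[Callable[[], None]], is_recovered: Callable[[], bool]) -> int:
    """Pop and run inverses until is_recovered() holds; return the number of steps run."""
    steps = 0
    while not is_recovered():
        if not stack:
            raise EscalateToUser(f"undo stack exhausted after {steps} step(s)")
        inverse = stack.pop()
        steps += 1
        try:
            inverse()
            logger.info("undo step %d succeeded", steps)
        except Exception:
            # Log and keep unwinding: one failed inverse should not stop recovery.
            logger.exception("undo step %d failed", steps)
    return steps
```

Recovery is checked before each pop, so the loop stops as soon as the session is healthy. Any remaining entries stay on the stack.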

Implementation Location:

  • src/agent/undo_stack.py
  • Integrate into the run_tool() wrapper in the harness
  • Session-scoped: each session has its own UndoStack
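Session scoping can be sketched as a registry that holds one bounded stack per session id. The wrapper runs a tool against that session's stack and unwinds only that stack on failure. `UndoSessions` and its methods are hypothetical names, and inverses are plain callables here to keep the example short.

```python
from collections import deque
from typing import Callable, Deque, Dict

Inverse = Callable[[], None]


class UndoSessions:
    """One bounded undo stack per session id (hypothetical harness helper)."""

    def __init__(self, max_depth: int = 10) -> None:
        self.max_depth = max_depth
        self._stacks: Dict[str, Deque[Inverse]] = {}

    def stack_for(self, session_id: str) -> Deque[Inverse]:
        if session_id not in self._stacks:
            self._stacks[session_id] = deque(maxlen=self.max_depth)
        return self._stacks[session_id]

    def run_tool(self, session_id: str, tool: Callable[[Deque[Inverse]], object]) -> object:
        """Run tool with its session's stack; unwind that stack if the tool raises."""
        stack = self.stack_for(session_id)
        try:
            return tool(stack)
        except Exception:
            while stack:
                stack.pop()()  # newest inverse first
            raise
```

A failure in one session never touches another session's stack, which is the point of making the stacks session-scoped.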

Configuration

~/.openclaw/agent.yaml:

```yaml
undo:
  max_depth: 10
  backup_before_write: true   # create .bak files
  dry_run_reversible: false   # if true, only log without executing
```
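The sketch below shows how the `undo` block might be loaded and validated. It assumes the file has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`, and its defaults mirror the values shown. `UndoConfig` and `apply_inverse` are hypothetical names. The sketch also reads "only log without executing" as applying to the undo actions, which is an interpretation: the comment could also refer to the original operations.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping

KNOWN_KEYS = {"max_depth", "backup_before_write", "dry_run_reversible"}


@dataclass(frozen=True)
class UndoConfig:
    max_depth: int = 10
    backup_before_write: bool = True
    dry_run_reversible: bool = False

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "UndoConfig":
        undo = raw.get("undo") or {}
        unknown = set(undo) - KNOWN_KEYS
        if unknown:
            raise ValueError(f"unknown undo keys: {sorted(unknown)}")
        cfg = cls(**undo)
        depth = cfg.max_depth
        if isinstance(depth, bool) or not isinstance(depth, int) or depth < 1:
            raise ValueError("undo.max_depth must be a positive integer")
        return cfg


def apply_inverse(cfg: UndoConfig, description: str,
                  inverse: Callable[[], None], log: List[str]) -> None:
    """Run an inverse, or only log it when dry_run_reversible is set."""
    if cfg.dry_run_reversible:
        log.append(f"[dry-run] would undo: {description}")
    else:
        log.append(f"undo: {description}")
        inverse()
```

If the `undo` section is missing entirely, this falls back to the defaults rather than failing. Whether that is the right behavior is a design choice the issue leaves open.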

Self-Healing Patterns

  • Automatic retry with exponential backoff (1s, 2s, 4s)
  • After 2 failures (two-strike), disable the tool for 5 minutes and raise an alert via AlertPipe
  • Error clustering: group similar errors; if a cluster keeps growing, suggest a mitigation
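The first two patterns above can be combined in a single runner, sketched here. Error clustering is not covered. Several points are assumptions, not details from the issue. A "failure" means a call whose retries (after 1s, 2s and 4s) are all used up, and the two strikes must be consecutive, since a success resets the count. The `alert` callable stands in for AlertPipe, whose API is not described. `SelfHealingRunner` and `ToolDisabled` are hypothetical names. The sleep function and clock can be injected so the behavior can be tested without real waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple, TypeVar

T = TypeVar("T")


class ToolDisabled(Exception):
    """Raised while a tool is inside its cool-down window."""


class SelfHealingRunner:
    def __init__(
        self,
        delays: Tuple[float, ...] = (1.0, 2.0, 4.0),
        strikes: int = 2,
        disable_seconds: float = 300.0,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
        alert: Callable[[str], None] = print,  # stand-in for AlertPipe
    ) -> None:
        self.delays = delays
        self.strikes = strikes
        self.disable_seconds = disable_seconds
        self.sleep, self.clock, self.alert = sleep, clock, alert
        self.failures: Dict[str, int] = {}
        self.disabled_until: Dict[str, float] = {}

    def call(self, name: str, fn: Callable[[], T]) -> T:
        until: Optional[float] = self.disabled_until.get(name)
        if until is not None and self.clock() < until:
            raise ToolDisabled(f"{name} is disabled for another {until - self.clock():.0f}s")
        # One initial attempt, then one retry after each delay (1s, 2s, 4s).
        for delay in (*self.delays, None):
            try:
                result = fn()
            except Exception:
                if delay is not None:
                    self.sleep(delay)
                    continue
                self._strike(name)  # all retries used up: one strike
                raise
            self.failures[name] = 0  # success resets the strike count
            return result
        raise RuntimeError("unreachable")

    def _strike(self, name: str) -> None:
        self.failures[name] = self.failures.get(name, 0) + 1
        if self.failures[name] >= self.strikes:
            self.failures[name] = 0
            self.disabled_until[name] = self.clock() + self.disable_seconds
            self.alert(f"{name} disabled for {self.disable_seconds:.0f}s after {self.strikes} failed calls")
```

A call to a permanently failing tool sleeps 1s, 2s and 4s before giving up. The second such call disables the tool for 300 seconds and sends a single alert.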

Alternatives Considered

  • Transaction logs: too heavyweight; an undo stack is sufficient for an interactive agent
  • Snapshot per action: too slow; use selective backup (critical files only) instead
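Selective backup can be sketched as follows, in line with the `backup_before_write: true   # create .bak files` setting. An existing file is copied to `<path>.bak` only when it counts as critical, and undo moves the copy back. The issue does not say which files count as critical, so the glob patterns in `DEFAULT_CRITICAL` are placeholders. The helper names are also hypothetical.

```python
import fnmatch
import os
import shutil
from typing import Iterable, Optional

# Placeholder patterns: the issue does not define which files are critical.
DEFAULT_CRITICAL = ("*.yaml", "*.json", "*.py")


def backup_if_critical(path: str, patterns: Iterable[str] = DEFAULT_CRITICAL) -> Optional[str]:
    """Copy an existing critical file to <path>.bak before it is overwritten.

    Returns the backup path, or None if no backup was taken.
    """
    if not os.path.isfile(path):
        return None  # new file: its undo is a plain delete, nothing to back up
    name = os.path.basename(path)
    if not any(fnmatch.fnmatch(name, p) for p in patterns):
        return None  # non-critical file: skipped to keep writes cheap
    backup = path + ".bak"
    shutil.copy2(path, backup)  # copy2 also preserves timestamps and permission bits
    return backup


def restore_backup(path: str) -> None:
    """Undo an overwrite by moving <path>.bak back over <path>."""
    os.replace(path + ".bak", path)
```

`os.replace` overwrites the destination in a single rename, so the restore also removes the `.bak` file.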

Related Issues

  • Requires: OpenClaw #50371 (Pulse) for monitoring undo success rate
  • Blocks: OpenClaw #50373 (SovereignLedger) — ledger entries should be undoable


Fix Plan

To implement the proposed solution, follow these steps:

  • Create a new file src/agent/undo_stack.py with the following content:
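```python
import logging
from collections import deque

logger = logging.getLogger(__name__)


class UndoStack:
    """Session-scoped LIFO stack of inverse operations."""

    def __init__(self, max_depth=10):
        self.max_depth = max_depth
        # deque(maxlen=...) drops the OLDEST entry when full, so the most
        # recent (most relevant) operations always stay undoable.
        self.stack = deque(maxlen=max_depth)

    def push(self, operation):
        if len(self.stack) == self.max_depth:
            logger.warning("Undo stack full; oldest operation can no longer be undone")
        self.stack.append(operation)

    def pop(self):
        return self.stack.pop() if self.stack else None

    def execute_undo(self):
        """Undo newest-first; log each step and return the operations that failed."""
        failed = []
        while self.stack:
            operation = self.stack.pop()
            try:
                operation.undo()
                logger.info("Undid %r", operation)
            except Exception:
                # Keep unwinding the rest of the stack, but record the failure.
                logger.exception("Undo failed for %r", operation)
                failed.append(operation)
        return failed
```
  • Define the Operation class with an undo method:
```python
class Operation:
    """A recorded state change together with the callable that reverts it."""

    def __init__(self, undo_func):
        self.undo_func = undo_func

    def undo(self):
        self.undo_func()
```
  • Create concrete operation classes, such as WriteFileOperation and MkdirOperation:
```python
import os


class WriteFileOperation(Operation):
    """Inverse of creating a new file: delete it (overwrites need a .bak backup instead)."""

    def __init__(self, path):
        super().__init__(lambda: os.remove(path))
        self.path = path


class MkdirOperation(Operation):
    """Inverse of mkdir: remove the directory (os.rmdir fails if it is not empty)."""

    def __init__(self, path):
        super().__init__(lambda: os.rmdir(path))
        self.path = path
```
  • Integrate the session's UndoStack into the run_tool() wrapper in the harness (pass the stack in rather than creating one per call, so it stays session-scoped):
```python
def run_tool(tool, undo_stack, *args, **kwargs):
    """Run a tool against the session's UndoStack; unwind the stack on failure."""
    try:
        # The tool performs each state change and then pushes its inverse,
        # e.g. undo_stack.push(WriteFileOperation(path)) after creating path.
        return tool(undo_stack, *args, **kwargs)
    except Exception:
        failed = undo_stack.execute_undo()
        if failed:
            # Some changes could not be reverted: escalate to the user.
            logger.error("Undo incomplete (%d step(s) failed); escalating to user", len(failed))
        raise
```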
  • Update the ~/.openclaw/agent.yaml configuration file with the following content:
```yaml
undo:
  max_depth: 10
  backup_before_write: true
  dry_run_reversible: false
```

Verification

To verify that the fix worked, test the following scenarios:

  • Successful execution of a tool with reversible operations
  • Failed execution of a tool with reversible operations, followed by successful undo
  • Failed execution of a tool with non-reversible operations, followed by error logging and escalation to user
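These three scenarios can be exercised without the full harness, as sketched below. The sketch uses minimal stand-ins, and all of the following are assumptions for illustration: the local `run_tool`, the `Escalated` exception, and the use of `None` as the marker for an irreversible step. The test functions are named `test_*`, so a runner such as pytest would collect them.

```python
import os
import tempfile
from typing import Callable, List, Optional

# None marks an irreversible step (e.g. a non-idempotent http_post).
Inverse = Optional[Callable[[], None]]


class Escalated(Exception):
    """Stand-in for escalating to the user."""


def run_tool(tool: Callable[[List[Inverse]], None], log: List[str]) -> None:
    stack: List[Inverse] = []
    try:
        tool(stack)
    except Exception:
        irreversible = False
        while stack:
            inverse = stack.pop()
            if inverse is None:
                irreversible = True
                log.append("irreversible step; cannot undo")
            else:
                inverse()
        if irreversible:
            raise Escalated("manual cleanup required")
        raise  # everything was undone: surface the original error


def test_success_keeps_changes() -> None:
    path = os.path.join(tempfile.mkdtemp(), "out.txt")

    def tool(stack: List[Inverse]) -> None:
        with open(path, "w") as f:
            f.write("ok")
        stack.append(lambda: os.remove(path))

    run_tool(tool, [])
    assert os.path.exists(path)  # nothing is undone on success


def test_failure_is_undone() -> None:
    path = os.path.join(tempfile.mkdtemp(), "partial.txt")

    def tool(stack: List[Inverse]) -> None:
        with open(path, "w") as f:
            f.write("half")
        stack.append(lambda: os.remove(path))
        raise RuntimeError("tool crashed")

    try:
        run_tool(tool, [])
    except RuntimeError:
        pass
    assert not os.path.exists(path)  # partial state cleaned up


def test_irreversible_failure_escalates() -> None:
    log: List[str] = []

    def tool(stack: List[Inverse]) -> None:
        stack.append(None)
        raise RuntimeError("remote error")

    escalated = False
    try:
        run_tool(tool, log)
    except Escalated:
        escalated = True
    assert escalated and log == ["irreversible step; cannot undo"]
```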

Extra Tips

  • Handle edge cases such as a full undo stack and operations that fail during undo.
  • Consider adding logging and monitoring to track the success rate of undo operations.
  • Review the undo method implementation for each operation class to ensure correct behavior.
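Tracking the undo success rate needs very little code. The sketch below counts attempts and successes, and a failure during undo is counted but not re-raised. `UndoMetrics` is a hypothetical name. The issue does not specify how these numbers would reach Pulse (#50371), so no export step is shown.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UndoMetrics:
    """Counts undo outcomes so a success rate can be reported."""
    attempted: int = 0
    succeeded: int = 0

    def run(self, inverse: Callable[[], None]) -> bool:
        self.attempted += 1
        try:
            inverse()
        except Exception:
            return False  # failure during undo: counted, not re-raised
        self.succeeded += 1
        return True

    @property
    def success_rate(self) -> float:
        # With no attempts yet, report 1.0 rather than dividing by zero.
        return self.succeeded / self.attempted if self.attempted else 1.0
```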
