openclaw - ✅(Solved) Fix [SECURITY] Reasoning leak in WebSocket outbound stream when approval-pending triggers on channel plugin [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#70645 · Fetched 2026-04-24 05:55:09
Comments: 1 · Participants: 2 · Timeline: 2 · Reactions: 0
Timeline (top): commented ×1, cross-referenced ×1

When an approval-pending prompt is triggered by the Gateway on a channel plugin (observed on Telegram), the LLM's raw reasoning prose is emitted to the channel alongside the approval prompt, exposing planning monologue, internal UUIDs, current working directory, pending command details, and potentially credentials or secrets embedded in the reasoning. This leak is not filtered by logging.redactPatterns because the redaction pipeline applies only to file/journal logs, not to the outbound WebSocket stream that feeds the channel plugin.

We reproduced this mechanically in a controlled adversarial-testing corpus across multiple commands (cks-system project, internal corpus C1 A2.1 and similar). The reasoning prose is in English (the LLM's native reasoning language) and arrives unsanitized in the user's Telegram chat. The leak is deterministic under a specific combination of configuration flags, so anyone running OpenClaw with approvals enabled on a channel plugin can reproduce it.


Root Cause

The redaction pipeline behind logging.redactPatterns applies only to file/journal logs. The outbound WebSocket stream that feeds the channel plugin does not pass through it, so when the Gateway triggers an approval-pending prompt (observed on Telegram), the LLM's raw reasoning prose is emitted to the channel alongside the prompt, unfiltered.

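Fix Action

Workaround in use (until upstream fix)

We implemented a middleware proxy between the OpenClaw gateway and api.telegram.org that applies regex redaction to the text field of sendMessage requests before forwarding them. We are on a tight MVP deadline (1 May 2026) and have applied defense-in-depth locally. We opened this issue primarily so that the upstream fix benefits the wider ecosystem; there is no urgency from our side for a patch.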

PR fix notes

PR #70652: fix(gateway): apply redactPatterns to outbound WS stream; strip reasoning from approval events

Description (problem / solution / changelog)

Summary

Fixes #70645 — LLM reasoning prose leaks to channel plugins (Telegram, Discord, etc.) through the outbound WebSocket stream when approval-pending prompts are triggered.

  • logging.redactPatterns now applies to the outbound WebSocket stream, not just file/journal logs
  • Reasoning prose before the "Approval required" sentinel is stripped from exec.approval.requested and plugin.approval.requested event payloads
  • Single integration point in server-broadcast.ts (JSON.stringify → buildRedactedFrame)
  • Hot-reload aware: re-resolves redact options on every frame so openclaw reload takes effect without restart
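The behaviour described in these bullets can be sketched roughly as follows. This is a minimal illustration, not the contents of src/gateway/outbound-redact.ts: only buildRedactedFrame, the two event names and the "Approval required" sentinel come from the PR description, while the Frame and RedactOptions shapes, the helper names and the resolveOptions callback are assumptions made for the example.

```typescript
// Hypothetical sketch; the real outbound-redact.ts may be structured differently.
type RedactOptions = { patterns: RegExp[] }; // assumed shape; patterns carry the "g" flag
type Frame = { event: string; payload: { text?: string } }; // assumed frame shape

const APPROVAL_EVENTS = new Set(["exec.approval.requested", "plugin.approval.requested"]);
const SENTINEL = "Approval required";

// Drop everything before the sentinel; text without a sentinel is left untouched.
function stripReasoning(text: string): string {
  const idx = text.indexOf(SENTINEL);
  return idx > 0 ? text.slice(idx) : text;
}

// Replace every pattern match with a fixed mask.
function maskPatterns(text: string, patterns: RegExp[]): string {
  return patterns.reduce((acc, re) => acc.replace(re, "[REDACTED]"), text);
}

// resolveOptions is called once per frame, so a config reload takes effect
// on the next frame without restarting the process.
function buildRedactedFrame(frame: Frame, resolveOptions: () => RedactOptions): string {
  const { patterns } = resolveOptions();
  let text = frame.payload.text;
  if (text !== undefined) {
    if (APPROVAL_EVENTS.has(frame.event)) text = stripReasoning(text);
    text = maskPatterns(text, patterns);
  }
  return JSON.stringify({ ...frame, payload: { ...frame.payload, text } });
}
```

In this sketch the reasoning strip runs only for the two approval events, while pattern masking runs on every frame that carries text.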

Files changed

  • src/gateway/outbound-redact.ts (new, 156 lines) — redaction pipeline with reasoning strip + pattern masking
  • src/gateway/server-broadcast.ts (3 lines) — swap raw JSON.stringify for buildRedactedFrame

Test plan

  • Configure redactPatterns with a token pattern, verify tokens are masked in Telegram/Discord output
  • Trigger an exec approval with a model that produces reasoning, verify only the "Approval required..." block reaches the channel
  • Verify openclaw reload updates redact patterns without restart
  • Verify non-approval events pass through unchanged

🤖 Generated with Claude Code

Changed files

  • src/gateway/outbound-redact.ts (added, +156/-0)
  • src/gateway/server-broadcast.ts (modified, +2/-1)



Environment

  • OpenClaw version: 2026.4.12 (1c0672b) and 2026.4.21 (f788c88) (both reproduce; no fix between these releases).
  • Image: ghcr.io/hostinger/hvps-openclaw:latest (Docker via Hostinger VPS template).
  • Maestro LLM: openai-codex/gpt-5.4 (OAuth via ChatGPT Plus, routing via openclaw models set).
  • Leaf LLMs: openrouter/nvidia/nemotron-3-super-120b-a12b.
  • Channel plugin: telegram (bundled), allowlisted DMs.
  • Sandbox backend: embedded (not docker).

Configuration triggering the leak

{
  "channels": {
    "telegram": {
      "dmPolicy": "allowlist",
      "streaming": { "mode": "final" }
    }
  },
  "agents": {
    "list": [
      {
        "id": "main",
        "tools": {
          "exec": {
            "security": "allowlist",
            "ask": "on-miss",
            "strictInlineEval": true
          }
        }
      }
    ]
  },
  "logging": {
    "redactSensitive": "tools",
    "redactPatterns": [
      "eyJ[A-Za-z0-9_-]{10,}\\.eyJ[A-Za-z0-9_-]{10,}\\.[A-Za-z0-9_-]{10,}",
      "Bearer\\s+[A-Za-z0-9_.\\-]{20,}",
      "(?i)apikey[=:]\\s*[A-Za-z0-9_.\\-]{20,}"
    ]
  }
}

The combination security=allowlist + ask=on-miss + strictInlineEval=true plus an LLM that produces non-trivial reasoning before the exec call triggers the leak deterministically.
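To audit an installation for this combination, a check along the following lines could be run against the parsed config. The field names mirror the configuration above; the helper name and the trimmed-down Config type are assumptions for illustration, not part of OpenClaw.

```typescript
// Hypothetical helper; only the field names come from the configuration above.
type ExecTools = { security?: string; ask?: string; strictInlineEval?: boolean };
type Config = { agents?: { list?: { id: string; tools?: { exec?: ExecTools } }[] } };

// Return the ids of agents whose exec settings match the reported combination.
function agentsWithLeakCombo(config: Config): string[] {
  return (config.agents?.list ?? [])
    .filter((agent) => {
      const exec = agent.tools?.exec;
      return (
        exec?.security === "allowlist" &&
        exec?.ask === "on-miss" &&
        exec?.strictInlineEval === true
      );
    })
    .map((agent) => agent.id);
}
```

A non-empty result only says the flags match; per the report, the leak also needs an LLM that produces reasoning prose before the exec call.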


Reproduction steps

  1. Configure OpenClaw as above (Telegram channel, Maestro with exec allowlist ask=on-miss, strictInlineEval=true).

  2. Send a command from an allowlisted chat that requires the Maestro to reason and hits an exec path not in the allowlist (triggering approval-pending):

    "Mergea PR #247 en el repo cks-system" (Spanish for "Merge PR #247 in the cks-system repo"; any instruction that forces the LLM to plan a command outside the allowlist works)

  3. Observe the Telegram message. Instead of a clean approval prompt, the user receives:

    Maybe repo name private inaccessible? use list prs?
    Need identify repo maybe in local filesystem? find cks-system.
    
    ---
    
    Approval required. Run: /approve <UUID> allow-once
    
    Pending command: find /data/.openclaw/workspace/cks-system
                     -maxdepth 3 -name '.git'
    Host: gateway
    CWD: /data/.openclaw/workspace
    Expires: 30m

  4. Confirm in logs (/var/log/cks-<agent>.log) that logging.redactPatterns applied correctly to file logs but NOT to the Telegram outbound.


Evidence

Verbatim transcript (UUID and CWD redacted) archived in our internal corpus at docs/adversarial/runs/2026-04-22/C1-run-tarde-post-hardening.md. The leak:

  • Exposes internal path structure (/data/.openclaw/workspace, /data/.openclaw/agents/<id>).
  • Exposes pending command details before user approval.
  • Exposes gateway host identifier.
  • Exposes the approval UUID, leaving a 15-60 min TTL window for automated exploitation if intercepted.
  • Exposes Maestro's reasoning monologue in its native language (English, despite system prompt requesting Spanish).

In configurations with credentials hardcoded in commands or leaked via reasoning ("I should use apikey=sk-..."), the combination amplifies the exposure to actual secrets (reported as internal finding F40 on our side).


Root cause analysis (external)

Empirically verified:

  1. logging.redactSensitive + logging.redactPatterns filter file/journal logs only (confirmed via grep of /var/log/cks-*.log vs. Telegram message).
  2. The WebSocket stream feeding the Telegram plugin does not pass through the logging pipeline; they are two parallel paths in the gateway.
  3. No beforeReply / onOutbound hook is exposed by the current plugin SDK to allow interception by a third-party plugin.
  4. strictInlineEval: true increases the surface area: more exec variants trigger approval, more reasoning prose accumulates before the approval prompt gets emitted.
  5. streaming.mode = final reduces — but does not eliminate — the leak (the approval prompt is final, and the reasoning prefix is part of it).

Proposed fixes (ordered by implementation cost)

A. Plugin SDK hook beforeReply / onOutbound

Expose a hook on outbound messages to the channel plugin, letting third-party plugins register filters. Signature sketch:

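// Proposed hook; not part of the current plugin SDK.
type OutboundMessage = { text: string; target: 'telegram' | 'webchat' /* | ... */ };
type OutboundResult = { text: string; drop?: boolean };
type OnOutbound = (message: OutboundMessage) => OutboundResult;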

This is the cleanest fix because it allows user-provided regex filters (same patterns they already configure in logging.redactPatterns) to apply to the stream.

B. Apply logging.redactPatterns to the outbound stream directly

Make the redaction pipeline also run on the outbound stream before emission. Simpler than (A) but less flexible (no custom filters).
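One detail matters if the outbound path is implemented in TypeScript: the third pattern in the configuration above starts with (?i), which JavaScript's RegExp constructor rejects. A minimal sketch of option B therefore has to translate that prefix into a flag. The function names are hypothetical, and treating a leading (?i) as "case-insensitive" is an assumption about how the patterns are meant to be read, not a statement about OpenClaw's parser.

```typescript
// Hypothetical sketch of option B; not OpenClaw's actual redaction code.
// Assumption: a leading "(?i)" means the whole pattern is case-insensitive.
function compileRedactPattern(source: string): RegExp {
  const caseInsensitive = source.startsWith("(?i)");
  const body = caseInsensitive ? source.slice(4) : source;
  // "g" so that every occurrence in a message is masked, not just the first.
  return new RegExp(body, caseInsensitive ? "gi" : "g");
}

// Apply the same pattern strings used for file/journal logs to outbound text.
function redactOutbound(text: string, sources: string[]): string {
  return sources
    .map(compileRedactPattern)
    .reduce((acc, re) => acc.replace(re, "[REDACTED]"), text);
}
```

Compiling on every call keeps the sketch short; a real implementation would likely cache the compiled patterns until the config changes.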

C. Add a streaming.sanitizeReasoning config flag

Boolean flag that runs a built-in reasoning-prose detector (regex over English markers: Maybe|Need|Could|Let's|I'll|Actually|Wait|So,) on the outbound text before emission. Prefix-only or full-line mode selectable.
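A rough sketch of what the two modes could mean, using only the markers listed above. The function name, the mode names as string literals and the exact semantics of each mode are assumptions; the issue does not specify them.

```typescript
// Hypothetical sketch of option C; flag semantics are assumed, not specified.
const REASONING_MARKER = /^\s*(Maybe|Need|Could|Let's|I'll|Actually|Wait|So,)/;

type SanitizeMode = "prefix-only" | "full-line";

function sanitizeReasoning(text: string, mode: SanitizeMode): string {
  const lines = text.split("\n");
  if (mode === "full-line") {
    // Drop every line that starts with a marker, wherever it appears.
    return lines.filter((line) => !REASONING_MARKER.test(line)).join("\n");
  }
  // prefix-only: skip the leading run of marker lines and blank lines,
  // then keep everything from the first other line onward.
  const firstKept = lines.findIndex(
    (line) => line.trim() !== "" && !REASONING_MARKER.test(line)
  );
  return firstKept === -1 ? "" : lines.slice(firstKept).join("\n");
}
```

Because the markers are ordinary English words, legitimate replies that begin with "Need" or "Could" would also be dropped, which is the false-positive cost of this approach.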

D. Move approval prompts to a dedicated admin channel

Arguable. Does not fix the root cause (reasoning is still leaked, just to a smaller audience), but reduces blast radius. Not acceptable as sole fix.


Workaround in use (until upstream fix)

We implemented a middleware proxy between OpenClaw gateway and api.telegram.org:

  • Swap TELEGRAM_BOT_TOKEN in the plugin config for a token served by a local proxy.
  • Proxy forwards sendMessage requests to the real Telegram API, applying regex redaction on the text field before forward.
  • Patterns applied: English reasoning markers (Maybe|Need|Could|Let's|...), Spanish reasoning markers, lines with ? outside code blocks, prose before Approval required.

This is not free (middleware process, monitoring, regex tuning against false positives), but it eliminates the leak end-to-end. We'd be glad to drop it once an upstream hook is available.
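As an illustration of one of the rules above (dropping lines with ? outside code blocks), the redaction step applied to a sendMessage body might look like the sketch below. Our proxy is Python asyncio; this TypeScript version is a simplified stand-in written for this discussion, and every name in it apart from the chat_id and text fields of sendMessage is hypothetical.

```typescript
// Simplified stand-in for the proxy's redaction step; not the actual proxy source.
type SendMessageBody = { chat_id: number | string; text: string };

const FENCE = "`".repeat(3); // Markdown code-fence marker

// Drop lines containing "?" unless they sit inside a fenced code block.
function dropQuestionLines(text: string): string {
  let inFence = false;
  const kept: string[] = [];
  for (const line of text.split("\n")) {
    if (line.trim().startsWith(FENCE)) {
      inFence = !inFence; // fence lines toggle the state and are always kept
      kept.push(line);
    } else if (inFence || !line.includes("?")) {
      kept.push(line);
    }
  }
  return kept.join("\n");
}

// Rewrite only the text field; every other field is forwarded unchanged.
function redactSendMessage(body: SendMessageBody): SendMessageBody {
  return { ...body, text: dropQuestionLines(body.text) };
}
```

On the leaked message from the reproduction steps, this rule alone removes both reasoning lines, since each contains a question mark, and leaves the approval block intact.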


Related finding (F38)

Same installation: the Maestro under GPT-5.4 exhibits a routing-semantic bias toward a "cks-dev = general technical agent" interpretation, occasionally bypassing the wrapper scripts we publish per-leaf (cks-supabase.sh, cks-github.sh, etc.) and constructing curl requests directly with hallucinated credentials. Three iterations of prompt engineering (SOUL v1, SOUL v2, active-memory.promptAppend) did not close this bias. We requested either a per-agent deny_wrappers option or an agents.list[].env field (to inject a per-agent identity environment variable into exec); neither exists in the 2026.4.21 schema. We can file the feature request as a separate issue if helpful.


Offer

Happy to supply:

  • Full verbatim redacted transcript of the adversarial test.
  • Our regex pattern list (iterated against 13 additional corpus variants).
  • The middleware proxy source (small Python asyncio implementation) as a reference for option (B) or (C).

We are on a tight MVP deadline (1 May 2026) and have applied defense-in-depth locally. We are opening this issue primarily so that the upstream fix benefits the wider ecosystem; there is no urgency from our side for a patch.


Filed by David Utrero (CKS System) with Claude Code Maestro. Independent reproduction by Sergio Barrera (CKS Sevilla / Lobster stack).

extent analysis

TL;DR

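The issue names a plugin SDK hook (beforeReply or onOutbound) as the cleanest fix, since it would let plugins filter outbound messages. The linked PR #70652 instead applies logging.redactPatterns to the outbound WebSocket stream and strips reasoning prose from approval events, which is closest to proposed fix B.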

Guidance

  • The root cause of the issue is that the logging.redactPatterns filter only applies to file/journal logs, not to the outbound WebSocket stream that feeds the channel plugin.
  • To verify the issue, reproduce the steps provided in the issue description and observe the Telegram message to see if it contains sensitive information.
  • A potential workaround is to implement a middleware proxy between the OpenClaw gateway and the Telegram API, applying regex redaction on the text field before forwarding the request.
  • The proposed fixes, in order of implementation cost, are: exposing a hook in the plugin SDK, applying logging.redactPatterns to the outbound stream, adding a streaming.sanitizeReasoning config flag, or moving approval prompts to a dedicated admin channel.

Example

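// Hypothetical hook proposed in the issue; not an existing plugin SDK API.
type OutboundMessage = { text: string; target: string };
function onOutbound(message: OutboundMessage): { text: string; drop?: boolean } {
  // Mask JWT-shaped tokens before the text reaches the channel.
  const jwt = /eyJ[A-Za-z0-9_-]{10,}\.eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}/g;
  return { text: message.text.replace(jwt, '[REDACTED]'), drop: false };
}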

Notes

The issue was reproduced on OpenClaw 2026.4.12 and 2026.4.21; the report does not cover other versions, so the proposed fixes may not apply to them as written. The middleware proxy workaround also adds a process to run, monitor and tune against false positives.

Recommendation

Until a release containing an upstream fix such as PR #70652 is available, apply the middleware proxy workaround. It redacts at the last hop before api.telegram.org, so it covers the leak end-to-end regardless of what the gateway emits.
