openclaw - 💡(How to fix) Fix [Feature]: Proposal: Source-Aware Instruction Tracking as architectural mitigation for indirect prompt injection

openclaw2026-05-28 16:50:27

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Add source-aware instruction tagging to the tool execution gate to structurally prevent indirect prompt injection via external content.

Root Cause

Prompt hardening and blocklists — these filter symptoms, not the root cause. A sufficiently crafted injection payload bypasses them. Structural source separation cannot be bypassed by prompt content alone.

RAW_BUFFERClick to expand / collapse

Summary

Add source-aware instruction tagging to the tool execution gate to structurally prevent indirect prompt injection via external content.

Problem to solve

OpenClaw's instruction pipeline treats all text in the context window equivalently — a user instruction and an instruction embedded in a webpage the agent is reading arrive in the same format and are processed the same way. This is the architectural root of indirect prompt injection (the mechanism behind ClawJacked). Filtering approaches address symptoms. Source-aware tracking addresses the architecture.

Proposed solution

type InstructionSource = 'user' | 'agent' | 'external_content';

interface ProposedAction { toolName: string; params: Record<string, unknown>; source: InstructionSource; reasoning: string; }

// At the execution gate: if (action.source === 'external_content') { return { permitted: false, reason: 'external_content_gate' }; }

External content can inform response text. It cannot trigger tool calls.

Alternatives considered

Impact

Affected: All users processing external content (web browsing, email, document reading) Severity: Blocks workflow — demonstrated in ClawJacked attack class Frequency: Any session where the agent reads external content Consequence: Silent credential exfiltration, unauthorized tool execution

I've implemented this in an experimental project called Colors (github.com/thecolourfoundation/Color) with a full research writeup. Happy to contribute this upstream as a PR if there's maintainer interest.

Evidence/examples

Colors implementation: github.com/thecolourfoundation/Color Research doc: github.com/thecolourfoundation/Color/blob/main/RESEARCH.md ClawJacked attack class: documented in CVE-2026-25253 disclosure

Additional information

This change can be implemented incrementally. The execution gate check is additive — existing behavior is unchanged for user and agent sourced actions. Backward compatible by design.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering