openclaw - 💡(How to fix) Fix RFC: OpenClaw Harness constraint design from OpenAI and Simon Willison patterns [2 comments, 2 participants]

openclaw2026-04-26 03:55:56

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#71937•Fetched 2026-04-27 05:37:08

View on GitHub

Comments

Participants

Timeline

Reactions

Author

alexmiao2559-spec

Participants

alexmiao2559-spec

MaiHHConnect

Timeline (top)

commented ×2closed ×1

Create a read-only design task for an OpenClaw Harness constraint model, based on OpenAI's Harness Engineering write-up and Simon Willison's Agentic Engineering Patterns. This issue is intentionally design/research only: no implementation, no code changes, and no repo edits are requested here.

The goal is to extract the constraint mechanisms that can make OpenClaw agents more reliable, safer, and more legible across PI, native Codex, ACP, subagents, cron, skills, memory, browser, and messaging surfaces.

Root Cause

Fix Action

Fix / Workaround

personal-full: trusted main session, host tools allowed, browser profile owned by the user.
shared-readonly: sandboxed, read-only workspace, no exec/write/browser, allowlisted sessions only.
research-web: web/search allowed, no private memory or messaging tools, outputs marked untrusted until reviewed.
coding-worker: read/edit/apply_patch/exec allowed in sandbox, no message/browser/cron/gateway tools.
customer-facing: no filesystem or shell, no private session history, strict send policy and required evidence.

Mitigations should be enforced by the harness, not only by prompt text:

RAW_BUFFERClick to expand / collapse

Summary

Sources studied

OpenAI: Harness engineering: leveraging Codex in an agent-first world
https://openai.com/index/harness-engineering/
OpenAI Cookbook: Using PLANS.md for multi-hour problem solving
https://cookbook.openai.com/articles/codex_exec_plans/
Simon Willison: Agentic Engineering Patterns
https://simonwillison.net/guides/agentic-engineering-patterns/
Simon Willison: Red/green TDD
https://simonwillison.net/guides/agentic-engineering-patterns/red-green-tdd/
Simon Willison: First run the tests
https://simonwillison.net/guides/agentic-engineering-patterns/first-run-the-tests/
Simon Willison: Agentic manual testing
https://simonwillison.net/guides/agentic-engineering-patterns/agentic-manual-testing/
Simon Willison: Subagents
https://simonwillison.net/guides/agentic-engineering-patterns/subagents/
Simon Willison: Anti-patterns: things to avoid
https://simonwillison.net/guides/agentic-engineering-patterns/anti-patterns/
Simon Willison: The lethal trifecta for AI agents
https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

Existing OpenClaw baseline to preserve

OpenClaw already has several relevant primitives:

personal-assistant trust model and one-gateway-per-trust-boundary guidance in SECURITY.md and docs/gateway/security/index.md
gateway auth, device pairing, DM pairing, allowlists, group mention gating, and contextVisibility
tool policy, tool groups, sandbox modes, per-agent sandbox/tool overrides, and elevated exec gates
per-agent workspaces, per-agent auth stores, session routing, and multi-agent bindings
skills precedence, skill allowlists, load-time gates, installer scanning, and workspace skill boundaries
agent runtime abstraction (pi, native codex, ACP-backed harnesses) plus runtime ownership/compatibility docs
plugin hooks around prompt building, tool calls, install checks, compaction, session lifecycle, and message send/receive
openclaw security audit, openclaw sandbox explain, and structured check IDs

This RFC should build on those existing mechanisms instead of replacing them.

Design direction: OpenClaw Harness as a constraint layer

Define an OpenClaw Harness as the policy/control layer that sits around an agent runtime. It should decide what context enters, what tools exist, where execution happens, what evidence is required, what can be persisted, and what must be blocked or escalated.

A useful model is five layers:

Context constraints: provenance, trust labels, context budget, progressive disclosure, bootstrap file injection, docs/skills loading.
Capability constraints: hard tool availability, per-session tool policy, tool capability metadata, sandbox requirements, owner-only gates.
Execution constraints: sandbox backend, workspace access, network posture, browser profile isolation, elevated escape hatches, subagent spawn policy.
Verification constraints: tests, red/green proof, manual/browser checks, logs/screenshots/artifacts, PR evidence, reviewer readiness.
Memory and entropy constraints: memory source trust, compaction bookkeeping, stale-doc cleanup, audit ledgers, repeated-failure-to-hard-check conversion.

Proposed constraint mechanisms

1. Harness profiles

Introduce a conceptual harness profile as a named bundle of constraints, whether or not it eventually becomes a config object.

Examples:

personal-full: trusted main session, host tools allowed, browser profile owned by the user.
shared-readonly: sandboxed, read-only workspace, no exec/write/browser, allowlisted sessions only.
research-web: web/search allowed, no private memory or messaging tools, outputs marked untrusted until reviewed.
coding-worker: read/edit/apply_patch/exec allowed in sandbox, no message/browser/cron/gateway tools.
customer-facing: no filesystem or shell, no private session history, strict send policy and required evidence.

The profile should be explainable through existing surfaces such as security audit, sandbox explain, and status output.

2. Tool capability matrix

Document or design a machine-readable metadata matrix for every core/plugin tool. Suggested fields:

readsPrivateData
acceptsUntrustedContent
canExternallyCommunicate
hasSideEffects
canWriteFilesystem
canRunHostCode
requiresOwner
requiresSandbox
requiresApproval
allowedInSubagentsByDefault
safeForCronByDefault

Policy should fail closed when capability metadata is missing for a tool that can mutate, communicate, execute, or read private state.

3. Lethal-trifecta enforcement

Use Simon Willison's lethal-trifecta model as a hard policy heuristic: avoid any execution path that simultaneously combines:

private-data access
untrusted content exposure
external communication/exfiltration

For OpenClaw this means the harness should be able to detect and block or escalate combinations such as:

web/email/message content + memory/session/private files + message/network/browser output
shared Slack/Discord/WhatsApp group input + host filesystem + outbound messaging
third-party skill output + private credentials/session history + external write

Mitigations should be enforced by the harness, not only by prompt text:

remove one leg of the trifecta with tool policy
isolate untrusted content into sandboxed/no-send sessions
require explicit owner approval for communication after untrusted context entered the run
add provenance to context and memory so untrusted text cannot silently become trusted instructions

4. Per-session and subagent least privilege

OpenAI and Simon's patterns both point toward scoped, independent agents with fresh context and purpose-specific capabilities.

OpenClaw should treat sessions_spawn, cron sessions, ACP harness sessions, and delegated workers as separate capability contexts, not automatic clones of the parent.

Design target:

a spawn request declares toolPolicy/allowedTools, sandbox requirement, target agent, deliver mode, and writable scope
messaging tools are denied by default for coding workers
browser/tools with real accounts are denied by default for research workers unless explicitly needed
sandbox: "require" becomes the safe default for delegated runs that originate from sandboxed or shared contexts
inherited tools are visible in status/audit so surprises are debuggable

Related existing issues: #58623 and #42981.

5. Context provenance and memory trust

OpenClaw already treats workspace memory as trusted local operator state, but mixed input sources create persistent prompt-injection risk.

Design target:

every injected block carries provenance: operator, channel sender, quoted context, web, file, tool result, skill output, memory recall, runtime event
memory writes/retrievals preserve source trust metadata
memory retrieved from untrusted sources is quoted as data and cannot grant instructions by itself
memory search can filter/downrank/review untrusted memory before it affects tool use
compaction must preserve enough provenance to avoid laundering untrusted content into trusted summaries

Related existing issue: #7707.

6. Runtime-aware internal context envelopes

OpenClaw-native sessions can understand OpenClaw internal context markers. ACP/native harnesses may not share that system prompt contract.

Design target:

internal event envelopes should only be sent to runtimes that have the matching system-prompt contract
ACP/native harness announcements should use a plain, data-only result format or a harness-native hook
runtime support docs should state which OpenClaw context markers, hooks, and compaction semantics are honored

Related existing issue: #71811.

7. Verification-first task contract

Adopt Simon Willison's verification patterns as an OpenClaw task contract:

first run the relevant tests before changing behavior
use red/green TDD for bug fixes or new logic
use manual/agentic testing for flows that automated tests do not cover
use browser automation/screenshots for UI changes
include evidence in PRs before asking reviewers to spend time
do not file unreviewed agent-generated PRs

For OpenClaw Harness design, this becomes a required output/evidence policy, not merely a coding preference.

8. Plans and repository knowledge as system of record

OpenAI's Harness Engineering article emphasizes a short AGENTS.md as a map and structured repository-local docs as the source of truth. The OpenAI Cookbook ExecPlan pattern adds self-contained living plans.

OpenClaw already has scoped docs, AGENTS.md, workspace bootstrap files, and runtime docs. The design should formalize:

short entrypoint guidance; deeper docs are indexed and discoverable
design/exec plans as first-class artifacts for large work
every plan contains purpose, context, acceptance, validation, decision log, and retrospective
docs are mechanically checked for freshness/cross-links where possible
stale rules should become lints/audits/hooks instead of accumulating in one huge prompt

9. Entropy and garbage collection

Repeated agent failure should produce a durable constraint:

if an agent repeats a mistake, decide whether the missing piece is a doc, tool, lint, audit check, hook, test, sandbox policy, or prompt rule
prefer mechanical checks for repeated classes of failure
recurring cleanup/gardening tasks update quality grades, remove obsolete docs, and tighten policy drift
quality/audit outputs should be machine-readable and link to concrete remediation

Output expected from this RFC task

No code changes. The desired output is a design document or decision record that answers:

What is an OpenClaw Harness, in terms of existing OpenClaw runtime/tool/session abstractions?
Which constraints are already implemented?
Which gaps are already tracked by existing issues?
Which new constraints should be proposed as follow-up implementation issues?
Which constraints are hard policy, which are soft prompt guidance, and which are audit-only warnings?
What are the default profiles for personal, shared, public, coding-worker, research-worker, cron, and ACP/native harness contexts?
How should OpenClaw expose/explain the effective harness profile to users and maintainers?

Acceptance criteria for this issue

No production code, tests, docs, config, or repo files are changed as part of this task unless a later issue explicitly authorizes implementation.
The design maps each proposed constraint to an existing OpenClaw primitive or a clearly named gap.
The design distinguishes hard enforcement from prompt guidance and audit warnings.
The design includes at least three concrete profile examples and their allowed/denied tool groups.
The design includes a lethal-trifecta policy table for common OpenClaw surfaces: messaging, web, browser, filesystem, memory, sessions, cron, skills/plugins, and ACP/native harnesses.
The design lists follow-up implementation issues rather than bundling implementation into this RFC.

Related existing issues to consider

#58623 allowedTools/toolPolicy for sessions_spawn
#42981 per-session tool policies for cron jobs and spawned sessions
#7707 memory trust tagging by source
#71712 agent-facing scheduling API with non-forgeable provenance
#69291 systematic Agent Behavior Principles in the system prompt
#31206 token-efficient skill injection / progressive disclosure
#69300 harness compaction short-circuits memory flush and session bookkeeping
#71811 ACP harness sessions refuse OpenClaw internal context envelopes

extent analysis

TL;DR

Define an OpenClaw Harness as a policy/control layer that sits around an agent runtime, extracting constraint mechanisms to make OpenClaw agents more reliable, safer, and legible.

Guidance

Identify and document existing constraint mechanisms in OpenClaw, such as personal-assistant trust models and gateway auth.
Propose new constraints, including harness profiles, tool capability matrices, and lethal-trifecta enforcement, to address gaps in current implementation.
Develop a design document that maps proposed constraints to existing OpenClaw primitives or clearly named gaps, distinguishing between hard enforcement, prompt guidance, and audit warnings.
Create concrete profile examples, such as personal-full and shared-readonly, to demonstrate allowed and denied tool groups.
Establish a lethal-trifecta policy table for common OpenClaw surfaces, including messaging, web, browser, and filesystem.

Example

A harness profile, such as research-web, could be defined with specific constraints, including:

* web/search allowed
* no private memory or messaging tools
* outputs marked untrusted until reviewed

This profile would ensure that research workers have limited access to sensitive data and tools.

Notes

The design document should prioritize clarity and concision, focusing on the key components of the OpenClaw Harness and their relationships. It is essential to consider existing issues, such as #58623 and #42981, when developing the design.

Recommendation

Apply the proposed constraints and design principles to create a more robust and secure OpenClaw Harness, addressing the gaps in current implementation and enhancing the overall reliability and legibility of OpenClaw agents.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #network issue #logging issue #authentication issue #prompt issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - 💡(How to fix) Fix RFC: OpenClaw Harness constraint design from OpenAI and Simon Willison patterns [2 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Summary

Sources studied

Existing OpenClaw baseline to preserve

Design direction: OpenClaw Harness as a constraint layer

Proposed constraint mechanisms

1. Harness profiles

2. Tool capability matrix

3. Lethal-trifecta enforcement

4. Per-session and subagent least privilege

5. Context provenance and memory trust

6. Runtime-aware internal context envelopes

7. Verification-first task contract

8. Plans and repository knowledge as system of record

9. Entropy and garbage collection

Output expected from this RFC task

Acceptance criteria for this issue

Related existing issues to consider

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING