openclaw - RFC: OpenClaw Harness constraint design from OpenAI and Simon Willison patterns [2 comments, 2 participants]

GitHub stats
openclaw/openclaw#71937 · Fetched 2026-04-27 05:37:08
Comments: 2 · Participants: 2 · Timeline: 3 (commented ×2, closed ×1) · Reactions: 0

Create a read-only design task for an OpenClaw Harness constraint model, based on OpenAI's Harness Engineering write-up and Simon Willison's Agentic Engineering Patterns. This issue is intentionally design/research only: no implementation, no code changes, and no repo edits are requested here.

The goal is to extract the constraint mechanisms that can make OpenClaw agents more reliable, safer, and more legible across PI, native Codex, ACP, subagents, cron, skills, memory, browser, and messaging surfaces.

Sources studied

Existing OpenClaw baseline to preserve

OpenClaw already has several relevant primitives:

  • personal-assistant trust model and one-gateway-per-trust-boundary guidance in SECURITY.md and docs/gateway/security/index.md
  • gateway auth, device pairing, DM pairing, allowlists, group mention gating, and contextVisibility
  • tool policy, tool groups, sandbox modes, per-agent sandbox/tool overrides, and elevated exec gates
  • per-agent workspaces, per-agent auth stores, session routing, and multi-agent bindings
  • skills precedence, skill allowlists, load-time gates, installer scanning, and workspace skill boundaries
  • agent runtime abstraction (pi, native codex, ACP-backed harnesses) plus runtime ownership/compatibility docs
  • plugin hooks around prompt building, tool calls, install checks, compaction, session lifecycle, and message send/receive
  • openclaw security audit, openclaw sandbox explain, and structured check IDs

This RFC should build on those existing mechanisms instead of replacing them.

Design direction: OpenClaw Harness as a constraint layer

Define an OpenClaw Harness as the policy/control layer that sits around an agent runtime. It should decide what context enters, what tools exist, where execution happens, what evidence is required, what can be persisted, and what must be blocked or escalated.

A useful model is five layers:

  1. Context constraints: provenance, trust labels, context budget, progressive disclosure, bootstrap file injection, docs/skills loading.
  2. Capability constraints: hard tool availability, per-session tool policy, tool capability metadata, sandbox requirements, owner-only gates.
  3. Execution constraints: sandbox backend, workspace access, network posture, browser profile isolation, elevated escape hatches, subagent spawn policy.
  4. Verification constraints: tests, red/green proof, manual/browser checks, logs/screenshots/artifacts, PR evidence, reviewer readiness.
  5. Memory and entropy constraints: memory source trust, compaction bookkeeping, stale-doc cleanup, audit ledgers, repeated-failure-to-hard-check conversion.
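
One way to read the layer model is as a chain of independent checks whose strictest verdict wins. The sketch below is illustrative only: the `Request` fields, the `Decision` values, and the layer functions are hypothetical names, not existing OpenClaw APIs, and the fifth layer (memory and entropy) is left out because it acts across runs rather than on a single request.

```python
from dataclasses import dataclass
from enum import IntEnum


class Decision(IntEnum):
    # Ordered so that max() selects the most restrictive outcome.
    ALLOW = 0
    ESCALATE = 1
    BLOCK = 2


@dataclass(frozen=True)
class Request:
    # Hypothetical description of one proposed agent action.
    tool: str
    provenance: str          # empty string means "unknown source"
    allowed_tools: frozenset
    needs_sandbox: bool
    sandboxed: bool
    needs_evidence: bool
    has_evidence: bool


def context_layer(req):
    # Context constraints: input without provenance is not admitted.
    return Decision.ALLOW if req.provenance else Decision.BLOCK


def capability_layer(req):
    # Capability constraints: the tool must exist for this session.
    return Decision.ALLOW if req.tool in req.allowed_tools else Decision.BLOCK


def execution_layer(req):
    # Execution constraints: running outside a required sandbox is escalated.
    if req.needs_sandbox and not req.sandboxed:
        return Decision.ESCALATE
    return Decision.ALLOW


def verification_layer(req):
    # Verification constraints: required evidence must be attached.
    if req.needs_evidence and not req.has_evidence:
        return Decision.BLOCK
    return Decision.ALLOW


LAYERS = (context_layer, capability_layer, execution_layer, verification_layer)


def evaluate(req):
    # The harness verdict is the strictest verdict returned by any layer.
    return max(layer(req) for layer in LAYERS)
```

Taking the maximum keeps the layers independent: adding a new layer can only tighten the outcome, never loosen it.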

Proposed constraint mechanisms

1. Harness profiles

Introduce a conceptual harness profile as a named bundle of constraints, whether or not it eventually becomes a config object.

Examples:

  • personal-full: trusted main session, host tools allowed, browser profile owned by the user.
  • shared-readonly: sandboxed, read-only workspace, no exec/write/browser, allowlisted sessions only.
  • research-web: web/search allowed, no private memory or messaging tools, outputs marked untrusted until reviewed.
  • coding-worker: read/edit/apply_patch/exec allowed in sandbox, no message/browser/cron/gateway tools.
  • customer-facing: no filesystem or shell, no private session history, strict send policy and required evidence.

The profile should be explainable through existing surfaces such as security audit, sandbox explain, and status output.
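
As a rough illustration of a profile as a named bundle, the sketch below encodes three of the examples above. The `HarnessProfile` class, its field names, and the tool-group strings are assumptions made for this example, not an existing OpenClaw config schema. Values the examples do not state (such as the sandbox and workspace settings for research-web) are guesses and are marked as such.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HarnessProfile:
    name: str
    sandboxed: bool
    workspace: str               # "rw", "ro", or "none"
    allowed: frozenset           # tool groups explicitly allowed
    denied: frozenset            # tool groups explicitly denied
    outputs_untrusted: bool = False

    def permits(self, tool_group):
        # Deny wins, and anything not explicitly allowed is denied.
        return tool_group in self.allowed and tool_group not in self.denied

    def explain(self):
        # Plain-text summary of the kind an audit or status command could print.
        return (
            f"{self.name}: sandboxed={self.sandboxed} workspace={self.workspace} "
            f"allow={sorted(self.allowed)} deny={sorted(self.denied)}"
        )


PROFILES = {
    p.name: p
    for p in (
        HarnessProfile(
            "shared-readonly", True, "ro",
            frozenset({"read"}),
            frozenset({"exec", "write", "browser"}),
        ),
        HarnessProfile(
            # Sandbox and workspace values are guesses; the example does not state them.
            "research-web", True, "none",
            frozenset({"web", "search"}),
            frozenset({"memory", "message"}),
            outputs_untrusted=True,
        ),
        HarnessProfile(
            "coding-worker", True, "rw",
            frozenset({"read", "edit", "apply_patch", "exec"}),
            frozenset({"message", "browser", "cron", "gateway"}),
        ),
    )
}

print(PROFILES["coding-worker"].explain())
```

Making `permits` default-deny means a tool group that a profile author forgot to list is unavailable rather than silently allowed.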

2. Tool capability matrix

Document or design a machine-readable metadata matrix for every core/plugin tool. Suggested fields:

  • readsPrivateData
  • acceptsUntrustedContent
  • canExternallyCommunicate
  • hasSideEffects
  • canWriteFilesystem
  • canRunHostCode
  • requiresOwner
  • requiresSandbox
  • requiresApproval
  • allowedInSubagentsByDefault
  • safeForCronByDefault

Policy should fail closed when capability metadata is missing for a tool that can mutate, communicate, execute, or read private state.
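
A minimal sketch of a fail-closed lookup over such a matrix follows. The field names are the ones suggested above; the tool names, the `entry` helper, and the `decide` function are hypothetical. Because a tool without metadata cannot be classified at all, this sketch denies every tool whose row is missing or incomplete, which is stricter than the rule stated above.

```python
FIELDS = (
    "readsPrivateData", "acceptsUntrustedContent", "canExternallyCommunicate",
    "hasSideEffects", "canWriteFilesystem", "canRunHostCode", "requiresOwner",
    "requiresSandbox", "requiresApproval", "allowedInSubagentsByDefault",
    "safeForCronByDefault",
)


def entry(**flags):
    # Build a complete row. Unnamed fields default to False here for brevity;
    # a real matrix would likely require every field to be stated explicitly.
    unknown = set(flags) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown capability fields: {sorted(unknown)}")
    return {field: bool(flags.get(field, False)) for field in FIELDS}


# Hypothetical tools and values, for illustration only.
MATRIX = {
    "read": entry(readsPrivateData=True, allowedInSubagentsByDefault=True),
    "exec": entry(hasSideEffects=True, canWriteFilesystem=True,
                  canRunHostCode=True, requiresSandbox=True),
    "message": entry(canExternallyCommunicate=True, hasSideEffects=True,
                     requiresApproval=True),
    "legacy_plugin": {"hasSideEffects": True},   # incomplete row
}


def decide(tool, *, sandboxed, is_owner, matrix=MATRIX):
    row = matrix.get(tool)
    if row is None:
        return ("deny", "no capability metadata")
    if any(field not in row for field in FIELDS):
        return ("deny", "incomplete capability metadata")
    if row["requiresOwner"] and not is_owner:
        return ("deny", "owner only")
    if row["requiresSandbox"] and not sandboxed:
        return ("deny", "sandbox required")
    if row["requiresApproval"]:
        return ("escalate", "approval required")
    return ("allow", "ok")
```

Returning a reason string alongside the decision gives audit and explain surfaces something concrete to show.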

3. Lethal-trifecta enforcement

Use Simon Willison's lethal-trifecta model as a hard policy heuristic: avoid any execution path that simultaneously combines:

  • private-data access
  • untrusted content exposure
  • external communication/exfiltration

For OpenClaw this means the harness should be able to detect and block or escalate combinations such as:

  • web/email/message content + memory/session/private files + message/network/browser output
  • shared Slack/Discord/WhatsApp group input + host filesystem + outbound messaging
  • third-party skill output + private credentials/session history + external write

Mitigations should be enforced by the harness, not only by prompt text:

  • remove one leg of the trifecta with tool policy
  • isolate untrusted content into sandboxed/no-send sessions
  • require explicit owner approval for communication after untrusted context entered the run
  • add provenance to context and memory so untrusted text cannot silently become trusted instructions
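
The detection side can be sketched as a per-run tracker that records which legs of the trifecta have been touched and escalates the call that would complete all three. The tool names and the `RunTracker` class are hypothetical. A real harness would presumably derive the legs from tool capability metadata rather than a hard-coded table.

```python
TRIFECTA = frozenset({"private_data", "untrusted_content", "external_comm"})

# Hypothetical tool names mapped to the trifecta legs they contribute.
TOOL_LEGS = {
    "web_fetch": {"untrusted_content"},
    "memory_search": {"private_data"},
    "read_file": {"private_data"},
    "message_send": {"external_comm"},
    "browser": {"untrusted_content", "external_comm"},
}


class RunTracker:
    """Accumulates the trifecta legs touched during one run."""

    def __init__(self):
        self.seen = set()

    def request(self, tool, owner_approved=False):
        legs = TOOL_LEGS.get(tool)
        if legs is None:
            return "block"        # unknown tool: fail closed
        combined = self.seen | legs
        if combined >= TRIFECTA and not owner_approved:
            return "escalate"     # run state is left unchanged
        self.seen = combined
        return "allow"


run = RunTracker()
print(run.request("web_fetch"))       # untrusted content enters the run
print(run.request("memory_search"))   # private data enters the run
print(run.request("message_send"))    # would complete the trifecta
```

Any two legs are allowed here. Only the call that adds the third is held for owner approval, which matches the "remove one leg" mitigation.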

4. Per-session and subagent least privilege

OpenAI's and Simon Willison's patterns both point toward scoped, independent agents with fresh context and purpose-specific capabilities.

OpenClaw should treat sessions_spawn, cron sessions, ACP harness sessions, and delegated workers as separate capability contexts, not automatic clones of the parent.

Design target:

  • a spawn request declares toolPolicy/allowedTools, sandbox requirement, target agent, deliver mode, and writable scope
  • messaging tools are denied by default for coding workers
  • browser/tools with real accounts are denied by default for research workers unless explicitly needed
  • sandbox: "require" becomes the safe default for delegated runs that originate from sandboxed or shared contexts
  • inherited tools are visible in status/audit so surprises are debuggable

Related existing issues: #58623 and #42981.
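
A sketch of how a spawn request could be resolved into an effective child policy is shown below. Everything in it is an assumption made for illustration: the `SpawnRequest` fields, the role names, the `"inherit"` sandbox value, and the rule that a child never receives a tool its parent lacks are not taken from the current sessions_spawn behavior.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical role defaults, following the design-target bullets above.
ROLE_DEFAULT_DENY = {
    "coding-worker": frozenset({"message"}),
    "research-worker": frozenset({"browser"}),
}


@dataclass(frozen=True)
class SpawnRequest:
    target_agent: str
    role: str
    allowed_tools: frozenset
    explicitly_needed: frozenset = frozenset()   # overrides role default-deny
    sandbox: Optional[str] = None                # "require", "inherit", or unstated
    deliver: str = "announce"                    # hypothetical deliver mode
    writable_scope: str = "none"


def resolve_spawn(req, parent_tools, parent_restricted):
    # Assumption: a child can never hold a tool that its parent lacks.
    tools = req.allowed_tools & parent_tools
    # Role defaults remove tools unless the request explicitly needs them.
    default_deny = ROLE_DEFAULT_DENY.get(req.role, frozenset()) - req.explicitly_needed
    tools = tools - default_deny
    # Runs spawned from sandboxed or shared parents default to sandbox "require".
    sandbox = req.sandbox or ("require" if parent_restricted else "inherit")
    return {
        "agent": req.target_agent,
        "tools": sorted(tools),
        "dropped": sorted(req.allowed_tools - tools),   # visible for status/audit
        "sandbox": sandbox,
        "deliver": req.deliver,
        "writable_scope": req.writable_scope,
    }
```

Reporting the `dropped` list alongside the effective tools is one way to make inheritance surprises debuggable from status or audit output.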

5. Context provenance and memory trust

OpenClaw already treats workspace memory as trusted local operator state, but mixed input sources create persistent prompt-injection risk.

Design target:

  • every injected block carries provenance: operator, channel sender, quoted context, web, file, tool result, skill output, memory recall, runtime event
  • memory writes/retrievals preserve source trust metadata
  • memory retrieved from untrusted sources is quoted as data and cannot grant instructions by itself
  • memory search can filter/downrank/review untrusted memory before it affects tool use
  • compaction must preserve enough provenance to avoid laundering untrusted content into trusted summaries

Related existing issue: #7707.
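
The provenance rules above can be sketched with a small block type that carries its sources, renders untrusted text as quoted data, and merges sources on compaction so that a summary is never more trusted than its inputs. The `ContextBlock` type, the label format, and the choice of `operator` as the only trusted source are assumptions made for this example.

```python
from dataclasses import dataclass

# Assumption: only operator-authored text counts as trusted instructions.
TRUSTED_SOURCES = frozenset({"operator"})


@dataclass(frozen=True)
class ContextBlock:
    text: str
    sources: frozenset    # e.g. {"operator"}, {"web"}, {"memory recall", "web"}

    @property
    def trusted(self):
        # Trusted only if there is at least one source and all are trusted.
        return bool(self.sources) and self.sources <= TRUSTED_SOURCES


def render(block):
    if block.trusted:
        return block.text
    label = ", ".join(sorted(block.sources)) or "unknown"
    quoted = "\n".join("> " + line for line in block.text.splitlines())
    return f"[data from {label}; not instructions]\n{quoted}"


def compact(blocks, summary_text):
    # The summary inherits every source of its inputs, so untrusted
    # content cannot be laundered into a trusted summary.
    merged = frozenset().union(*(b.sources for b in blocks))
    return ContextBlock(summary_text, merged)
```

Treating trust as "all sources trusted" rather than "any source trusted" is the conservative choice: one web-derived sentence is enough to mark a whole summary as data.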

6. Runtime-aware internal context envelopes

OpenClaw-native sessions can understand OpenClaw internal context markers. ACP/native harnesses may not share that system prompt contract.

Design target:

  • internal event envelopes should only be sent to runtimes that have the matching system-prompt contract
  • ACP/native harness announcements should use a plain, data-only result format or a harness-native hook
  • runtime support docs should state which OpenClaw context markers, hooks, and compaction semantics are honored

Related existing issue: #71811.
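
A minimal sketch of runtime-aware formatting follows, assuming a per-runtime flag that records whether the internal envelope contract is honored. The runtime names, the flag table, and the `[[internal:...]]` marker are invented for illustration and do not reflect the actual OpenClaw envelope syntax.

```python
import json

# Hypothetical contract table: does the runtime's system prompt
# explain OpenClaw internal context envelopes?
RUNTIME_HONORS_ENVELOPES = {
    "openclaw-native": True,
    "acp": False,
    "native-codex": False,
}


def format_event(runtime, kind, payload):
    # Unknown runtimes are treated as not honoring the contract.
    if RUNTIME_HONORS_ENVELOPES.get(runtime, False):
        # Made-up marker standing in for an internal envelope.
        return "[[internal:" + kind + "]] " + json.dumps(payload, sort_keys=True)
    # Plain, data-only result for runtimes without the contract.
    return json.dumps({"kind": kind, "data": payload}, sort_keys=True)


print(format_event("openclaw-native", "subagent_done", {"ok": True}))
print(format_event("acp", "subagent_done", {"ok": True}))
```

Defaulting unknown runtimes to the data-only format keeps a newly added harness from receiving markers it may refuse or misread.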

7. Verification-first task contract

Adopt Simon Willison's verification patterns as an OpenClaw task contract:

  • first run the relevant tests before changing behavior
  • use red/green TDD for bug fixes or new logic
  • use manual/agentic testing for flows that automated tests do not cover
  • use browser automation/screenshots for UI changes
  • include evidence in PRs before asking reviewers to spend time
  • do not file unreviewed agent-generated PRs

For OpenClaw Harness design, this becomes a required output/evidence policy, not merely a coding preference.
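
As an evidence policy, the contract could be checked mechanically before a task is allowed to request review. The task kinds, evidence labels, and the `review_gate` function below are hypothetical and only show the shape of such a check.

```python
# Hypothetical mapping from task kind to the evidence it must attach.
REQUIRED_EVIDENCE = {
    "bug_fix": {"baseline_tests", "failing_test", "passing_test"},
    "ui_change": {"baseline_tests", "screenshot"},
    "untested_flow": {"baseline_tests", "manual_check"},
}


def review_gate(task_kind, evidence, human_reviewed):
    required = REQUIRED_EVIDENCE.get(task_kind)
    if required is None:
        # Unknown task kinds fail closed instead of passing with no evidence.
        return {"ready": False, "missing": [], "reason": "unknown task kind"}
    missing = sorted(required - set(evidence))
    if missing:
        return {"ready": False, "missing": missing, "reason": "evidence missing"}
    if not human_reviewed:
        # Agent-generated work is not filed without a human review.
        return {"ready": False, "missing": [], "reason": "needs human review"}
    return {"ready": True, "missing": [], "reason": "ok"}


print(review_gate("bug_fix", {"baseline_tests", "passing_test"}, human_reviewed=True))
```

For a bug fix, the failing test followed by a passing test is the red/green proof. The gate only checks that both artifacts are present, not that they are meaningful.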

8. Plans and repository knowledge as system of record

OpenAI's Harness Engineering article emphasizes a short AGENTS.md as a map and structured repository-local docs as the source of truth. The OpenAI Cookbook ExecPlan pattern adds self-contained living plans.

OpenClaw already has scoped docs, AGENTS.md, workspace bootstrap files, and runtime docs. The design should formalize:

  • short entrypoint guidance; deeper docs are indexed and discoverable
  • design/exec plans as first-class artifacts for large work
  • every plan contains purpose, context, acceptance, validation, decision log, and retrospective
  • docs are mechanically checked for freshness/cross-links where possible
  • stale rules should become lints/audits/hooks instead of accumulating in one huge prompt
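
The plan-structure rule is the kind of convention that can become a lint. The sketch below assumes plans are Markdown files whose section headings match the six names listed above. Both the heading format and the `missing_sections` helper are assumptions, not an existing OpenClaw check.

```python
REQUIRED_SECTIONS = (
    "purpose", "context", "acceptance", "validation",
    "decision log", "retrospective",
)


def missing_sections(plan_text):
    # Collect heading titles ("# Purpose", "## Decision log", ...) in lowercase.
    headings = {
        line.lstrip("#").strip().lower()
        for line in plan_text.splitlines()
        if line.startswith("#")
    }
    return [name for name in REQUIRED_SECTIONS if name not in headings]


plan = """# Purpose
Why this work exists.
## Context
## Acceptance
## Validation
"""
print(missing_sections(plan))
```

A check like this can run in CI or as an audit item, so an incomplete plan is caught mechanically instead of by a longer prompt.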

9. Entropy and garbage collection

Repeated agent failure should produce a durable constraint:

  • if an agent repeats a mistake, decide whether the missing piece is a doc, tool, lint, audit check, hook, test, sandbox policy, or prompt rule
  • prefer mechanical checks for repeated classes of failure
  • recurring cleanup/gardening tasks update quality grades, remove obsolete docs, and tighten policy drift
  • quality/audit outputs should be machine-readable and link to concrete remediation
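
One possible shape for the repeated-failure rule is a ledger that counts failure classes and emits a machine-readable finding once a class crosses a threshold. The `FailureLedger` class, the threshold of three, and the finding fields are hypothetical choices for this sketch.

```python
import json
from collections import Counter


class FailureLedger:
    """Counts failure classes and flags the ones that keep recurring."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, failure_class):
        self.counts[failure_class] += 1

    def report(self):
        # Only classes at or above the threshold become findings.
        findings = [
            {
                "check": failure_class,
                "occurrences": count,
                "remediation": "add a mechanical check",
            }
            for failure_class, count in sorted(self.counts.items())
            if count >= self.threshold
        ]
        return json.dumps(findings, sort_keys=True)


ledger = FailureLedger()
for _ in range(3):
    ledger.record("skipped-tests")
ledger.record("stale-doc-link")
print(ledger.report())
```

The remediation here is a fixed string. In practice the decision between a doc, tool, lint, audit check, hook, test, sandbox policy, or prompt rule would still need a human or a separate triage step.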

Output expected from this RFC task

No code changes. The desired output is a design document or decision record that answers:

  1. What is an OpenClaw Harness, in terms of existing OpenClaw runtime/tool/session abstractions?
  2. Which constraints are already implemented?
  3. Which gaps are already tracked by existing issues?
  4. Which new constraints should be proposed as follow-up implementation issues?
  5. Which constraints are hard policy, which are soft prompt guidance, and which are audit-only warnings?
  6. What are the default profiles for personal, shared, public, coding-worker, research-worker, cron, and ACP/native harness contexts?
  7. How should OpenClaw expose/explain the effective harness profile to users and maintainers?

Acceptance criteria for this issue

  • No production code, tests, docs, config, or repo files are changed as part of this task unless a later issue explicitly authorizes implementation.
  • The design maps each proposed constraint to an existing OpenClaw primitive or a clearly named gap.
  • The design distinguishes hard enforcement from prompt guidance and audit warnings.
  • The design includes at least three concrete profile examples and their allowed/denied tool groups.
  • The design includes a lethal-trifecta policy table for common OpenClaw surfaces: messaging, web, browser, filesystem, memory, sessions, cron, skills/plugins, and ACP/native harnesses.
  • The design lists follow-up implementation issues rather than bundling implementation into this RFC.

Related existing issues to consider

  • #58623 allowedTools/toolPolicy for sessions_spawn
  • #42981 per-session tool policies for cron jobs and spawned sessions
  • #7707 memory trust tagging by source
  • #71712 agent-facing scheduling API with non-forgeable provenance
  • #69291 systematic Agent Behavior Principles in the system prompt
  • #31206 token-efficient skill injection / progressive disclosure
  • #69300 harness compaction short-circuits memory flush and session bookkeeping
  • #71811 ACP harness sessions refuse OpenClaw internal context envelopes

extent analysis

TL;DR

Define an OpenClaw Harness as the policy/control layer that sits around an agent runtime, and extract the constraint mechanisms that make OpenClaw agents more reliable, safer, and more legible.

Guidance

  • Identify and document existing constraint mechanisms in OpenClaw, such as personal-assistant trust models and gateway auth.
  • Propose new constraints, including harness profiles, tool capability matrices, and lethal-trifecta enforcement, to address gaps in current implementation.
  • Develop a design document that maps proposed constraints to existing OpenClaw primitives or clearly named gaps, distinguishing between hard enforcement, prompt guidance, and audit warnings.
  • Create concrete profile examples, such as personal-full and shared-readonly, to demonstrate allowed and denied tool groups.
  • Establish a lethal-trifecta policy table for common OpenClaw surfaces, including messaging, web, browser, and filesystem.

Example

A harness profile, such as research-web, could be defined with specific constraints, including:

  • web/search allowed
  • no private memory or messaging tools
  • outputs marked untrusted until reviewed

This profile would ensure that research workers have limited access to sensitive data and tools.

Notes

The design document should prioritize clarity and concision, focusing on the key components of the OpenClaw Harness and their relationships. It is essential to consider existing issues, such as #58623 and #42981, when developing the design.

Recommendation

Capture the proposed constraints and design principles in a design document or decision record, then file follow-up implementation issues for the gaps it identifies. This issue itself requests no code changes. The aim is a more robust and secure OpenClaw Harness that improves the reliability and legibility of OpenClaw agents.
