openclaw - 💡(How to fix) Fix agents.defaults.sandbox.mode is binary (all/none) — full bot outage when Docker briefly stops, including chat-only turns [1 comments, 2 participants]

jpazvd · 2026-05-10T14:26:13Z



GitHub: openclaw/openclaw#80296 (fetched 2026-05-11 03:16:39). Author: jpazvd. Participants: jpazvd, clawsweeper[bot]. Comments: 1.

agents.defaults.sandbox.mode: "all" is the documented production setting, following reasoning equivalent to ADR-0003 (https://github.com/jpazvd/jpazvd.lab-config/blob/main/docs/adr/0003-tools-exec-allowlist-host-sandbox.md): the gateway routes every exec through Docker so that allowlisted binaries can be enforced safely. The schema, however, offers only "all" (require Docker for every turn) or "none" (allow exec on host). Setting "all" makes Docker a hard dependency for every turn, including purely conversational turns that never invoke exec.

When Docker Desktop briefly stops (wake from sleep, mid-day restart, OOM kill, version upgrade), every bot turn fails until Docker comes back, even chat-only ones. To the user, the bot appears dead for however long Docker takes to recover.


Upstream issue draft — OpenClaw

Repo: https://github.com/openclaw/openclaw/issues/new Title: agents.defaults.sandbox.mode is binary (all/none) — full bot outage when Docker briefly stops, including chat-only turns


Symptom

2026-05-09T14:15:22-04:00 [agent/embedded] error: sandbox unavailable: cannot reach docker.sock
[bot] turn rejected; no reply sent for chat=8667951543

The user sent "hi", a chat-only turn that needs no exec. The turn was rejected because the gateway pre-flight checks Docker before routing to the model.

Observed on openclaw 2026.5.2 (8b2a6e5), Mac Mini M4 Pro, macOS 25.4.0, Docker Desktop 4.x. It triggers roughly weekly, when Docker Desktop's auto-update kicks in.

Proposed fix

Add a third value: agents.defaults.sandbox.mode: "needed".

Semantics:

- The gateway routes turns through Docker only when the turn invokes a sandbox-bound tool (exec, apply_patch, anything in the sandbox allowlist).
- Pure-chat turns, MCP calls that don't go through exec, FS reads/writes inside tools.fs.workspaceOnly, and replies that don't trigger any tool route entirely on host; Docker is not pre-flight checked.
- If a turn requests exec while Docker is down, the gateway returns a clear "sandbox temporarily unavailable" error to the model (which can decide to retry or surface it to the user).

This matches the actual security threat model: the host needs Docker for exec isolation, but conversational turns don't have an exec attack surface.
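The semantics above can be sketched as a routing decision. This is a minimal illustration under stated assumptions, not OpenClaw's gateway code: `Turn`, `needs_sandbox`, `route_turn`, `SANDBOX_BOUND_TOOLS`, the `docker_up` flag, and the returned labels are all hypothetical names, and the tool set contains only the two examples named in this issue.

```python
from dataclasses import dataclass, field

# Assumption: only the tools this issue names as sandbox-bound.
# A real deployment would read them from the sandbox allowlist.
SANDBOX_BOUND_TOOLS = frozenset({"exec", "apply_patch"})


@dataclass
class Turn:
    """Hypothetical stand-in for a bot turn and the tools it invokes."""
    text: str
    tool_calls: list[str] = field(default_factory=list)


def needs_sandbox(turn: Turn) -> bool:
    """True when the turn invokes at least one sandbox-bound tool."""
    return any(tool in SANDBOX_BOUND_TOOLS for tool in turn.tool_calls)


def route_turn(turn: Turn, mode: str, docker_up: bool) -> str:
    """Return "host", "sandbox", "rejected", or "sandbox_unavailable"."""
    if mode == "none":
        return "host"
    if mode == "all":
        # Behavior described in this issue: Docker is checked before any
        # routing, so even chat-only turns fail while Docker is down.
        if not docker_up:
            return "rejected"
        return "sandbox" if needs_sandbox(turn) else "host"
    if mode == "needed":
        # Proposed behavior: Docker matters only for sandbox-bound tools.
        if not needs_sandbox(turn):
            return "host"
        return "sandbox" if docker_up else "sandbox_unavailable"
    raise ValueError(f"unknown sandbox mode: {mode!r}")
```

Under this sketch, a "hi" turn with Docker down is rejected in "all" mode but routed on host in "needed" mode, while an exec turn with Docker down yields the "sandbox_unavailable" result that the model could retry or surface.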

Alternatives considered

- Keep "all" and rely on local mitigation (Docker Desktop autostart, gateway wrapper polling docker info): this is what we have today. The wrapper has a 60–180 s recovery window; turns sent during that window still fail.
- Set "none" everywhere: this defeats the entire point of ADR-0003. Exec on host bypasses the allowlist enforcement.
- Per-agent override: socialcraft could be "none" since it never execs, but main and datalab still hit the issue.
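To illustrate why the per-agent override only partly helps, here is a small sketch of mode resolution. The override location (`agents.<name>.sandbox.mode`) and the function name `effective_sandbox_mode` are assumptions made for illustration; this issue does not show the actual per-agent config shape.

```python
def effective_sandbox_mode(config: dict, agent: str) -> str:
    """Resolve an agent's sandbox mode, falling back to agents.defaults."""
    agents = config.get("agents", {})
    default = agents.get("defaults", {}).get("sandbox", {}).get("mode", "all")
    # Hypothetical override location: agents.<name>.sandbox.mode
    override = agents.get(agent, {}).get("sandbox", {}).get("mode")
    return override if override is not None else default


config = {
    "agents": {
        "defaults": {"sandbox": {"mode": "all"}},
        "socialcraft": {"sandbox": {"mode": "none"}},
    }
}
modes = {
    name: effective_sandbox_mode(config, name)
    for name in ("socialcraft", "main", "datalab")
}
```

Only socialcraft escapes the Docker dependency here; main and datalab still resolve to "all" and so still fail on chat-only turns while Docker is down.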

"needed" is the smallest schema change that gets us out of the binary trap.

Reproduction

1. Set agents.defaults.sandbox.mode: "all" (production default).
2. Run osascript -e 'quit app "Docker Desktop"' (or just quit Docker Desktop).
3. Send any chat message via Telegram, including one that never invokes a tool.
4. The bot turn fails with a sandbox-unavailable error; no reply is received.
5. Restart Docker; turns recover.

Expected with "needed": step 3 returns a normal model reply because the turn doesn't invoke exec.

Workaround in place

Local: Docker Desktop's autostart-on-boot is enabled, and bin/openclaw-gateway.sh polls docker info every 5 s and only kickstarts the gateway after Docker reports healthy. This reduces but doesn't eliminate the race window.
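The polling idea can be sketched as follows. This is a Python rendering for illustration only; the actual bin/openclaw-gateway.sh is a shell script whose contents are not shown in this issue. The names `docker_healthy` and `wait_for_docker`, the 10 s command timeout, and the 180 s default ceiling (taken from the upper end of the recovery window mentioned above) are assumptions.

```python
import subprocess
import time
from typing import Callable


def docker_healthy() -> bool:
    """True when `docker info` exits 0, i.e. the daemon is reachable."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Docker CLI missing, or the command hung: treat as unhealthy.
        return False
    return result.returncode == 0


def wait_for_docker(
    probe: Callable[[], bool] = docker_healthy,
    interval_s: float = 5.0,
    max_wait_s: float = 180.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the probe every interval_s until it succeeds or max_wait_s elapses."""
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= max_wait_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A wrapper would start the gateway only after `wait_for_docker()` returns True. Any turn that arrives while the loop is still waiting fails regardless, which is the race window described above.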

Environment

openclaw 2026.5.2 (8b2a6e5)
macOS 25.4.0 / arm64 / Mac Mini M4 Pro 64 GB
Docker Desktop 4.x

Related

- Plan #18 Tier 0 #6 in jpazvd/jpazvd.lab-config
- Pairs with #80150 (tool-call regression), #80152 (callbackQuery timeout), #80153 (watchdog floor); together these four issues account for the bulk of locally-tracked Tier 0 work.
- This is a schema change, so it's smaller in scope than the other three (no resolver work, no watchdog work, no Telegram bot pattern work).

What I'd expect

- agents.defaults.sandbox.mode: "needed" lands as a third schema-validated value.
- Default for new installs stays "all" (no behavior change).
- Documentation explicitly recommends "needed" for deployments where chat is a primary surface and Docker isn't 100% uptime.
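As a rough sketch of what "schema-validated" with an unchanged default could mean, the following validates the setting against three allowed values. `SandboxMode`, `DEFAULT_MODE`, and `parse_sandbox_mode` are hypothetical names, and "needed" is the value proposed in this issue, not one that exists in the current schema.

```python
from enum import Enum


class SandboxMode(str, Enum):
    ALL = "all"
    NONE = "none"
    NEEDED = "needed"  # proposed in this issue; not an existing value


# Assumption from the expectations above: the default stays "all".
DEFAULT_MODE = SandboxMode.ALL


def parse_sandbox_mode(config: dict) -> SandboxMode:
    """Read agents.defaults.sandbox.mode, rejecting unknown values."""
    raw = (
        config.get("agents", {})
        .get("defaults", {})
        .get("sandbox", {})
        .get("mode")
    )
    if raw is None:
        return DEFAULT_MODE
    try:
        return SandboxMode(raw)
    except ValueError:
        allowed = ", ".join(repr(m.value) for m in SandboxMode)
        raise ValueError(
            f"agents.defaults.sandbox.mode must be one of {allowed}; got {raw!r}"
        ) from None
```

An install that omits the key keeps resolving to "all", so existing deployments would see no behavior change, while a typo in the value fails at load time instead of at the first turn.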
