openclaw - 💡(How to fix) Fix agents.defaults.sandbox.mode is binary (all/none) — full bot outage when Docker briefly stops, including chat-only turns [1 comments, 2 participants]

openclaw/openclaw#80296 · Fetched 2026-05-11 03:16:39

Upstream issue draft — OpenClaw

Repo: https://github.com/openclaw/openclaw/issues/new
Title: agents.defaults.sandbox.mode is binary (all/none) — full bot outage when Docker briefly stops, including chat-only turns

Summary

agents.defaults.sandbox.mode: "all" is the documented production setting per ADR-0003-equivalent reasoning — the gateway routes every exec through Docker so allowlisted binaries can be enforced safely. The schema, however, only offers "all" (require Docker for every turn) or "none" (allow exec on host). Setting "all" makes Docker a hard dependency for every turn, including pure conversational turns that never invoke exec.

When Docker Desktop briefly stops (sleep/wake cycle, mid-day restart, OOM kill, version upgrade), every bot turn fails until Docker comes back, even chat-only ones. To the user, the bot appears dead for as long as Docker takes to recover.

Symptom

2026-05-09T14:15:22-04:00 [agent/embedded] error: sandbox unavailable: cannot reach docker.sock
[bot] turn rejected; no reply sent for chat=8667951543

The user sent "hi", a chat-only turn that needs zero exec. The reply was rejected because the gateway runs a Docker pre-flight check before routing the turn to the model.

Empirically observed on openclaw 2026.5.2 (8b2a6e5) — Mac Mini M4 Pro, macOS 25.4.0, Docker Desktop 4.x. Triggers ~weekly when Docker Desktop's auto-update kicks in.

Proposed fix

Add a third value: agents.defaults.sandbox.mode: "needed".

Semantics:

  • The gateway routes turns through Docker only when the turn invokes a sandbox-bound tool (exec, apply_patch, anything in the sandbox allowlist).
  • Pure-chat turns, MCP calls that don't go through exec, FS reads/writes inside tools.fs.workspaceOnly, and replies that don't trigger any tool are all handled entirely on the host; Docker is not pre-flight checked.
  • If a turn requests exec while Docker is down, the gateway returns a clear "sandbox temporarily unavailable" error to the model (which can decide to retry or surface to the user).

This matches the actual security threat model: the host needs Docker for exec isolation, but conversational turns don't have an exec attack surface.
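
The semantics above can be sketched as a small routing function. This is only an illustration of the proposed behavior, not the gateway's actual code: `route_turn`, `Route`, and `SANDBOX_BOUND_TOOLS` are hypothetical names, and the tool set is limited to the two tools the issue names explicitly.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical: only the tools named in this issue. The real set would come
# from the sandbox allowlist in the gateway configuration.
SANDBOX_BOUND_TOOLS = frozenset({"exec", "apply_patch"})


@dataclass(frozen=True)
class Route:
    target: str        # "host", "sandbox", or "error"
    message: str = ""  # populated only when target == "error"


def route_turn(mode: str, requested_tools: Iterable[str], docker_up: bool) -> Route:
    """Decide where a turn runs under each sandbox mode (illustrative only)."""
    needs_sandbox = any(tool in SANDBOX_BOUND_TOOLS for tool in requested_tools)

    if mode == "none":
        # Everything runs on the host; Docker is never consulted.
        return Route("host")

    if mode == "all":
        # Behavior described in the issue: Docker is checked before every
        # turn, so a chat-only turn is rejected when Docker is down.
        if not docker_up:
            return Route("error", "sandbox unavailable")
        return Route("sandbox" if needs_sandbox else "host")

    if mode == "needed":
        # Proposed behavior: Docker is checked only when a sandbox-bound
        # tool is actually requested.
        if not needs_sandbox:
            return Route("host")
        if not docker_up:
            return Route("error", "sandbox temporarily unavailable")
        return Route("sandbox")

    raise ValueError(f"unknown sandbox mode: {mode!r}")
```

With Docker down, `route_turn("all", [], False)` yields an error route, while `route_turn("needed", [], False)` yields a host route. That difference is the whole point of the proposal: the exec path stays sandboxed, and chat-only turns stop depending on Docker.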

Alternatives considered

  • Keep "all" and rely on local mitigation (Docker Desktop autostart, gateway wrapper polling docker info) — what we have today. The wrapper has a 60–180 s recovery window; turns sent during that window still fail.
  • Set "none" everywhere — defeats the entire point of ADR-0003. Exec on host bypasses the allowlist enforcement.
  • Per-agent override: socialcraft could be "none" since it never execs, but main and datalab still hit the issue.

"needed" is the smallest schema change that gets us out of the binary trap.

Reproduction

  1. Set agents.defaults.sandbox.mode: "all" (production default).
  2. osascript -e 'quit app "Docker Desktop"' (or just close Docker).
  3. Send any chat message via Telegram, including one that never invokes a tool.
  4. Bot turn fails with sandbox-unavailable; no reply received.
  5. Restart Docker; turns recover.

Expected with "needed": step 3 returns a normal model reply because the turn doesn't invoke exec.

Workaround in place

Local: Docker Desktop's autostart-on-boot is enabled, plus bin/openclaw-gateway.sh polls docker info every 5 s and only kickstarts the gateway after Docker reports healthy. Reduces but doesn't eliminate the race window.
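
A minimal sketch of the polling idea behind that wrapper, written in Python rather than shell. The actual contents of bin/openclaw-gateway.sh are not shown in this issue, so the function names, the 10 s probe timeout, and the 180 s cap (taken from the upper end of the recovery window mentioned above) are assumptions.

```python
import subprocess
import time
from typing import Callable


def docker_is_healthy() -> bool:
    """Return True when `docker info` exits with status 0."""
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=10,  # assumption: give up on a hung CLI after 10 s
        )
    except (OSError, subprocess.TimeoutExpired):
        # The docker binary is missing or the probe hung.
        return False
    return result.returncode == 0


def wait_for_docker(
    probe: Callable[[], bool] = docker_is_healthy,
    interval_s: float = 5.0,    # polling interval from the issue text
    max_wait_s: float = 180.0,  # assumption: upper end of the recovery window
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the probe succeeds; return False once max_wait_s has elapsed."""
    waited = 0.0
    while True:
        if probe():
            return True
        if waited >= max_wait_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

A wrapper would call `wait_for_docker()` and start the gateway only when it returns True. This also shows why the mitigation cannot close the gap: any turn that arrives while the loop is still waiting has no healthy sandbox to run against.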

Environment

  • openclaw 2026.5.2 (8b2a6e5)
  • macOS 25.4.0 / arm64 / Mac Mini M4 Pro 64 GB
  • Docker Desktop 4.x

Related

  • Plan #18 Tier 0 #6 in jpazvd/jpazvd.lab-config
  • Pairs with #80150 (tool-call regression), #80152 (callbackQuery timeout), #80153 (watchdog floor) — together these four issues account for the bulk of locally-tracked Tier 0 work.
  • This is a schema change, so it's smaller in scope than the other three (no resolver work, no watchdog work, no Telegram bot pattern work).

What I'd expect

  1. agents.defaults.sandbox.mode: "needed" lands as a third schema-validated value.
  2. Default for new installs stays "all" (no behavior change).
  3. Documentation explicitly recommends "needed" for deployments where chat is a primary surface and Docker isn't 100% uptime.
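
Points 1 and 2 could look roughly like the validator below. It is a sketch under assumptions: the nested dictionary layout is inferred from the dotted key agents.defaults.sandbox.mode, and `resolve_sandbox_mode` is a hypothetical helper, not part of OpenClaw's schema code.

```python
ALLOWED_SANDBOX_MODES = ("all", "none", "needed")  # "needed" is the proposed addition
DEFAULT_SANDBOX_MODE = "all"                       # default stays unchanged


def resolve_sandbox_mode(config: dict) -> str:
    """Read agents.defaults.sandbox.mode, applying the default and validating it."""
    sandbox = config.get("agents", {}).get("defaults", {}).get("sandbox", {})
    mode = sandbox.get("mode", DEFAULT_SANDBOX_MODE)
    if mode not in ALLOWED_SANDBOX_MODES:
        allowed = ", ".join(ALLOWED_SANDBOX_MODES)
        raise ValueError(f"agents.defaults.sandbox.mode must be one of: {allowed}; got {mode!r}")
    return mode
```

An install that never sets the key still resolves to "all", so existing deployments see no behavior change; only configs that opt in get "needed".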
