openclaw - ✅(Solved) Fix [Bug]: sandbox.mode: "off" still triggers Docker capability probe for cron/heartbeat/sub-agent sessions — causes failures when Docker daemon is unavailable [3 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#73586Fetched 2026-04-29 06:17:52
View on GitHub
Comments
1
Participants
2
Timeline
12
Reactions
0
Author
Assignees
Timeline (top)
referenced ×6cross-referenced ×3assigned ×1commented ×1

When agents.defaults.sandbox.mode is set to "off", OpenClaw still runs a Docker capability check (docker image inspect) at the start of every isolated session (cron jobs, heartbeats, sub-agents). If the Docker daemon is not running, this throws an error that propagates as a model call failure. openclaw sandbox explain confirms mode: off / sessionIsSandboxed: false with no per-agent overrides. The docs describe "off" as host execution with zero sandbox involvement — Docker should never be touched.

Error Message

gateway.err.log (repeated for every isolated session):

2026-04-28T07:00:02.327+01:00 [diagnostic] lane task error: lane=session:agent:main:cron:d6cf9b12-0795-4e87-972a-e3d67e777174 durationMs=2047 error="Error: Failed to inspect sandbox image: failed to connect to the docker API at unix:///Users/macmini/.orbstack/run/docker.sock; check if the path is correct and if the daemon is running: dial unix /Users/macmini/.orbstack/run/docker.sock: connect: no such file or directory"

2026-04-28T08:00:02.353+01:00 [diagnostic] lane task error: lane=session:agent:main:cron:9d2e918d-68f8-4b8d-8e43-0c064f6fc537 durationMs=2082 error="Error: Failed to inspect sandbox image: ..."

[Same error for every cron/heartbeat session across the day]

openclaw sandbox explain --json (both configured agents show identical output):

{ "sandbox": { "mode": "off", "scope": "agent", "workspaceAccess": "rw", "sessionIsSandboxed": false } }

Active sandbox config in openclaw.json:

{ "agents": { "defaults": { "sandbox": { "mode": "off", "workspaceAccess": "rw" } } } }

No sandbox config in agents.list[] for any agent.

Code trace (source verified against 2026.4.26 dist): • shouldSandboxSession() in runtime-status-C_nvYxR5.js: correctly returns false for mode: "off" • resolveSandboxSession() in sandbox-C77UjGet.js: has if (!runtime.sandboxed) return null — returns null • resolveSandboxContext() in sandbox-C77UjGet.js: has if (!resolved) return null — should exit early • Despite this, dockerImageExists() in docker-BhXHIHLp.js is still reached — implying a separate call path bypasses the resolveSandboxContext guard

Root Cause

When agents.defaults.sandbox.mode is set to "off", OpenClaw still runs a Docker capability check (docker image inspect) at the start of every isolated session (cron jobs, heartbeats, sub-agents). If the Docker daemon is not running, this throws an error that propagates as a model call failure. openclaw sandbox explain confirms mode: off / sessionIsSandboxed: false with no per-agent overrides. The docs describe "off" as host execution with zero sandbox involvement — Docker should never be touched.

Fix Action

Fix / Workaround

The only workaround currently is to keep OrbStack/Docker Desktop running at all times — which is unreasonable for a feature that's explicitly disabled. Switching web search from SearXNG (Docker-based) to Brave API eliminated the functional Docker dependency, but the probe still fires at the OpenClaw session level regardless.

PR fix notes

PR #73600: fix: resolve redundant docker probe when sandbox is off (#73586)

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

If this PR fixes a plugin beta-release blocker, title it fix(<plugin-id>): beta blocker - <summary> and link the matching Beta blocker: <plugin-name> - <summary> issue labeled beta-blocker. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.

  • Problem:
  • Why it matters:
  • What changed:
  • What did NOT change (scope boundary):

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #73586
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

  • Root cause:
  • Missing detection / guardrail:
  • Contributing context (if known):

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
  • Scenario the test should lock in:
  • Why this is the smallest reliable guardrail:
  • Existing test that already covers this (if any):
  • If no new test is added, why not:

User-visible / Behavior Changes

List user-visible changes (including defaults/config).
If none, write None.

Diagram (if applicable)

For UI changes or non-trivial logic flows, include a small ASCII diagram reviewers can scan quickly. Otherwise write N/A.

Before:
[user action] -> [old state]

After:
[user action] -> [new state] -> [result]

Security Impact (required)

  • New permissions/capabilities? (Yes/No)
  • Secrets/tokens handling changed? (Yes/No)
  • New/changed network calls? (Yes/No)
  • Command/tool execution surface changed? (Yes/No)
  • Data access scope changed? (Yes/No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS:
  • Runtime/container:
  • Model/provider:
  • Integration/channel (if any):
  • Relevant config (redacted):

Steps

Expected

Actual

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
  • Edge cases checked:
  • What you did not verify:

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No)
  • Config/env changes? (Yes/No)
  • Migration needed? (Yes/No)
  • If yes, exact upgrade steps:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk:
    • Mitigation:

Changed files


PR #73627: fix(sandbox): explicitly pass sandboxSessionKey to prevent Docker probes when mode is off (#73586)

Description (problem / solution / changelog)

Summary

  • Problem: When agents.defaults.sandbox.mode is set to "off", isolated sessions (cron jobs, heartbeats) still trigger Docker capability probes (socket connection attempts and image inspections).
  • Why it matters: This causes startup failures and significant log noise for users without a running Docker daemon, effectively creating a hard dependency on Docker even when the feature is explicitly disabled.
  • What changed: 1. Updated src/cron/isolated-agent/run-executor.ts to explicitly pass sandboxSessionKey during runEmbeddedPiAgent calls. 2. Added an explicit sandboxed status check at the top of resolveSandboxContext in src/agents/sandbox/context.ts to ensure no backend initialization occurs if sandboxing is disabled.
  • What did NOT change: The core logic for determining when a session should be sandboxed remains the same; we are simply ensuring the configuration is respected during the preflight phase of isolated runs.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #73586
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: In isolated cron/heartbeat runs, the sandboxSessionKey was not explicitly passed, causing the resolver to fall back to the runSessionId (a UUID). Because this UUID didn't match the mainSessionKey, the logic incorrectly flagged the session as "non-main" and enabled sandboxing despite the global "off" setting.
  • Missing detection / guardrail: resolveSandboxContext lacked a high-level short-circuit to prevent backend factory resolution before validating the sandboxed status.
  • Contributing context: This was specifically visible on macOS environments where Docker is often managed via on-demand services (OrbStack/Docker Desktop).

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
  • Target test or file: src/cron/isolated-agent/docker-probe.regression.test.ts
  • Scenario the test should lock in: Mocking a failed Docker socket connection and ensuring resolveSandboxContext returns null without throwing when mode is "off".
  • Why this is the smallest reliable guardrail: It directly targets the interaction between session key resolution and backend initialization.

User-visible / Behavior Changes

  • Users with sandbox.mode: "off" will no longer see "failed to connect to docker API" errors in their gateway logs.
  • Isolated tasks will no longer fail or experience latency spikes due to redundant Docker probes on startup.

Diagram (if applicable)

Before:
[Cron Start] -> [UUID Key] -> [shouldSandbox check] -> [Result: True (Key mismatch)] -> [Docker Probe] -> [CRASH]

After:
[Cron Start] -> [Agent Key] -> [shouldSandbox check] -> [Result: False (Mode: off)] -> [Early Exit] -> [Success]

## Security Impact (required)

- New permissions/capabilities? (`No`)
- Secrets/tokens handling changed? (`No`)
- New/changed network calls? (`No` - Actually removes unnecessary local socket calls)
- Command/tool execution surface changed? (`No`)
- Data access scope changed? (`No`)

## Repro + Verification

### Environment

- OS: macOS 26.3 (arm64) / Arch Linux
- Runtime/container: Node.js (host)
- Model/provider: Anthropic / OpenAI
- Relevant config: `agents.defaults.sandbox.mode: "off"`

### Steps

1. Set sandbox mode to "off" in `openclaw.json`.
2. Stop the local Docker daemon.
3. Trigger a cron-based isolated session.

### Expected
- Session executes on host immediately without checking Docker.

### Actual
- Session failed with `dial unix /var/run/docker.sock: connect: no such file or directory`.

## Evidence

- [x] Failing test/log before + passing after: `docker-probe.regression.test.ts` confirmed the leak was plugged.
- [x] Trace/log snippets: Verified `[gateway] ready` state is reached without socket errors.

## Human Verification (required)

- Verified scenarios: Local cron job execution with Docker service stopped.
- Edge cases checked: Verified sub-agent spawns also respect the explicit `sandboxSessionKey` handoff.
- What you did **not** verify: "all" mode (remains unchanged).

## Review Conversations

- [x] I replied to or resolved every bot review conversation I addressed in this PR.
- [x] I left unresolved only the conversations that still need reviewer or maintainer judgment.

## Compatibility / Migration

- Backward compatible? (`Yes`)
- Config/env changes? (`No`)
- Migration needed? (`No`)

## Risks and Mitigations

- Risk: Restructuring `resolveSandboxContext` might affect "non-main" auto-sandboxing.
- Mitigation: Added `sandbox.resolveSandboxContext.test.ts` to verify that standard sandboxing still triggers correctly when intended.

## Changed files

- `src/agents/sandbox/context.ts` (modified, +1/-2)
- `src/cron/isolated-agent/docker-probe.regression.test.ts` (added, +56/-0)
- `src/cron/isolated-agent/run-executor.ts` (modified, +1/-0)


---

# PR #73671: fix(sandbox): gracefully handle Docker daemon unavailability when sandbox mode is off

- Repository: openclaw/openclaw
- Author: kaseonedge
- State: open | merged: False
- Link: https://github.com/openclaw/openclaw/pull/73671

## Description (problem / solution / changelog)

## Bug #73586

When `sandbox.mode: "off"` is configured, session startup still triggers a Docker capability probe (`docker image inspect`) that fails when the Docker daemon is unavailable:

Error: Failed to inspect sandbox image: failed to connect to the docker API at unix:///Users/macmini/.orbstack/run/docker.sock: dial unix /Users/macmini/.orbstack/run/docker.sock: connect: no such file or directory


## Root Cause

`dockerImageExists()` in `src/agents/sandbox/docker.ts` threw when Docker daemon was unavailable, even though `resolveSandboxContext` correctly returns `null` for `sandbox.mode="off"`. The probe error bubbled up during session startup.

## Fix

Added `isDockerDaemonUnavailable()` helper to detect Docker connectivity errors (missing socket, connection refused, etc.) and return `false` gracefully instead of throwing. Applied to:

- `dockerImageExists()` — returns false when daemon is down
- `ensureDockerImage()` — skips pull when daemon is unavailable
- `ensureSandboxBrowserImage()` in browser.ts — same graceful handling
- `dockerImageExists()` in doctor-sandbox.ts — same graceful handling

## Files Changed

- `src/agents/sandbox/docker.ts` (+17 lines)
- `src/agents/sandbox/browser.ts` (+17 lines)
- `src/commands/doctor-sandbox.ts` (+10 lines)

## Testing

- Fix is defensive: catches Docker daemon error patterns and returns false
- Does not change behavior when Docker is available
- No behavior change for sandbox.mode="all" or sandbox.mode="non-main"
- Committed and pushed from fork for CI validation

## Changed files

- `src/agents/sandbox.ts` (modified, +1/-1)
- `src/agents/sandbox/browser.ts` (modified, +7/-0)
- `src/agents/sandbox/docker.test.ts` (modified, +20/-1)
- `src/agents/sandbox/docker.ts` (modified, +23/-6)
- `src/commands/doctor-sandbox.ts` (modified, +5/-0)

Code Example

gateway.err.log (repeated for every isolated session):

2026-04-28T07:00:02.327+01:00 [diagnostic] lane task error:
lane=session:agent:main:cron:d6cf9b12-0795-4e87-972a-e3d67e777174 durationMs=2047
error="Error: Failed to inspect sandbox image: failed to connect to the docker API
at unix:///Users/macmini/.orbstack/run/docker.sock; check if the path is correct
and if the daemon is running: dial unix /Users/macmini/.orbstack/run/docker.sock:
connect: no such file or directory"

2026-04-28T08:00:02.353+01:00 [diagnostic] lane task error:
lane=session:agent:main:cron:9d2e918d-68f8-4b8d-8e43-0c064f6fc537 durationMs=2082
error="Error: Failed to inspect sandbox image: ..."

[Same error for every cron/heartbeat session across the day]

openclaw sandbox explain --json (both configured agents show identical output):

{
  "sandbox": {
    "mode": "off",
    "scope": "agent",
    "workspaceAccess": "rw",
    "sessionIsSandboxed": false
  }
}

Active sandbox config in openclaw.json:

{
  "agents": {
    "defaults": {
      "sandbox": {
        "mode": "off",
        "workspaceAccess": "rw"
      }
    }
  }
}

No sandbox config in agents.list[] for any agent.

Code trace (source verified against 2026.4.26 dist):
shouldSandboxSession() in runtime-status-C_nvYxR5.js: correctly returns false for mode: "off"
resolveSandboxSession() in sandbox-C77UjGet.js: has if (!runtime.sandboxed) return null — returns null
resolveSandboxContext() in sandbox-C77UjGet.js: has if (!resolved) return null — should exit early
Despite this, dockerImageExists() in docker-BhXHIHLp.js is still reached — implying a separate call path bypasses the resolveSandboxContext guard
RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

When agents.defaults.sandbox.mode is set to "off", OpenClaw still runs a Docker capability check (docker image inspect) at the start of every isolated session (cron jobs, heartbeats, sub-agents). If the Docker daemon is not running, this throws an error that propagates as a model call failure. openclaw sandbox explain confirms mode: off / sessionIsSandboxed: false with no per-agent overrides. The docs describe "off" as host execution with zero sandbox involvement — Docker should never be touched.

Steps to reproduce

  1. Set agents.defaults.sandbox.mode: "off" with no per-agent sandbox overrides in agents.list[]
  2. Stop Docker daemon (e.g. quit OrbStack / Docker Desktop)
  3. Trigger any cron job, heartbeat, or sub-agent session
  4. Check ~/.openclaw/logs/gateway.err.log

Expected behavior

sandbox.mode: "off" = zero Docker interaction. No probes, no capability checks, no connection attempts. All sessions run on host as documented.

Actual behavior

dockerImageExists() is called at session start even though resolveSandboxContext() should return null early for mode: "off". When Docker daemon is unavailable, throws:

Error: Failed to inspect sandbox image: failed to connect to the docker API at unix:///Users/macmini/.orbstack/run/docker.sock dial unix /Users/macmini/.orbstack/run/docker.sock: connect: no such file or directory

This propagates as a model call failure. Sessions either fail entirely or burn a fallback model slot before recovering.

OpenClaw version

2026.4.26 (be8c246)

Operating system

macOS 26.3 (arm64) — Mac mini, Apple Silicon

Install method

npm / pnpm (local install via openclaw CLI)

Model

anthropic/claude-sonnet-4-6 (primary), openai/gpt-5.5 (fallback)

Provider / routing chain

Anthropic → OpenAI fallback. Errors happen before model is reached — at session preflight.

Additional provider/model setup details

No per-agent model overrides relevant to this bug. Issue occurs regardless of which model is configured.

Logs, screenshots, and evidence

gateway.err.log (repeated for every isolated session):

2026-04-28T07:00:02.327+01:00 [diagnostic] lane task error:
lane=session:agent:main:cron:d6cf9b12-0795-4e87-972a-e3d67e777174 durationMs=2047
error="Error: Failed to inspect sandbox image: failed to connect to the docker API
at unix:///Users/macmini/.orbstack/run/docker.sock; check if the path is correct
and if the daemon is running: dial unix /Users/macmini/.orbstack/run/docker.sock:
connect: no such file or directory"

2026-04-28T08:00:02.353+01:00 [diagnostic] lane task error:
lane=session:agent:main:cron:9d2e918d-68f8-4b8d-8e43-0c064f6fc537 durationMs=2082
error="Error: Failed to inspect sandbox image: ..."

[Same error for every cron/heartbeat session across the day]

openclaw sandbox explain --json (both configured agents show identical output):

{
  "sandbox": {
    "mode": "off",
    "scope": "agent",
    "workspaceAccess": "rw",
    "sessionIsSandboxed": false
  }
}

Active sandbox config in openclaw.json:

{
  "agents": {
    "defaults": {
      "sandbox": {
        "mode": "off",
        "workspaceAccess": "rw"
      }
    }
  }
}

No sandbox config in agents.list[] for any agent.

Code trace (source verified against 2026.4.26 dist):
• shouldSandboxSession() in runtime-status-C_nvYxR5.js: correctly returns false for mode: "off"
• resolveSandboxSession() in sandbox-C77UjGet.js: has if (!runtime.sandboxed) return null — returns null
• resolveSandboxContext() in sandbox-C77UjGet.js: has if (!resolved) return null — should exit early
• Despite this, dockerImageExists() in docker-BhXHIHLp.js is still reached — implying a separate call path bypasses the resolveSandboxContext guard

Impact and severity

Medium-High. Any install without a running Docker daemon (standard macOS developer setup without Docker Desktop/OrbStack) gets error noise on every cron/isolated session. Some sessions fail entirely (e.g. scheduled morning briefing — 3 consecutive errors). Others succeed only after wasting a fallback model call, adding 7–10s latency. On a machine where Docker is not expected to be running, this creates a hard-to-diagnose dependency that doesn't appear in any config documentation.

Additional information

The only workaround currently is to keep OrbStack/Docker Desktop running at all times — which is unreasonable for a feature that's explicitly disabled. Switching web search from SearXNG (Docker-based) to Brave API eliminated the functional Docker dependency, but the probe still fires at the OpenClaw session level regardless.

extent analysis

TL;DR

The issue can be fixed by ensuring that the resolveSandboxContext function correctly returns null when sandbox.mode is set to "off", preventing the unnecessary Docker capability check.

Guidance

  • Review the resolveSandboxContext function in sandbox-C77UjGet.js to ensure it correctly returns null when sandbox.mode is "off", as implied by the code trace.
  • Verify that the shouldSandboxSession function in runtime-status-C_nvYxR5.js returns false for sandbox.mode "off", as this is a prerequisite for the correct behavior.
  • Investigate the separate call path that bypasses the resolveSandboxContext guard, causing dockerImageExists to be called despite sandbox.mode being "off".
  • Consider adding additional logging or debugging statements to understand the flow of execution and identify the root cause of the issue.

Example

No code snippet is provided as the issue is more related to the logic and flow of the code rather than a specific code error.

Notes

The provided code trace and logs suggest that the issue is related to the resolveSandboxContext function not correctly returning null when sandbox.mode is "off". However, without the full codebase, it's difficult to provide a definitive solution.

Recommendation

Apply a workaround by modifying the resolveSandboxContext function to correctly return null when sandbox.mode is "off", or investigate and fix the separate call path that bypasses the guard, as this will prevent the unnecessary Docker capability check and resolve the issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

sandbox.mode: "off" = zero Docker interaction. No probes, no capability checks, no connection attempts. All sessions run on host as documented.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - ✅(Solved) Fix [Bug]: sandbox.mode: "off" still triggers Docker capability probe for cron/heartbeat/sub-agent sessions — causes failures when Docker daemon is unavailable [3 pull requests, 1 comments, 2 participants]