openclaw - 💡(How to fix) Fix [Bug]: Sandbox container creation silently fails - no error logged, agent zombies [1 participants]

openclaw/openclaw#57992 · Fetched 2026-04-08 01:55:12

When spawning a sandboxed subagent (sessions_spawn with a named agent that inherits agents.defaults.sandbox.mode: "non-main"), the spawn is accepted and a workspace directory is created, but the Docker container creation step silently fails. No error is logged anywhere (gateway logs, session metadata, or container registry). The agent session shows as "running" with 0 tokens consumed until it times out or is killed.

Error Message

None. No error is logged anywhere (gateway logs, session metadata, or container registry). The session shows as "running" with 0 tokens consumed until it times out or is killed.

Root Cause

Not confirmed. The Docker container creation step fails silently after the workspace directory has been provisioned, and the session is never moved out of "running". The reporter suspects transient Docker API connectivity: Docker Desktop and the gateway were both running at the time, and spawns before and after the two failed attempts succeeded.

Fix / Workaround

Severity is Medium: the bug wastes time and causes API quota confusion. Workaround: always set runTimeoutSeconds so that a phantom session is stopped after a bounded time.
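
To apply the workaround consistently, the spawn parameters can be passed through a small helper that fills in runTimeoutSeconds whenever it is missing or invalid. This is a minimal sketch under assumptions: only `thread` and `runTimeoutSeconds` come from the report, while the `agent` key, the helper name, and the 300-second default are hypothetical.

```python
DEFAULT_RUN_TIMEOUT_SECONDS = 300  # hypothetical default; tune to your workloads


def with_run_timeout(spawn_params, default=DEFAULT_RUN_TIMEOUT_SECONDS):
    """Return a copy of spawn_params that always carries a positive runTimeoutSeconds."""
    params = dict(spawn_params)  # copy so the caller's dict is not mutated
    timeout = params.get("runTimeoutSeconds")
    is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
    if not is_number or timeout <= 0:
        params["runTimeoutSeconds"] = default
    return params


# Usage: "agent" is a hypothetical key name for the target agent.
params = with_run_timeout({"agent": "researcher", "thread": True})
```

An explicit, positive runTimeoutSeconds supplied by the caller is left untouched, so the helper only caps sessions that would otherwise run unbounded.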

Bug type

Behavior bug (incorrect output/state without crash)

Summary

When spawning a sandboxed subagent (sessions_spawn with a named agent that inherits agents.defaults.sandbox.mode: "non-main"), the spawn is accepted and a workspace directory is created, but the Docker container creation step silently fails. No error is logged anywhere (gateway logs, session metadata, or container registry). The agent session shows as "running" with 0 tokens consumed until it times out or is killed.

Steps to reproduce

  1. Configure a named agent in agents.list[] that inherits Docker sandbox from agents.defaults.sandbox.mode: "non-main"
  2. Spawn it via sessions_spawn with thread: true
  3. Under certain conditions (possibly transient Docker API connectivity), the workspace is provisioned but no container is created
  4. Agent sits in phantom "running" state indefinitely (or until runTimeoutSeconds)

Expected behavior

If Docker container creation fails:

  1. The error should be logged in gateway logs
  2. The error should be surfaced in session metadata / subagents list output
  3. The session should transition to an error state, not remain "running"
  4. Ideally, the gateway should retry once, then fail with an actionable error

Actual behavior

  • Workspace directory created (AGENTS.md, SOUL.md etc copied)
  • No Docker container created (not in docker ps -a, not in containers.json, no docker events)
  • No error logged anywhere
  • Session shows "running" with 0 tokens in/out
  • Agent zombies until timeout or manual kill
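
Until the gateway reports these failures itself, the symptoms above suggest a simple heuristic for spotting affected sessions: still marked "running", zero tokens in or out, and older than a grace period. The following is a minimal sketch under assumptions; the session record shape (`id`, `state`, `tokens_in`, `tokens_out`, `started_at`) and the 120-second grace period are hypothetical and are not taken from OpenClaw's actual session metadata.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical grace period: how long a healthy agent may take to consume its first token.
GRACE_PERIOD = timedelta(seconds=120)


def find_phantom_sessions(sessions, now=None):
    """Return ids of sessions that look like phantom "running" agents.

    Each session is assumed to be a dict with the hypothetical keys
    id, state, tokens_in, tokens_out, and started_at (timezone-aware datetime).
    """
    now = now or datetime.now(timezone.utc)
    phantoms = []
    for s in sessions:
        idle = s["tokens_in"] == 0 and s["tokens_out"] == 0
        stale = now - s["started_at"] > GRACE_PERIOD
        if s["state"] == "running" and idle and stale:
            phantoms.append(s["id"])
    return phantoms
```

A session flagged this way is only a candidate: confirming it still requires checking that no matching container appears in docker ps -a.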

Evidence

Two spawn attempts on 2026-03-30 (9:14 AM and 10:07 AM PDT) both exhibited this behavior. Docker Desktop was running (confirmed via ps; it had been started at 7:40 PM the previous night), and the gateway was running continuously. A spawn at 8:56 PM the night before worked perfectly, as did later spawns at 2:01 PM, so the failure was transient.

OpenClaw version

2026.3.28

Operating system

macOS 15 (Darwin 25.4.0, arm64, Apple M4 Ultra)

Install method

npm global install

Model

anthropic/claude-opus-4-6 (spawning agent), anthropic/claude-sonnet-4-6 (target agent)

Impact and severity

Medium - causes wasted time and API quota confusion. Workaround: always set runTimeoutSeconds to cap damage.

extent analysis

Fix Plan

To address the silent Docker container creation failure, the following changes are proposed. This is a suggested plan, not a merged fix:

  • Enhance error logging for Docker container creation
  • Introduce retry mechanism for container creation
  • Update session state to reflect errors

Code Changes

import logging

import docker
from docker.errors import DockerException

# Illustrative sketch using the Docker SDK for Python; this is not OpenClaw's
# actual gateway code. `session` is a hypothetical dict with 'id', 'image',
# and 'state' keys.

# Log at INFO so successful creations are recorded as well as errors
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("sandbox")

MAX_ATTEMPTS = 2  # one initial attempt plus one retry


def create_docker_container(session):
    """Create the sandbox container; mark the session as errored on failure."""
    last_error = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            # Connecting inside the loop means a transient daemon outage is retried too
            client = docker.from_env()
            container = client.containers.create(session['image'])
            log.info("Container created: %s", container.id)
            return container
        except (DockerException, OSError) as e:
            # DockerException covers API errors; OSError covers connection failures
            last_error = e
            log.error("Error creating container (attempt %d/%d): %s",
                      attempt, MAX_ATTEMPTS, e)
    # Update session state to reflect the error instead of leaving it "running"
    session['state'] = 'error'
    session['error'] = str(last_error)
    update_session_state(session)
    return None


def update_session_state(session):
    """Log a failed session; cleanup or notification hooks would go here."""
    if session['state'] == 'error':
        log.error("Session %s failed: %s", session['id'], session.get('error'))

Verification

To verify the fix, spawn a new session with the updated code and simulate a Docker container creation failure. Check the logs for error messages and verify that the session state is updated to reflect the error.
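
One way to run this check without a real Docker outage is to inject a creation function that always fails, then assert on both the captured log lines and the final session state. The sketch below is self-contained and makes several assumptions: `create_sandbox` is a hypothetical stand-in for the real creation path, and the session dict, image name, and logger name are illustrative.

```python
import logging

log = logging.getLogger("sandbox.verify")


class ListHandler(logging.Handler):
    """Collect formatted log messages in memory so they can be asserted on."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def create_sandbox(session, create_container, attempts=2):
    """Hypothetical stand-in for the code under test: retry once, then mark error."""
    for attempt in range(1, attempts + 1):
        try:
            return create_container(session["image"])
        except Exception as exc:  # broad on purpose: every failure must be visible
            log.error("container creation failed (attempt %d/%d): %s",
                      attempt, attempts, exc)
            session["error"] = str(exc)
    session["state"] = "error"
    return None


def always_fails(image):
    """Simulated Docker failure, e.g. the daemon socket being unreachable."""
    raise ConnectionError("simulated: cannot reach Docker API")


def run_failure_check():
    """Run one simulated failure and return (session, captured log messages)."""
    session = {"id": "s1", "image": "sandbox:latest", "state": "running"}
    handler = ListHandler()
    log.addHandler(handler)
    try:
        create_sandbox(session, always_fails)
    finally:
        log.removeHandler(handler)
    return session, handler.messages
```

After `run_failure_check()`, the session should be in the "error" state and there should be one log line per attempt. The same pattern applies to the real code path if its container-creation call can be swapped for a failing one.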

Extra Tips

  • Ensure that the Docker API is properly configured and accessible.
  • Consider implementing additional logging and monitoring to detect and respond to container creation failures.
  • Review the runTimeoutSeconds setting to ensure it is properly configured to prevent zombie sessions.
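
The first tip can be turned into a quick preflight probe that runs before each spawn. The sketch below shells out to `docker info`, which contacts the daemon and exits non-zero when it cannot be reached. The function name, the injectable `run` parameter, and the 10-second timeout are assumptions made for illustration.

```python
import subprocess


def docker_api_reachable(run=subprocess.run, timeout=10):
    """Return (ok, detail) for a quick `docker info` probe of the daemon."""
    try:
        proc = run(
            ["docker", "info", "--format", "{{.ServerVersion}}"],
            capture_output=True, text=True, timeout=timeout,
        )
    except FileNotFoundError:
        return False, "docker CLI not found on PATH"
    except subprocess.TimeoutExpired:
        return False, f"docker info timed out after {timeout}s"
    if proc.returncode != 0:
        # The daemon was not reachable or returned an error; surface stderr
        return False, proc.stderr.strip() or "docker info failed"
    return True, proc.stdout.strip()
```

A caller can log the detail string and refuse to spawn when the probe fails, which turns a silent phantom session into an immediate, visible error. The `run` parameter exists only so the probe can be exercised in tests without a Docker installation.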
