crewai - ✅ (Solved) Feature: Add QEMU microVM execution strategy (exec-sandbox) as Docker alternative for CodeInterpreterTool [1 pull request, 1 comment, 2 participants]

crewAIInc/crewAI#4702, fetched 2026-04-08 00:40:33

PR fix notes

PR #4736: feat: add microvm code execution mode via exec-sandbox (#4702)

Description (problem / solution / changelog)

Summary

This PR implements issue #4702.

  • Scope: Feature: Add QEMU microVM execution strategy (exec-sandbox) as Docker alternative for CodeInterpreterTool
  • Source branch: yuweuii:codex/issue-4702
  • Commit: 1b5a5022

Linked Issue

Closes #4702

<!-- CURSOR_SUMMARY -->

[!NOTE] Medium Risk: Introduces a new code-execution backend and changes when Docker validation runs, which could affect safety/isolation and runtime behavior in production environments.

Overview: Adds a new code_execution_mode="microvm" option for agents and execution_mode="microvm" for CodeInterpreterTool, dispatching Python execution through exec-sandbox (QEMU microVM) instead of Docker or host execution.

Updates agent/tool wiring so Agent.get_code_execution_tools() configures the interpreter with execution_mode vs unsafe_mode, and Docker installation validation now runs only for code_execution_mode="safe". Includes new microVM execution implementation (async scheduler wrapper), tests for the new dispatch/path, and documentation updates describing the new mode and installation requirements.

<sup>Written by Cursor Bugbot for commit d520f161512c50d1052f84314658832bffda0e5d.</sup>

<!-- /CURSOR_SUMMARY -->
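
The wiring change summarized above (the agent passes an execution mode to the interpreter, and Docker validation runs only for "safe") could look roughly like the sketch below. The helper name plan_interpreter_setup and the returned dictionary shape are hypothetical and are not taken from the PR; passing execution_mode for every mode is one possible reading of the summary, not a confirmed detail.

```python
VALID_MODES = ("safe", "unsafe", "microvm")


def plan_interpreter_setup(code_execution_mode: str) -> dict:
    """Hypothetical helper: pick tool kwargs and decide whether to check Docker."""
    if code_execution_mode not in VALID_MODES:
        raise ValueError(f"Unknown code_execution_mode: {code_execution_mode!r}")
    return {
        # Assumption: the tool receives the mode by name rather than a boolean flag.
        "tool_kwargs": {"execution_mode": code_execution_mode},
        # Only the Docker-backed "safe" mode needs a Docker installation check.
        "validate_docker": code_execution_mode == "safe",
    }


if __name__ == "__main__":
    for mode in VALID_MODES:
        print(mode, plan_interpreter_setup(mode))
```

The point of the gate is that an agent configured for "microvm" should not fail at construction time just because Docker is missing.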

Changed files

  • docs/en/concepts/agents.mdx (modified, +5/-4)
  • docs/en/tools/ai-ml/codeinterpretertool.mdx (modified, +34/-12)
  • lib/crewai-tools/src/crewai_tools/tools/code_interpreter_tool/code_interpreter_tool.py (modified, +70/-1)
  • lib/crewai-tools/tests/tools/test_code_interpreter_tool.py (modified, +63/-1)
  • lib/crewai/src/crewai/agent/core.py (modified, +15/-5)
  • lib/crewai/src/crewai/cli/templates/AGENTS.md (modified, +1/-1)
  • lib/crewai/src/crewai/project/crew_base.py (modified, +1/-1)
  • lib/crewai/tests/test_crew.py (modified, +53/-0)


Problem

CrewAI's CodeInterpreterTool exposes two execution modes, each with different code paths:

  1. Safe mode (default): tries Docker container execution (recommended) and automatically falls back to a restricted sandbox when Docker is unavailable. The sandbox is described as "very limited", with strict restrictions on many modules and built-in functions.
  2. Unsafe mode: executes code directly on the host and is explicitly not recommended for production.

This leaves a real gap. Users who cannot or prefer not to run Docker (CI environments, macOS enterprise license constraints, Docker-in-Docker headaches per #3028, or containerized deployments) are stuck choosing between a severely restricted sandbox and running untrusted LLM-generated code directly on their host machine. The community forum thread and issue #1983 show this is a recurring pain point.

Proposal

Add exec-sandbox (pip install exec-sandbox) as a fourth execution strategy alongside Docker, the restricted sandbox, and unsafe host execution: hardware-isolated QEMU microVMs that provide stronger isolation than Docker containers without requiring a Docker daemon.

What exec-sandbox provides:

| | Docker (current) | exec-sandbox (proposed) |
|---|---|---|
| Isolation | Container (shared kernel) | Hardware VM (KVM/HVF, own kernel) |
| Daemon required | Yes (Docker Desktop) | No (just QEMU binary) |
| Docker Desktop license cost | Paid for orgs with 250+ employees or $10M+ revenue | Free (Apache-2.0 + QEMU GPL) |
| Warm start latency | Container startup (~1s) | 1-2ms (pre-booted VM pool) |
| Cold start latency | Image pull + boot | ~100ms (L1 memory snapshot) |
| Languages | Python (current impl) | Python, JavaScript, RAW |
| State leakage | Possible (shared layers) | None (fresh VM per execution, destroyed after) |
| Docker-in-Docker | Problematic (#3028) | N/A (no daemon) |
| Network control | Manual config | Disabled by default, domain allowlisting |
| File I/O | Mounts host CWD | Explicit upload/download (no host filesystem exposure) |

How integration could work

Option A: As a custom CrewAI Tool (works today, no core changes needed)

from crewai import Agent, Task, Crew
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from exec_sandbox import Scheduler

class ExecSandboxInput(BaseModel):
    code: str = Field(..., description="Python 3 code to execute")
    packages: list[str] = Field(
        default_factory=list,
        description="pip packages to install before execution (e.g. ['pandas==2.2.0'])",
    )

class ExecSandboxTool(BaseTool):
    name: str = "Secure Code Interpreter"
    description: str = (
        "Executes Python 3 code in a hardware-isolated VM sandbox. "
        "Each execution gets a fresh VM that is destroyed after. "
        "Use for computations, data analysis, or any code that needs "
        "to run securely. Always end with a print() statement for output."
    )
    args_schema: type[BaseModel] = ExecSandboxInput

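    def _run(self, code: str, packages: list[str] | None = None) -> str:
        import asyncio

        async def _execute() -> str:
            # Open a scheduler for the duration of this single execution.
            async with Scheduler() as scheduler:
                result = await scheduler.run(
                    code=code,
                    language="python",
                    packages=packages or [],
                    timeout_seconds=60,
                )
                # Report a non-zero exit code and stderr back to the agent as text.
                if result.exit_code != 0:
                    return f"Error (exit code {result.exit_code}):\n{result.stderr}"
                return result.stdout

        # Note: asyncio.run() raises RuntimeError if an event loop is already
        # running in this thread.
        return asyncio.run(_execute())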

# Usage with a CrewAI agent
agent = Agent(
    role="Data Analyst",
    goal="Analyze data and produce insights",
    backstory="Expert data analyst with strong Python skills",
    tools=[ExecSandboxTool()],
)
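
One caveat with Option A: _run() is synchronous and calls asyncio.run(), which fails when the calling thread already has a running event loop (for example, when the tool is invoked from async code). A minimal sketch of one workaround follows; run_coro_sync is a hypothetical helper and not part of CrewAI or exec-sandbox.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coro_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coro)

    # A loop is already running here: run the coroutine on a fresh loop
    # in a worker thread and block until it finishes.
    outcome: dict[str, Any] = {}

    def _worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the caller below
            outcome["error"] = exc

    thread = threading.Thread(target=_worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

With this helper, the last line of _run() would become return run_coro_sync(_execute()). The trade-off is that the calling thread blocks while the worker runs, so a caller that is itself on an event loop stalls that loop for the duration of the execution.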

Option B: As a native CodeInterpreterTool strategy (requires core changes)

This would add "microvm" as a new code_execution_mode alongside "safe" and "unsafe":

agent = Agent(
    role="Data Analyst",
    goal="Analyze data and produce insights",
    backstory="Expert data analyst",
    allow_code_execution=True,
    code_execution_mode="microvm",  # New mode: QEMU microVM via exec-sandbox
)

Internally, CodeInterpreterTool would gain a run_code_in_microvm() method parallel to the existing run_code_in_docker() and run_code_in_restricted_sandbox(), dispatched from _run().
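
The dispatch described above can be sketched with a toy class. This is not the actual CodeInterpreterTool: the class name, the docker_available flag, run_code_unsafe, and the placeholder method bodies are assumptions made for illustration. Only run_code_in_docker, run_code_in_restricted_sandbox, and run_code_in_microvm are names taken from the proposal.

```python
class InterpreterDispatchSketch:
    """Toy stand-in for the tool; each run_code_* body is a placeholder."""

    def __init__(self, execution_mode: str = "safe", docker_available: bool = True) -> None:
        self.execution_mode = execution_mode
        self.docker_available = docker_available

    def _run(self, code: str) -> str:
        # Route to one strategy based on the configured mode.
        if self.execution_mode == "microvm":
            return self.run_code_in_microvm(code)
        if self.execution_mode == "unsafe":
            return self.run_code_unsafe(code)
        if self.execution_mode == "safe":
            # Safe mode prefers Docker and falls back to the restricted sandbox.
            if self.docker_available:
                return self.run_code_in_docker(code)
            return self.run_code_in_restricted_sandbox(code)
        raise ValueError(f"Unknown execution_mode: {self.execution_mode!r}")

    def run_code_in_microvm(self, code: str) -> str:
        return f"microvm:{code}"

    def run_code_in_docker(self, code: str) -> str:
        return f"docker:{code}"

    def run_code_in_restricted_sandbox(self, code: str) -> str:
        return f"sandbox:{code}"

    def run_code_unsafe(self, code: str) -> str:
        return f"unsafe:{code}"


if __name__ == "__main__":
    print(InterpreterDispatchSketch("microvm")._run("print(1)"))
```

Keeping the microVM branch separate from the safe-mode fallback means a missing Docker installation never silently changes which isolation level a "microvm" agent gets.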

Why not just use Docker?

Docker works well for many users, and this proposal does not replace it. But there are legitimate cases where Docker is not viable:

  • Enterprise macOS teams: Docker Desktop requires a paid subscription for organizations with 250+ employees or $10M+ revenue. QEMU is free.
  • CI/CD and containerized deployments: Running Docker-inside-Docker is fragile and requires privileged containers or socket mounting (#3028). QEMU microVMs need no daemon.
  • Stronger isolation needs: Containers share the host kernel. A kernel exploit in a container can compromise the host. MicroVMs run their own kernel with hardware virtualization.
  • Restricted sandbox is too restricted: The current fallback blocks many standard modules (os, sys, subprocess, tempfile, etc.), making it impractical for real data analysis or file processing tasks.

About exec-sandbox

  • GitHub: dualeai/exec-sandbox -- Apache-2.0 license
  • PyPI: exec-sandbox
  • How it works: Each execution boots a lightweight QEMU microVM (or grabs one from a warm pool in 1-2ms), runs code via a Rust guest-agent, streams stdout/stderr back, and destroys the VM. No state persists between executions.
  • Platforms: macOS (HVF) + Linux (KVM)
  • Languages: Python, JavaScript, RAW
  • Features: Package installation with snapshot caching, file I/O, streaming output, network domain filtering, port forwarding, sessions for stateful multi-step workflows

extent analysis

Fix Plan

To integrate exec-sandbox as a new execution strategy, follow these steps:

  • Install exec-sandbox using pip: pip install exec-sandbox
  • Implement a custom ExecSandboxTool class that inherits from BaseTool
  • Define the ExecSandboxInput model to handle code and package installation
  • Implement the _run method to execute code in a QEMU microVM using exec-sandbox


Verification

To verify the fix, create a test agent with the ExecSandboxTool and execute a sample code snippet. Check the output to ensure it matches the expected result.
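
Before booting a real microVM, the result-handling path can be checked with a stub. The sketch below is hedged: FakeScheduler and FakeResult are hypothetical stand-ins that only mimic the shape used in the example tool (an async context manager with a run() coroutine returning exit_code, stdout, and stderr). They are not exec-sandbox classes.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResult:
    exit_code: int
    stdout: str = ""
    stderr: str = ""


class FakeScheduler:
    """Stub with the async-context-manager shape the example tool relies on."""

    def __init__(self, result: FakeResult) -> None:
        self._result = result

    async def __aenter__(self) -> "FakeScheduler":
        return self

    async def __aexit__(self, *exc_info) -> None:
        return None

    async def run(self, **kwargs) -> FakeResult:
        # Ignore the arguments and return the canned result.
        return self._result


def execute_with(scheduler_factory, code: str) -> str:
    """Same control flow as the tool's _run(), with the scheduler injected."""

    async def _execute() -> str:
        async with scheduler_factory() as scheduler:
            result = await scheduler.run(
                code=code, language="python", packages=[], timeout_seconds=60
            )
            if result.exit_code != 0:
                return f"Error (exit code {result.exit_code}):\n{result.stderr}"
            return result.stdout

    return asyncio.run(_execute())


if __name__ == "__main__":
    ok = execute_with(lambda: FakeScheduler(FakeResult(0, stdout="4\n")), "print(2 + 2)")
    print(repr(ok))
```

A follow-up check against the real Scheduler would then confirm that a snippet such as print(2 + 2) returns its output through the tool.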

Extra Tips

  • Ensure you have the necessary dependencies installed, including exec-sandbox and qemu.
  • Configure the ExecSandboxTool to use the correct language and package installation settings.
  • Test the tool with different code snippets and edge cases to ensure it works as expected.
