crewai - ✅(Solved) Fix Add lightweight OS-level sandboxing for tool execution via sandlock [1 pull requests, 1 participants]

GitHub stats
crewAIInc/crewAI#5150 · Fetched 2026-04-08 01:44:50
Comments: 0
Participants: 1
Timeline: 2 (cross-referenced ×1, referenced ×1)
Reactions: 0

Fix Action

Fixed

PR fix notes

PR #5151: feat: add sandlock as lightweight OS-level sandboxing backend for CodeInterpreterTool

Description (problem / solution / changelog)

Summary

Implements #5150 — adds sandlock as a new lightweight execution backend for CodeInterpreterTool. Sandlock uses Linux kernel features (Landlock + seccomp-bpf) to provide process-level isolation without requiring Docker.

Key changes:

  • New execution_backend parameter on CodeInterpreterTool: "auto" (default), "docker", "sandlock", "unsafe"
  • New sandbox configuration fields: sandbox_fs_read, sandbox_fs_write, sandbox_max_memory_mb, sandbox_max_processes, sandbox_timeout
  • run_code_in_sandlock() method with library installation, policy building, and error handling
  • Auto mode fallback order: Docker → Sandlock → RuntimeError
  • sandlock>=0.2.0 added as an optional dependency group in pyproject.toml
  • 18 new tests covering routing, availability checks, policy building, error handling, timeouts, and fallback behavior
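
The auto-mode fallback order listed above can be sketched as a small pure function. This is a minimal illustration, not the code from PR #5151: the function name `resolve_backend` and the two boolean availability flags are assumptions made for the example.

```python
VALID_BACKENDS = ("auto", "docker", "sandlock", "unsafe")


def resolve_backend(requested: str, docker_ok: bool, sandlock_ok: bool) -> str:
    """Return the concrete backend to use for one execution (hypothetical helper)."""
    if requested not in VALID_BACKENDS:
        raise ValueError(f"unknown execution_backend: {requested!r}")
    if requested != "auto":
        # An explicit choice is returned unchanged in this sketch.
        return requested
    # Auto mode: Docker first, then sandlock, otherwise fail closed.
    if docker_ok:
        return "docker"
    if sandlock_ok:
        return "sandlock"
    raise RuntimeError("Neither Docker nor sandlock is available")
```

Note that auto mode never falls through to "unsafe"; in this sketch unsandboxed execution has to be requested explicitly.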

Review & Testing Checklist for Human

  • Verify sandlock API contract matches implementation. All tests mock Sandbox and Policy — the real sandlock API (Sandbox(policy).run(cmd, timeout=...) returning an object with .stdout, .stderr, .returncode) has not been integration-tested. The hasattr() guards on lines 530-532 suggest uncertainty about the return type. Confirm the actual sandlock>=0.2.0 package exposes this interface.
  • Review security of library installation outside sandbox. run_code_in_sandlock installs pip packages on the host filesystem (via subprocess.run) before entering the sandbox. Compare with Docker backend which installs inside the container. Untrusted library names get passed to pip install --target.
  • Review _build_sandlock_policy filesystem permissions. The policy adds all sys.path entries as readable paths and uses exec(open('{code_file}').read()) for execution. Verify this doesn't expose more of the host filesystem than intended.
  • Confirm sandlock package maturity/trust. sandlock v0.2.0 is a relatively new/small package. Evaluate whether it meets the bar for a crewAI dependency (even as optional).
  • Test plan: Install sandlock on a Linux machine, run CodeInterpreterTool(execution_backend="sandlock") with real code, and verify isolation (e.g., confirm filesystem writes outside allowed paths are blocked, memory limits enforced).
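
For the checklist item about untrusted library names reaching pip install --target, one possible mitigation is to validate each name and build the command as an argument list instead of a shell string. The sketch below is an assumption about how that could look, not what the PR does; `build_pip_command` is a hypothetical helper, and the pattern follows the project-name grammar from PEP 508.

```python
import re
import sys

# PEP 508 project name: letters, digits, ".", "_" and "-",
# starting and ending with a letter or digit.
_NAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?$")


def build_pip_command(library: str, target_dir: str) -> list[str]:
    """Return an argv list for pip, rejecting anything that is not a bare name."""
    if not _NAME_RE.match(library):
        raise ValueError(f"refusing to install suspicious library name: {library!r}")
    # An argv list is never passed through a shell, so metacharacters are inert.
    return [sys.executable, "-m", "pip", "install", "--target", target_dir, library]
```

This deliberately rejects version specifiers, URLs, and option-like strings such as "--index-url=...". It does not make the package itself trustworthy: installing a source distribution can still run arbitrary build code on the host, which is the concern the checklist raises.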

Notes

  • The existing test_docker_unavailable_raises_error was renamed to test_docker_and_sandlock_unavailable_raises_error and its assertions updated to reflect the new fallback chain.
  • sandlock is Linux-only (requires kernel 5.13+). The implementation gracefully rejects non-Linux platforms with a helpful error message.
  • The unsafe_mode=True flag continues to work as before for backward compatibility.
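
The Linux-only, kernel 5.13+ requirement noted above could be checked up front with a helper along these lines. This is a sketch under stated assumptions: the function names are hypothetical, and a kernel version check is only a heuristic, since Landlock can also be disabled in the kernel configuration.

```python
import platform
import re

MIN_KERNEL = (5, 13)  # first kernel release that ships Landlock


def landlock_capable(system: str, release: str) -> bool:
    """Heuristic check: Linux with a kernel version of at least 5.13."""
    if system != "Linux":
        return False
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= MIN_KERNEL


def host_is_landlock_capable() -> bool:
    """Apply the heuristic to the machine this code is running on."""
    return landlock_capable(platform.system(), platform.release())
```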

Updates since last revision

  • Fixed mypy type-checker errors across all Python versions (3.10–3.13): the Policy import uses # type: ignore[import-untyped] (matching the existing docker import pattern), and the Sandbox import needs no annotation since mypy caches the module after the first import.

Link to Devin session: https://app.devin.ai/sessions/cd127bbc88684d649b90b1272c9a520b

Changed files

  • lib/crewai-tools/pyproject.toml (modified, +3/-0)
  • lib/crewai-tools/src/crewai_tools/tools/code_interpreter_tool/code_interpreter_tool.py (modified, +264/-18)
  • lib/crewai-tools/tests/tools/test_code_interpreter_tool.py (modified, +362/-6)
  • lib/crewai-tools/tool.specs.json (modified, +62/-1)

Code Example

from crewai_tools import CodeInterpreterTool

# Sandlock as execution backend (between Docker and unsafe mode)
tool = CodeInterpreterTool(
    execution_backend="sandlock",
    sandbox_fs_read=["/usr/lib/python3", "/workspace"],
    sandbox_fs_write=["/workspace/output"],
    sandbox_max_memory_mb=512,
)

Problem

CrewAI's code execution tools have multiple unsandboxed pathways:

  • CodeInterpreterTool unsafe mode: exec() and os.system(f"pip install {library}") with no restrictions
  • SandboxPython: the code itself documents that it "does NOT provide real security isolation and is vulnerable to sandbox escape attacks via Python object introspection"
  • Template eval(): crewai create ships a Calculator example using eval(expression) on unsanitized LLM input (#5056)
  • No framework-level sandboxing: tools run in the host process by default

Docker isolation via CodeInterpreterTool is available but optional and adds ~200ms+ startup overhead per execution. When Docker is unavailable, there is no fallback with real isolation.
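
The introspection weakness quoted above can be illustrated with the standard library alone. The snippet is a generic demonstration and does not use SandboxPython itself: it runs code with an empty builtins table and shows that the code can still walk from a literal to every loaded class.

```python
# Run untrusted code with an empty builtins table: names such as open
# and __import__ are no longer reachable by name.
restricted_globals = {"__builtins__": {}}

try:
    exec("open('/etc/hostname')", restricted_globals)
    open_blocked = False
except NameError:
    open_blocked = True

# Attribute access still works, so the code can climb from a tuple literal
# to object and enumerate every class loaded in the interpreter.
namespace = {}
exec("found = ().__class__.__base__.__subclasses__()", restricted_globals, namespace)
reachable_classes = namespace["found"]

print(open_blocked, len(reachable_classes) > 0)
```

Because the restriction lives inside the same interpreter as the untrusted code, it can only hide names. That is why the issue argues for enforcement at the process boundary instead.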

Proposal

Add sandlock as a lightweight sandboxing backend for tool execution. Sandlock is a Linux process sandbox using Landlock, seccomp-bpf, and user namespaces.

What sandlock provides

| Layer | Protection |
| --- | --- |
| Landlock | Filesystem path whitelisting (read-only / read-write), network domain + port restrictions |
| seccomp-bpf | Syscall filtering at kernel level: blocks ptrace, mount, unshare, kexec_load, bpf, etc. |
| User namespaces | Privilege escalation prevention without root |
| Resource limits | Memory, process count, CPU, open file caps (no cgroups needed) |
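
To make the "no cgroups needed" row concrete, the sketch below caps memory and CPU time for a child process with POSIX rlimits from the Python standard library. It is not sandlock's implementation, which this page does not describe; `run_limited` and its defaults are assumptions made for the example, and it only behaves as described on Linux.

```python
import resource
import subprocess
import sys


def _cap(which: int, value: int) -> None:
    """Lower one rlimit, never exceeding the hard limit already in force."""
    _, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def run_limited(code: str, max_memory_mb: int = 512, cpu_seconds: int = 5):
    """Run a Python snippet in a child process with memory and CPU caps."""

    def apply_limits() -> None:
        # Executed in the child after fork and before exec.
        _cap(resource.RLIMIT_AS, max_memory_mb * 1024 * 1024)
        _cap(resource.RLIMIT_CPU, cpu_seconds)

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=30,
    )
```

A child that tries to allocate more than the cap fails with MemoryError and exits non-zero, while the parent process is unaffected. Rlimits alone restrict resources only; they do nothing about filesystem or syscall access, which is what the Landlock and seccomp-bpf rows cover.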

Why this fits CrewAI

  • ~20ms startup vs ~200ms for Docker. Viable as a default, not just an opt-in.
  • No root, no daemon. Unlike Docker, just pip install sandlock.
  • Kernel-enforced. Python object introspection escapes (the acknowledged weakness of SandboxPython) are irrelevant when seccomp blocks dangerous syscalls at the kernel level. Even if eval() reaches os.system(), the syscall is denied.
  • Self-hosted. No cloud account needed.
  • Fallback when Docker is unavailable. Sandlock could serve as the middle ground between "no isolation" and "full Docker container."
  • Linux primary. Works on CrewAI's Linux deployments; macOS/Windows users can continue using Docker.

Integration points

  1. CodeInterpreterTool: add sandlock as an execution backend alongside docker and unsafe
  2. Tool execution framework: wrap tool calls in a sandbox with per-tool filesystem + network policy
  3. CLI scaffolding: default new projects to sandboxed execution instead of shipping eval() examples
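
Integration point 2 (a per-tool filesystem and network policy) could be modelled as a small lookup table with a deny-by-default fallback. Everything in this sketch is hypothetical: `ToolPolicy`, the tool names, and the field names are illustrative and are not part of crewAI or sandlock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Illustrative per-tool sandbox policy (not a crewAI or sandlock type)."""

    fs_read: tuple[str, ...] = ()
    fs_write: tuple[str, ...] = ()
    allow_network: bool = False
    max_memory_mb: int = 512


# Unknown tools get no filesystem or network access at all.
DEFAULT_POLICY = ToolPolicy()

POLICIES = {
    "code_interpreter": ToolPolicy(
        fs_read=("/usr/lib/python3", "/workspace"),
        fs_write=("/workspace/output",),
    ),
    "web_search": ToolPolicy(allow_network=True, max_memory_mb=256),
}


def policy_for(tool_name: str) -> ToolPolicy:
    """Look up a tool's policy, falling back to the most restrictive one."""
    return POLICIES.get(tool_name, DEFAULT_POLICY)
```

Falling back to the empty policy instead of raising keeps unknown tools runnable while still failing closed, which matches the intent of #4593 referenced in this issue.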

Example usage

from crewai_tools import CodeInterpreterTool

# Sandlock as execution backend (between Docker and unsafe mode)
tool = CodeInterpreterTool(
    execution_backend="sandlock",
    sandbox_fs_read=["/usr/lib/python3", "/workspace"],
    sandbox_fs_write=["/workspace/output"],
    sandbox_max_memory_mb=512,
)

Relation to existing issues

  • #5056 (eval() in templates): kernel-level syscall filtering limits blast radius even if eval reaches dangerous functions
  • #4516 (command injection + sandbox escape): sandlock makes Python introspection escapes irrelevant
  • #4593 (no fail-closed defaults): sandlock enables a secure default that doesn't require Docker

Alternatives considered

  • Hardening SandboxPython: acknowledged by maintainers as not viable (Python introspection is too powerful)
  • Docker-only: adds overhead and complexity; not available in all environments
  • firejail: requires root or setuid binary
  • bubblewrap (bwrap): lower-level, no Python API, no resource limits without cgroups


Fix Plan

To address the issue of unsandboxed pathways in CrewAI's code execution tools, we will integrate sandlock as a lightweight sandboxing backend. Here are the steps:

  • Install sandlock using pip: pip install sandlock
  • Update CodeInterpreterTool to use sandlock as an execution backend:
from crewai_tools import CodeInterpreterTool

tool = CodeInterpreterTool(
    execution_backend="sandlock",
    sandbox_fs_read=["/usr/lib/python3", "/workspace"],
    sandbox_fs_write=["/workspace/output"],
    sandbox_max_memory_mb=512,
)
  • Wrap tool calls in a sandbox with per-tool filesystem and network policy:
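from sandlock import Policy, Sandbox  # interface as described in PR #5151; not integration-tested

def execute_tool(cmd, timeout=30):
    # Assumption: the Policy keyword names below mirror the sandbox_* fields of
    # CodeInterpreterTool; check them against the installed sandlock release.
    policy = Policy(
        fs_read=["/usr/lib/python3", "/workspace"],
        fs_write=["/workspace/output"],
        max_memory_mb=512,
    )
    # Run the command inside the sandbox and return its captured output
    result = Sandbox(policy).run(cmd, timeout=timeout)
    return result.stdout, result.stderr, result.returncode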
  • Update the CLI scaffolding to default new projects to sandboxed execution:
def create_project(project_name):
    # Project is a hypothetical scaffolding class used for illustration only;
    # it is not an existing crewAI API.
    project = Project(
        name=project_name,
        execution_backend="sandlock",
        sandbox_fs_read=["/usr/lib/python3", "/workspace"],
        sandbox_fs_write=["/workspace/output"],
        sandbox_max_memory_mb=512,
    )
    # ...

Verification

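To verify the fix, run real code through the sandlock backend on a Linux host (kernel 5.13+):

  • Create a CodeInterpreterTool with execution_backend="sandlock"
  • Execute code that writes both inside and outside the sandbox_fs_write paths, and confirm that only the writes inside the allowed paths succeed
  • Execute code that reads a path outside the readable set, and confirm that the read is denied
  • Execute code that allocates more than sandbox_max_memory_mb, and confirm that the memory limit is enforced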
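
One way to make the filesystem check concrete is a small probe that attempts a write to each path and reports the outcome per path. The sketch below runs the probe as a plain subprocess to establish a baseline; `run_probe` is a hypothetical helper, and running the same probe source through the sandlock backend is left to the reader because that API has not been verified here.

```python
import json
import subprocess
import sys

# Source of the probe: try to write each path given on the command line.
PROBE = """
import json, sys
results = {}
for path in sys.argv[1:]:
    try:
        with open(path, "w") as fh:
            fh.write("probe")
        results[path] = "written"
    except OSError as exc:
        results[path] = type(exc).__name__
print(json.dumps(results))
"""


def run_probe(paths: list[str]) -> dict[str, str]:
    """Run the probe unsandboxed and return {path: outcome}."""
    completed = subprocess.run(
        [sys.executable, "-c", PROBE, *paths],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)
```

In the unsandboxed baseline, writable paths report "written". Under a working sandbox, the expectation is that paths outside the write allow-list report an error such as PermissionError instead.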

Extra Tips

  • Make sure to test the integration of sandlock with different tools and scenarios to ensure that it works as expected.
  • Consider adding additional logging and monitoring to detect any potential issues with the sandboxing.
  • Keep in mind that sandlock is a Linux-specific solution, so you may need to use a different sandboxing solution for macOS and Windows deployments.
