openclaw - 💡(How to fix) Fix Proposal: `lima-sandbox` — a hypervisor-isolated sandbox backend [1 participants]

openclaw/openclaw#69991, fetched 2026-04-23 07:30:38

Issue #12405 lists Lima as a desired sandbox backend alongside OrbStack and exe.dev. We'd like to offer the implementation work for Lima specifically, with the goal of landing it as a bundled extension alongside openshell.

We've been using Lima for sandbox workloads internally and it's solved a gap that none of docker, ssh, or openshell cleanly cover — hypervisor-level isolation for agent workloads where sharing the host kernel isn't acceptable. This proposal is about abstracting that experience into a generic plugin suitable for the broader OpenClaw community.


Why native VM isolation, not another container backend

Apple doesn't publish a formal threat model for Virtualization.framework, so this is an architectural argument rather than a certified one. What's on the record:

  • Separate guest kernel. Each sandbox runs its own Linux kernel inside a VM. No namespace-based isolation; no shared host-kernel syscall surface. Host-kernel CVEs don't cross the boundary.

  • No third-party kernel extensions required. Apple's framework operates without kexts — a genuine attack-surface reduction versus historical Docker Desktop network paths.

  • Hardware-enforced memory isolation. Apple's Hypervisor framework supports IPA (intermediate physical address) memory granularity down to 4KB via stage-2 page tables. Memory separation is enforced in silicon, not by namespace bookkeeping.

  • Hardware-encrypted VM state files. From Apple's own WWDC23 session:

    "These files are hardware encrypted to provide the strongest possible guarantees. No other Mac or user account can read another's save file, or restore the virtual machine."

    The saveMachineStateTo: / restoreMachineStateFrom: APIs exist at the framework level. Lima's vz driver doesn't currently expose them to its Go API; wiring them through as part of lima-sandbox (either directly via the Apple bindings or by contributing upstream to lima-vm first) is part of the proposed scope.

  • Networking without setuid or root. Lima's vzNAT provides NAT with built-in DHCP and DNS and requires no network daemon — a security-surface reduction versus bridge+iptables stacks.

  • First-party validation from Apple. The new Containerization framework (WWDC25) runs each Linux container inside its own lightweight VM on the same Virtualization.framework substrate. The pattern isn't speculative; Apple shipped it for themselves.

Honest trade-offs

  • Startup is seconds, not milliseconds. Right for long-lived or security-sensitive sandboxes; wrong for per-request ephemeral workloads. Container backends remain the correct choice for low-latency cases.
  • Memory footprint is heavier. Hundreds of MiB minimum per VM versus shared-kernel containers that can run in tens of MiB.
  • Platform scope is narrow. macOS 13+ hosts (via Lima's vz driver) and Linux+KVM hosts only. No Windows host support. Intel and Apple Silicon both work, but no cross-architecture virtualization — Lima's documentation is explicit that Virtualization.framework doesn't run Intel guests on ARM or vice versa. lima-sandbox would refuse to register on unsupported hosts with a clear error.
  • Image distribution is bring-your-own. The plugin would ship plumbing (Lima YAML config, cloud-init-compatible images), not bundled opinionated images. Users point at their own registry or the Lima project's catalog.
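
The platform-scope trade-off above ("refuse to register on unsupported hosts with a clear error") could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: `HostInfo`, `SupportResult`, `checkHostSupport`, the `hypervisor` labels, and the error wording are all hypothetical names introduced here. The only outside fact relied on is that macOS 13 reports Darwin kernel major version 22.

```typescript
// Hypothetical sketch: none of these names come from OpenClaw or Lima.
type HostInfo = {
  platform: string;      // e.g. "darwin", "linux", "win32"
  darwinMajor?: number;  // major Darwin kernel version; macOS 13 reports 22
  hasKvm?: boolean;      // whether KVM is usable on a Linux host
};

type SupportResult =
  | { ok: true; hypervisor: "vz" | "kvm" }
  | { ok: false; reason: string };

function checkHostSupport(host: HostInfo): SupportResult {
  if (host.platform === "darwin") {
    // macOS 13 or newer is required for the vz path described in the proposal.
    if ((host.darwinMajor ?? 0) >= 22) {
      return { ok: true, hypervisor: "vz" };
    }
    return { ok: false, reason: "lima-sandbox requires macOS 13 or newer" };
  }
  if (host.platform === "linux") {
    if (host.hasKvm === true) {
      return { ok: true, hypervisor: "kvm" };
    }
    return { ok: false, reason: "lima-sandbox requires KVM on Linux hosts" };
  }
  // Everything else, including Windows hosts, is rejected with a clear message.
  return {
    ok: false,
    reason: `lima-sandbox does not support host platform "${host.platform}"`,
  };
}
```

Keeping the check a pure function of a small `HostInfo` record makes it testable without real hardware. In a Node host the inputs could plausibly be derived from `process.platform` and `os.release()`, though how the plugin would actually gather them is left open here.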

Shape of the configuration surface (primitives, not profiles)

We propose the plugin expose primitive knobs rather than named profiles. Concretely:

agents.defaults.sandbox.lima = {
  image:     { source, ... },
  network:   { mode, allowedHosts, allowedPorts },
  storage:   { type: "persistent" | "ephemeral" },
  resources: { memoryMiB, cpus, diskGiB, ttlSeconds, swap },
  exec:      { user, workDir }
}

No canonical profile: "airgapped" or "sealed-ephemeral" strings. The reasoning: profile names encode threat-model opinions (my "airgapped" isn't the same as your "airgapped"), and existing backends (docker, ssh, openshell) don't ship opinionated profiles either. Consumers compose their own domain-specific profiles from the primitives, which keeps the plugin's public contract narrow and versionable. README examples can document canonical composition patterns without committing the plugin to enforce those names.
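
To make "consumers compose their own profiles from the primitives" concrete, here is a minimal consumer-side sketch. The field names follow the config shape above, but the value types, the `"none"` / `"nat"` network modes, the boolean `swap`, the placeholder image path, and the profile helpers are assumptions made purely for illustration. Nothing here is part of the plugin's proposed contract.

```typescript
// Assumed typing of the proposed primitives; the real schema is still undecided.
type LimaSandboxConfig = {
  image: { source: string };
  network: { mode: "none" | "nat"; allowedHosts: string[]; allowedPorts: number[] };
  storage: { type: "persistent" | "ephemeral" };
  resources: { memoryMiB: number; cpus: number; diskGiB: number; ttlSeconds?: number; swap?: boolean };
  exec: { user: string; workDir: string };
};

// A "profile" lives entirely downstream: a function from config to config.
type Profile = (base: LimaSandboxConfig) => LimaSandboxConfig;

// One team's definition of "airgapped": no network at all.
const airgapped: Profile = (base) => ({
  ...base,
  network: { mode: "none", allowedHosts: [], allowedPorts: [] },
});

// Built on top of airgapped: throwaway disk plus a 15-minute lifetime.
const sealedEphemeral: Profile = (base) => ({
  ...airgapped(base),
  storage: { type: "ephemeral" },
  resources: { ...base.resources, ttlSeconds: 900 },
});

const baseConfig: LimaSandboxConfig = {
  image: { source: "./images/agent.yaml" }, // placeholder path, not a real image
  network: { mode: "nat", allowedHosts: ["example.com"], allowedPorts: [443] },
  storage: { type: "persistent" },
  resources: { memoryMiB: 2048, cpus: 2, diskGiB: 20 },
  exec: { user: "agent", workDir: "/workspace" },
};

const sealedConfig = sealedEphemeral(baseConfig);
```

Because the profiles are plain functions over the primitive config, another consumer can define a different `airgapped` without the plugin having to arbitrate which meaning is canonical.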

Use cases where this matters

Generic to anyone building on OpenClaw; none of these are specific to one deployment:

  • Key material and cryptographic operations
  • Untrusted code execution (agent-authored scripts, attacker-controlled input)
  • Multi-tenant agent infrastructure where tenants shouldn't share a kernel
  • Compliance regimes that require hypervisor-level isolation
  • Security research / malware analysis sandboxing
  • Clean-slate development environments (one fresh VM per project)

Prior art we'd build on

  • openshell — already bundled, registers as a sandbox backend via the plugin SDK. This is the template lima-sandbox would follow.
  • trycua/cua's Lume project — wraps Virtualization.framework for macOS guests. OpenClaw's own docs recommend it for host-level deployment (running OpenClaw itself inside a macOS VM). Our proposal is the Linux-guest counterpart at the sandbox layer — different guest OS, different use case, complementary rather than overlapping.
  • Issue #67610 (Apr 16, 2026, open) — a user tried to register a custom backend "abc" via the plugin SDK and hit requireSandboxBackendFactory failing to locate it despite successful registration. We'd want to confirm this doesn't recur for lima-sandbox. If it does, fixing the core registration path is something we're willing to contribute alongside the backend itself.

Design decisions we'd want your input on before writing polish-grade code

  1. Plugin ID and package name. We're proposing lima-sandbox to match the pattern of lobster / open-prose / active-memory. Open to whatever naming convention you prefer — including an @openclaw/ scope if that's how you'd like bundled extensions to sit.

  2. Backend ID registration path. Existing behavior (per issue #67610) suggests agents.defaults.sandbox.backend accepts plugin-registered IDs at runtime via registerSandboxBackend(). Can you confirm that's the intended contract? If there's a core config-schema change also needed to bless "lima" explicitly, happy to submit that PR first.

  3. Where Lima-specific config lives. Docker and SSH put their per-backend config under agents.defaults.sandbox.<backend>.*; OpenShell uses plugins.entries.openshell.config. We have a mild preference for the sandbox-namespaced form (per-agent overrides via agents.list[].sandbox.lima.* matter for profile-per-agent workloads), but we'll follow house style.

  4. Image distribution expectations. We think the plugin should ship plumbing, not opinions — bring-your-own Lima YAML or point at the Lima project's image catalog. Confirm that's the right shape, or tell us if a small bundled default (Ubuntu-minimal, Alpine-virt) is expected.

  5. Testing. Does OpenClaw have a sandbox-backend test harness we can plug into? lima-sandbox needs macOS runners with virtualization enabled and/or Linux runners with nested KVM — happy to provide our own CI setup, but if there's house style we'd match it.

  6. Scope of the saveMachineStateTo: exposure. The Apple framework supports hardware-encrypted save/restore; Lima's vz driver doesn't expose it yet. We'd like to wire it through (either direct bindings or an upstream lima-vm contribution first). Is that in scope for your review, or should it land as a follow-up PR after the base backend?

  7. Profiles stay downstream. We're not proposing the plugin ship predefined profile names like "airgapped" or "sealed-ephemeral". Config is primitives only (network mode, storage type, resource limits, TTL). Consumers compose their own named profiles for their domain — docker works this way, openshell works this way, and we think lima-sandbox should too. README will show canonical composition examples as documentation, not as enforced config. Let us know if you'd prefer a different stance.
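
Item 3 mentions per-agent overrides via agents.list[].sandbox.lima.* layered on agents.defaults.sandbox.lima. One way such layering could behave is a section-by-section shallow merge, sketched below. The merge semantics and the `resolveLimaConfig` helper are assumptions for discussion; the issue does not state how OpenClaw resolves overrides.

```typescript
// Hypothetical helper: merges per-agent overrides over defaults, one section at a time.
type Section = Record<string, unknown>;
type LimaSections = Record<string, Section>;

function resolveLimaConfig(defaults: LimaSections, perAgent: LimaSections = {}): LimaSections {
  const resolved: LimaSections = {};
  // Start from a copy of every default section so the inputs are never mutated.
  for (const key of Object.keys(defaults)) {
    resolved[key] = { ...defaults[key] };
  }
  // Overlay per-agent values; keys the agent does not mention keep their defaults.
  for (const key of Object.keys(perAgent)) {
    resolved[key] = { ...(resolved[key] ?? {}), ...perAgent[key] };
  }
  return resolved;
}

const defaultsLima: LimaSections = {
  network: { mode: "nat" },
  resources: { memoryMiB: 2048, cpus: 2 },
};

// An agent that only needs more memory overrides a single field.
const agentLima = resolveLimaConfig(defaultsLima, { resources: { memoryMiB: 4096 } });
```

A shallow merge per section keeps overrides predictable: an agent changes `resources.memoryMiB` without having to restate `cpus`, while array-valued fields such as allowedHosts would be replaced wholesale rather than concatenated.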

If bundling isn't the right fit

If the scope, maintenance burden, or platform coverage don't work for you, we'd publish lima-sandbox as a standalone package and link to it from the OpenClaw docs. Not urgent — happy to wait for the openshell patterns to settle or for broader discussion on #12405 before proceeding either way.

We'd plan to open with a design doc (primitives, config shape, Lima driver wrapping, save/restore exposure, platform detection, error model) before any implementation PR, if that matches your review cadence.

extent analysis

TL;DR

To integrate Lima as a sandbox backend, create a plugin that exposes primitive configuration knobs, allowing users to compose their own domain-specific profiles without shipping opinionated profiles.

Guidance

  • Review the proposed configuration surface, including primitives such as image, network, storage, resources, and exec, to ensure they meet the requirements for a generic plugin.
  • Discuss and confirm the plugin ID and package name, backend ID registration path, and where Lima-specific config lives to ensure consistency with OpenClaw's conventions.
  • Determine the scope of testing, including the use of a sandbox-backend test harness, to ensure thorough coverage of the plugin's functionality.
  • Decide on the exposure of saveMachineStateTo: and restoreMachineStateFrom: APIs, either through direct bindings or an upstream lima-vm contribution, to provide hardware-encrypted VM state files.

Example

No code example is provided as the issue focuses on design and configuration discussions.

Notes

The proposal is still in the design phase, and implementation details will depend on the outcome of the discussions. The plugin's maintenance burden, platform coverage, and scope will be crucial factors in determining whether it will be bundled with OpenClaw or published as a standalone package.

Recommendation

Start with a design document that outlines the primitives, config shape, Lima driver wrapping, save/restore exposure, platform detection, and error model before proceeding with implementation, as the proposal itself suggests. This will help ensure that all stakeholders are aligned on the plugin's requirements and functionality.
