openclaw - [Feature]: Pluggable subagent execution backends and resource profiles (Kubernetes, containers, remote workers)

Summary

Add a pluggable remote execution backend for spawned subagents/ACP sessions so OpenClaw can run agent workers in Kubernetes or other resource-isolated compute environments, with explicit resource profiles selected per spawn, binding, or agent config.

This is broader than "Sandboxing + ACP". Kubernetes is one strong implementation target, but the product-level feature is: agent/session spawn placement and resource selection.

Problem

Today, sessions_spawn and ACP-backed agent runs are mostly tied to the local gateway host/runtime. This makes it hard to:

  • isolate risky or heavy coding-agent work from the gateway process
  • choose CPU/memory/GPU resources per task
  • run many long-running agents without exhausting the host
  • route different agent types to different execution environments
  • observe and clean up remote worker lifecycle consistently
  • support teams where the gateway is lightweight but workers should run on dedicated infrastructure

A user may want to say, effectively:

Spawn Codex in a 4 CPU / 8 GiB worker with network egress disabled.

or:

Spawn Gemini/OpenCode in a cheap small worker unless the task asks for build/test, then use a larger profile.

or:

Run this ACP session in the team's Kubernetes namespace, not on the gateway host.

Proposed capability

Introduce a generic spawn execution backend/profile layer for subagents and ACP sessions.

Conceptually:

{
  "agents": {
    "executionBackends": {
      "local": { "type": "process" },
      "docker": { "type": "container" },
      "k8s": {
        "type": "kubernetes",
        "namespace": "openclaw-agents",
        "profiles": {
          "small": {
            "image": "ghcr.io/openclaw/agent-worker:VERSION",
            "resources": {
              "requests": { "cpu": "500m", "memory": "1Gi" },
              "limits": { "memory": "2Gi" }
            }
          },
          "large-build": {
            "image": "ghcr.io/openclaw/agent-worker:VERSION",
            "resources": {
              "requests": { "cpu": "4", "memory": "8Gi" },
              "limits": { "cpu": "8", "memory": "16Gi" }
            }
          }
        }
      }
    }
  }
}

Then sessions_spawn / bindings / agent defaults could select a backend and profile:

{
  "runtime": "acp",
  "agentId": "codex",
  "execution": {
    "backend": "k8s",
    "profile": "large-build"
  },
  "message": "Run the full test suite and fix failures."
}
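
To make the selection step concrete, here is a minimal Python sketch of how a spawn request's `execution` block might be resolved against the `executionBackends` config above. The function and class names (`resolve_execution`, `ResolvedExecution`, `ExecutionConfigError`) are hypothetical, and falling back to the `local` backend when no `execution` block is given is an assumption rather than part of the proposal.

```python
from dataclasses import dataclass
from typing import Optional

# Trimmed copy of the proposed config shape, for illustration only.
CONFIG = {
    "agents": {
        "executionBackends": {
            "local": {"type": "process"},
            "k8s": {
                "type": "kubernetes",
                "namespace": "openclaw-agents",
                "profiles": {
                    "large-build": {
                        "image": "ghcr.io/openclaw/agent-worker:VERSION",
                        "resources": {
                            "requests": {"cpu": "4", "memory": "8Gi"},
                            "limits": {"cpu": "8", "memory": "16Gi"},
                        },
                    }
                },
            },
        }
    }
}


class ExecutionConfigError(ValueError):
    """Raised when a request names an unknown backend or profile."""


@dataclass(frozen=True)
class ResolvedExecution:
    backend_name: str
    backend_type: str
    profile_name: Optional[str]
    profile: dict


def resolve_execution(config: dict, request: dict) -> ResolvedExecution:
    backends = config.get("agents", {}).get("executionBackends", {})
    execution = request.get("execution") or {}
    # Assumption: requests without an explicit backend run on "local".
    backend_name = execution.get("backend", "local")
    if backend_name not in backends:
        raise ExecutionConfigError(f"unknown execution backend: {backend_name!r}")
    backend = backends[backend_name]
    profile_name = execution.get("profile")
    profiles = backend.get("profiles", {})
    if profile_name is None:
        profile = {}  # backend without a profile, e.g. a plain local process
    elif profile_name in profiles:
        profile = profiles[profile_name]
    else:
        raise ExecutionConfigError(
            f"backend {backend_name!r} has no profile {profile_name!r}"
        )
    return ResolvedExecution(backend_name, backend["type"], profile_name, profile)
```

Failing fast on an unknown backend or profile keeps a typo in a spawn request from silently running the task on the gateway host.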

Kubernetes implementation sketch

A Kubernetes backend would:

  1. resolve the requested profile and policy
  2. build a worker Pod/Job manifest
  3. validate it with a Kubernetes API server-side dry run (dryRun=All)
  4. create the worker
  5. wait for readiness
  6. stream agent/ACP events back over a defined worker protocol
  7. expose status as namespace/podName, phase, container status, logs/events
  8. delete workers, expire them via TTL, or sweep them after close, timeout, or parent reset
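
Step 2 of the list above (building the worker manifest) could look roughly like the following Python sketch. It only builds a plain dictionary in the shape of a Kubernetes `v1` Pod and does not talk to a cluster. The function name `build_worker_pod`, the `openclaw/...` label keys, the container name, and port 8080 are all assumptions made for illustration.

```python
import re


def build_worker_pod(namespace: str, session_id: str,
                     profile_name: str, profile: dict) -> dict:
    """Build a Pod manifest (as a dict) for one agent worker."""
    # Kubernetes object names allow only lowercase alphanumerics and '-',
    # so normalize the session ID before embedding it in the Pod name.
    safe_id = re.sub(r"[^a-z0-9-]", "-", session_id.lower())[:40].strip("-")
    if not safe_id:
        raise ValueError("session_id has no characters usable in a Pod name")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"openclaw-worker-{safe_id}",
            "namespace": namespace,
            # Hypothetical labels so a sweeper can find workers later.
            "labels": {
                "openclaw/session": safe_id,
                "openclaw/profile": profile_name,
            },
        },
        "spec": {
            "restartPolicy": "Never",  # a finished worker should not restart
            "containers": [
                {
                    "name": "agent-worker",
                    "image": profile["image"],
                    "resources": profile.get("resources", {}),
                    "ports": [{"containerPort": 8080}],  # assumed worker port
                    "readinessProbe": {
                        "httpGet": {"path": "/healthz", "port": 8080}
                    },
                }
            ],
        },
    }
```

The resulting dictionary is what would be submitted for the dry-run validation in step 3 before the real create call in step 4. Passing the profile's `resources` block through unchanged means the config uses the same requests/limits vocabulary as Kubernetes itself.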

The worker image should implement a narrow OpenClaw worker contract such as:

  • GET /healthz
  • POST /turn streaming NDJSON events
  • POST /cancel
  • optional GET /status and GET /logs for metadata
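
As a rough illustration of that contract, the following sketch implements `/healthz`, `/turn`, and `/cancel` with Python's standard `http.server`. The event shapes (`started`, `done`, `cancelled`), the status codes, and the process-wide cancel flag are assumptions. A real worker would run the agent inside `run_turn` and scope cancellation to a single turn.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def run_turn(request: dict, cancelled: threading.Event):
    """Placeholder for the agent loop: yields hypothetical event dicts."""
    yield {"type": "started", "message": request.get("message", "")}
    if cancelled.is_set():
        yield {"type": "cancelled"}
        return
    yield {"type": "done"}


class WorkerHandler(BaseHTTPRequestHandler):
    cancelled = threading.Event()  # shared by all requests; set by POST /cancel

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/healthz":
            self._send_json(200, {"ok": True})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length) if length else b""
        if self.path == "/cancel":
            self.cancelled.set()
            self._send_json(202, {"cancelled": True})
        elif self.path == "/turn":
            try:
                request = json.loads(raw or b"{}")
            except json.JSONDecodeError:
                self._send_json(400, {"error": "invalid JSON"})
                return
            self.send_response(200)
            self.send_header("Content-Type", "application/x-ndjson")
            self.end_headers()
            # NDJSON: one JSON object per line, flushed as each event occurs.
            for event in run_turn(request, self.cancelled):
                self.wfile.write(json.dumps(event).encode() + b"\n")
                self.wfile.flush()
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, *args):
        pass  # keep the sketch quiet; a real worker would log requests


def serve(port: int = 8080) -> None:
    ThreadingHTTPServer(("0.0.0.0", port), WorkerHandler).serve_forever()
```

Because the handler speaks HTTP/1.0 by default, the end of the `/turn` stream is signalled by the server closing the connection, so the gateway can read events line by line until EOF.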

Resource and policy selection

Profiles should support at least:

  • CPU/memory requests and limits
  • optional GPU/runtime class/node selector/tolerations
  • image and command/args
  • env/secret references
  • network policy class or egress mode
  • workspace volume strategy
  • timeout/TTL
  • allowed agent IDs
  • max concurrent workers per profile/backend/channel/user
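
Two of these fields, allowed agent IDs and max concurrent workers, amount to an admission check at spawn time. The sketch below shows one way that check might work. All names (`ProfilePolicy`, `ProfileAdmission`, `SpawnDenied`) are hypothetical, and it counts per profile only, whereas the proposal also mentions per-backend, per-channel, and per-user limits.

```python
import threading
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


class SpawnDenied(Exception):
    """Raised when a spawn request violates the profile's policy."""


@dataclass(frozen=True)
class ProfilePolicy:
    allowed_agent_ids: Optional[FrozenSet[str]] = None  # None: any agent
    max_concurrent: Optional[int] = None                # None: unlimited


class ProfileAdmission:
    """Tracks active workers per profile and enforces the policy."""

    def __init__(self, policies: Dict[str, ProfilePolicy]):
        self._policies = policies
        self._active = Counter()
        self._lock = threading.Lock()

    def acquire(self, profile: str, agent_id: str) -> None:
        policy = self._policies.get(profile)
        if policy is None:
            raise SpawnDenied(f"unknown profile {profile!r}")
        allowed = policy.allowed_agent_ids
        if allowed is not None and agent_id not in allowed:
            raise SpawnDenied(f"agent {agent_id!r} may not use {profile!r}")
        with self._lock:
            limit = policy.max_concurrent
            if limit is not None and self._active[profile] >= limit:
                raise SpawnDenied(f"profile {profile!r} is at capacity ({limit})")
            self._active[profile] += 1

    def release(self, profile: str) -> None:
        # Call when the worker closes, times out, or is swept.
        with self._lock:
            if self._active[profile] > 0:
                self._active[profile] -= 1
```

A spawn path would call `acquire` before creating the worker and `release` in the cleanup path, so a worker that fails to start does not permanently consume a slot.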

Selection could happen from:

  • explicit sessions_spawn.execution
  • agent defaults
  • ACP binding defaults
  • channel/user policy
  • task classification heuristics later
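
The proposal lists these sources without fixing a precedence between them. The sketch below assumes the simplest rule, that the first source in list order naming a backend wins wholesale. The source keys and the function name `select_execution` are hypothetical. Channel/user policy might work better as a constraint on the chosen profile than as a fallback default; this sketch does not model that.

```python
from typing import Dict, Optional

# Assumed precedence: most specific source first.
PRECEDENCE = ("spawn", "agent_defaults", "binding_defaults", "channel_policy")


def select_execution(sources: Dict[str, Optional[dict]]) -> Optional[dict]:
    """Return the first selection that names a backend, or None."""
    for name in PRECEDENCE:
        candidate = sources.get(name)
        if candidate and candidate.get("backend"):
            return {
                "source": name,  # recorded so status output can explain the choice
                "backend": candidate["backend"],
                "profile": candidate.get("profile"),
            }
    return None
```

Taking one source wholesale, instead of merging backend and profile field by field, avoids pairing a backend from one source with a profile that only exists on another.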

Alternatives beyond Kubernetes

This should not be Kubernetes-only. Possible backend types:

  • local process: current behavior; simplest and fastest for dev
  • Docker/Podman container: good single-host isolation without a cluster
  • Kubernetes Pod/Job: best for teams, quotas, autoscaling, node pools, GPU, namespace isolation
  • Nomad task: simpler ops for some infra teams; good resource scheduling without full Kubernetes
  • ECS/Fargate / Cloud Run Jobs / Azure Container Apps Jobs: managed container workers without operating a cluster
  • Firecracker/microVM: stronger isolation for untrusted code execution
  • remote SSH worker pool: pragmatic for existing build machines
  • CI runner backend: GitHub Actions/GitLab runners for bursty repo tasks, though latency and interactivity are weaker
  • devcontainer/Codespaces-like backend: good for repo-aware coding agents with prebuilt environments

The OpenClaw API should expose a common backend/profile abstraction so Kubernetes can be one implementation, not the only design.
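
One possible shape for that abstraction, sketched in Python: a small interface every backend implements, plus a registry keyed by the config's `type` field. The method names (`spawn`, `status`, `close`) and the registry helpers are assumptions, and `InMemoryBackend` is a stand-in that records workers instead of starting real processes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class ExecutionBackend(ABC):
    """Hypothetical interface shared by all execution backends."""

    @abstractmethod
    def spawn(self, session_id: str, profile: dict) -> str:
        """Start a worker and return a backend-specific worker ID."""

    @abstractmethod
    def status(self, worker_id: str) -> dict:
        """Return backend-specific status for a worker."""

    @abstractmethod
    def close(self, worker_id: str) -> None:
        """Stop the worker and release its resources."""


BACKEND_TYPES: Dict[str, Type[ExecutionBackend]] = {}


def register_backend(type_name: str):
    """Class decorator mapping a config "type" value to an implementation."""
    def decorator(cls):
        BACKEND_TYPES[type_name] = cls
        return cls
    return decorator


@register_backend("process")
class InMemoryBackend(ExecutionBackend):
    """Stand-in for the local backend: tracks workers in a dict only."""

    def __init__(self, config: dict):
        self.config = config
        self._workers: Dict[str, dict] = {}

    def spawn(self, session_id: str, profile: dict) -> str:
        worker_id = f"local-{len(self._workers) + 1}"
        self._workers[worker_id] = {"session": session_id, "phase": "running"}
        return worker_id

    def status(self, worker_id: str) -> dict:
        return dict(self._workers[worker_id])

    def close(self, worker_id: str) -> None:
        self._workers[worker_id]["phase"] = "closed"


def create_backend(backend_config: dict) -> ExecutionBackend:
    type_name = backend_config["type"]
    if type_name not in BACKEND_TYPES:
        raise ValueError(f"no backend registered for type {type_name!r}")
    return BACKEND_TYPES[type_name](backend_config)
```

With this shape, a Kubernetes, Docker, or Nomad backend would be another registered class, and `sessions_spawn` would only ever see the three interface methods.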

Why this matters

This would make subagent/ACP execution more production-grade:

  • isolate heavy or risky agent work from the gateway
  • make resource usage explicit and enforceable
  • reduce orphan process problems by moving lifecycle into backend-managed workers
  • improve observability with backend-specific IDs, logs, events, and traces
  • enable team/shared deployments where many users spawn agents concurrently
  • provide a foundation for GPU/build/test/large-repo profiles

Relationship to existing issues

This is related to, but distinct from, sandboxing and ACP lifecycle issues:

  • #45841 discusses Sandboxing + ACP, but this proposal is broader than sandbox compatibility.
  • #68916 and #74684 show the need for better spawned-agent lifecycle cleanup and observability.
  • #68204 would be important for tracing parent session → spawned worker → backend resource.
  • #79560 suggests policy/rate limits are needed when spawning agents from non-interactive channels.

Acceptance criteria / possible first milestone

A first milestone could be Kubernetes-only but shaped as a generic backend abstraction:

  • config schema for execution backends and profiles
  • sessions_spawn can select backend/profile explicitly
  • Kubernetes backend creates a worker Pod from a profile
  • dry-run validation and RBAC/doctor checks exist
  • status exposes namespace/podName and readiness/failure reason
  • close/reset deletes the worker and sweeper handles orphans
  • minimal worker image supports /healthz, /turn, /cancel
  • targeted tests cover profile resolution, manifest generation, lifecycle, and cleanup
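
The orphan-sweeping criterion can be tested without a cluster if the decision logic is a pure function. A minimal sketch follows, assuming each worker records its parent session, creation time, and TTL. The record fields and the name `find_orphans` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class WorkerRecord:
    worker_id: str        # e.g. "namespace/podName" for Kubernetes
    parent_session: str
    created_at: float     # seconds since the epoch
    ttl_seconds: float


def find_orphans(workers: Iterable[WorkerRecord],
                 live_sessions: Set[str], now: float) -> List[str]:
    """Return IDs of workers that should be deleted by the sweeper."""
    orphans = []
    for worker in workers:
        expired = now - worker.created_at > worker.ttl_seconds
        parent_gone = worker.parent_session not in live_sessions
        if expired or parent_gone:
            orphans.append(worker.worker_id)
    return orphans
```

A sweeper loop would list the backend's workers, pass them through `find_orphans`, and delete the returned IDs. Keeping the decision separate from the deletion makes the lifecycle tests in the last bullet straightforward to write.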
