openclaw [Bug]: Every exec tool call fails with EPERM chmod '/home/openclaw/.openclaw' on Kubernetes with fsGroup-mounted PVC (regressed in 2026.5.12 via #77907)



Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

On 2026.5.12, every exec tool call fails with EPERM: operation not permitted, chmod '/home/openclaw/.openclaw' on Kubernetes with an fsGroup-mounted PVC. The regression entered in b971ebaaab (PR #77907), which added an unconditional chmodSync(stateDir, 0o700) that rethrows on failure to src/infra/exec-approvals.ts:ensureDir between 2026.4.14 and 2026.5.12.

Steps to reproduce

  1. Deploy OpenClaw 2026.5.12 to Kubernetes with runAsUser: 1000, runAsNonRoot: true, fsGroup: 1000, capabilities.drop: [ALL], and a PVC mounted at /home/openclaw/.openclaw.
  2. Wait for the container to become Ready.
  3. Trigger any agent action that uses the exec tool (e.g. echo test).

Expected behavior

On 2026.4.14 (last known good), every exec call ran the user command and returned its stdout/stderr; ~/.openclaw/exec-approvals.json was written without chmodding the state-dir root.

Actual behavior

Every exec attempt fails before the user command runs:

[tools] exec failed: EPERM: operation not permitted, chmod '/home/openclaw/.openclaw' raw_params={"command":"ls -la ~/.openclaw/workspace/skills/"}
[tools] exec failed: EPERM: operation not permitted, chmod '/home/openclaw/.openclaw' raw_params={"command":"echo test","workdir":"/tmp"}
[tools] exec failed: EPERM: operation not permitted, chmod '/home/openclaw/.openclaw' raw_params={"command":"echo test","host":"gateway"}

The chmod target is always /home/openclaw/.openclaw (the state-dir root), regardless of workdir or host, which confirms that the failure is a pre-write step on the state dir and not on the command's cwd or sandbox path. Non-exec actions (model calls, REST channel, cron persistence) keep working.

OpenClaw version

2026.5.12

Operating system

Linux (Kubernetes pod, ghcr.io/openclaw/openclaw:2026.5.12 base image)

Install method

docker — ghcr.io/openclaw/openclaw:2026.5.12 deployed as a Kubernetes StatefulSet (single-pod, PVC-backed)

Model

N/A: the exec tool failure is model-agnostic; it occurs in src/infra/exec-approvals.ts:ensureDir before any model interaction. Reproduced with openrouter/anthropic/claude-sonnet-4.5 and others; replacing the model does not change behavior.

Provider / routing chain

N/A: the failure occurs before any LLM request is made. The pre-write chmodSync(stateDir, 0o700) in ensureDir throws synchronously, so no outbound request to any provider is dispatched for the failing exec.

Additional provider/model setup details

N/A: see "Model" and "Provider / routing chain" above; this regression is independent of model/provider configuration.

Logs, screenshots, and evidence

Pod context (inside the failing container):

$ id
uid=1000(node) gid=1000(node) groups=1000(node)

$ stat -c '%u %g %a %n' /home/openclaw/.openclaw
0 1000 2775 /home/openclaw/.openclaw    # kubelet-applied fsGroup, setgid

$ grep -E '^Cap(Bnd|Eff):' /proc/self/status
CapBnd:	0000000000000000
CapEff:	0000000000000000
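
The all-zero CapEff mask above is what makes chmod() on a root-owned directory fail. As a rough illustration, the sketch below decodes such a mask and checks for CAP_FOWNER (bit 3 in the Linux capability numbering). The helper name hasCapability is hypothetical and is not part of OpenClaw.

```typescript
// Bit index of CAP_FOWNER in the Linux capability sets (see capabilities(7)).
const CAP_FOWNER = 3;

/**
 * Hypothetical helper: returns true when capability bit `capBit` is set in a
 * hex mask as printed in /proc/<pid>/status (e.g. the CapEff line).
 */
function hasCapability(hexMask: string, capBit: number): boolean {
  const hex = hexMask.trim();
  // Each hex digit carries 4 bits; the last digit holds bits 0..3.
  const digitIndex = hex.length - 1 - Math.floor(capBit / 4);
  if (digitIndex < 0) {
    return false;
  }
  const nibble = parseInt(hex.charAt(digitIndex), 16);
  return ((nibble >> (capBit % 4)) & 1) === 1;
}

// Mask from the failing pod: no capabilities at all.
console.log(hasCapability("0000000000000000", CAP_FOWNER)); // false
// Mask with only CAP_FOWNER set (bit 3 => 0x8).
console.log(hasCapability("0000000000000008", CAP_FOWNER)); // true
```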

Pod security context (from the Kubernetes Pod spec):

  • runAsUser: 1000
  • runAsNonRoot: true
  • fsGroup: 1000
  • capabilities.drop: [ALL] (PSA restricted-friendly defaults)
  • seccompProfile: RuntimeDefault
  • PVC mounted at /home/openclaw/.openclaw

Root-cause code (src/infra/exec-approvals.ts, lines 266–282, current main):

function ensureDir(filePath: string) {
  const dir = path.dirname(filePath);
  assertNoExecApprovalsSymlinkParents(dir, resolveRequiredHomeDir());
  fs.mkdirSync(dir, { recursive: true });
  const dirStat = fs.lstatSync(dir);
  if (!dirStat.isDirectory() || dirStat.isSymbolicLink()) {
    throw new Error(`Refusing to use unsafe exec approvals directory: ${dir}`);
  }
  try {
    fs.chmodSync(dir, 0o700);
  } catch (err) {
    if (process.platform !== "win32") {
      throw err;
    }
  }
  return dir;
}

ensureDir is called from writeExecApprovalsRaw, which runs on every persistence of ~/.openclaw/exec-approvals.json (saveExecApprovals, restoreExecApprovalsSnapshot, recordAllowlistUse, addDurableCommandApproval, persistAllowAlwaysPatterns, ensureExecApprovals on first run). The chmodSync targets the directory (state-dir root), not the file. On Kubernetes with fsGroup, the mount root is owned by root:fsGroup; an unprivileged container lacks CAP_FOWNER, so chmod() returns EPERM, the throw err rethrows unconditionally on non-Windows, and the exec attempt dies.

Comparison with sibling code (in-repo precedents this codepath diverges from):

  • src/config/io.ts:tightenStateDirPermissionsIfNeeded — best-effort: chmod wrapped in try/catch without rethrow, comment "Best-effort hardening only; callers still need the config write to proceed."
  • extensions/acpx/codex-auth-bridge.ts — merged precedent (#73341): tolerant to EPERM/EACCES/EROFS/ENOSYS/ENOTSUP for cosmetic chmod.
  • src/infra/exec-approvals.ts:ensureDir does neither; this is the regression.

(Note: the (mode & 0o077) === 0 early return from io.ts would NOT help here. With fsGroup the mode is 2775, so mode & 0o077 evaluates to 0o075 rather than 0; the early return is skipped and chmod is still attempted.)
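
The mask arithmetic in the note can be checked directly. This is a minimal sketch of the check as described for io.ts; the function name wouldAttemptChmod is hypothetical, and the real code may be structured differently.

```typescript
/**
 * Hypothetical helper: true when any group/other permission bit is set,
 * i.e. when a `(mode & 0o077) === 0` early return would NOT trigger.
 */
function wouldAttemptChmod(mode: number): boolean {
  return (mode & 0o077) !== 0;
}

// fsGroup-managed mount root from the report: setgid + rwxrwxr-x.
console.log((0o2775 & 0o077).toString(8)); // "75"
console.log(wouldAttemptChmod(0o2775)); // true: chmod is still attempted
// An already-tight directory skips the chmod.
console.log(wouldAttemptChmod(0o700)); // false
```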

Regression history (combined from git log --follow -S 'function ensureDir', git log -S 'fs.chmodSync(dir,', git blame -L 266,282):

  • efdb33c975 (2026-01-18, feat: add exec host approvals flow) — ensureDir introduced, body is just mkdirSync. No chmod.
  • 4bf94aa0d6 (2026-04-10, feat: add local exec-policy CLI (#64050)) — added assertNoSymlinkPathComponents and lstatSync boundary. Still no chmod.
  • b971ebaaab (2026-05-05, PR #77907 fix(exec-approvals): guard Windows rename fallback) — added chmodSync(dir, 0o700) + throw err on non-Windows. This is the regression commit.

So the regression entered the codebase between releases 2026.4.14 (last known good) and 2026.5.12 (first known bad).

Impact and severity

  • Affected: every OpenClaw deployment on Kubernetes (and likely Fly Machines, see #71205) that mounts ~/.openclaw from a runtime-provisioned volume where the mount root is not owned by the container's runtime user. Confirmed on managed Kubernetes with an fsGroup-set PVC and a non-root container (uid=1000).
  • Severity: High. The bug completely blocks the exec tool. Every shell-level operation an agent attempts (build, file inspection, cron edits, diagnostics) fails before the command runs. Non-exec features (model calls, message delivery, cron persistence via the Node binding) continue to work, but any agent flow that depends on exec is dead.
  • Frequency: Always. Reproduces on 100% of exec invocations on the affected configuration; observed 12+ consecutive failures in 60 min on a single pod under normal usage.
  • Consequence: all agent-driven shell work is unavailable on Kubernetes/Fly-style multi-uid runtimes after upgrading to 2026.5.12. The only safe downgrade target is 2026.4.14 (before regression commit b971ebaaab).

Additional information

Last known good version: 2026.4.14. First known bad version: 2026.5.12. Regression commit: b971ebaaab (PR #77907, 2026-05-05). Root-cause file:line: src/infra/exec-approvals.ts:266–282 (ensureDir).

Suggested fix (error-code whitelist, mirrors merged precedent #73341 for acpx):

   try {
     fs.chmodSync(dir, 0o700);
-  } catch (err) {
-    if (process.platform !== "win32") {
-      throw err;
-    }
+  } catch (err) {
+    if (process.platform === "win32") {
+      return dir;
+    }
+    // Cosmetic chmod: the directory may live on a runtime-managed mount
+    // (Kubernetes fsGroup, Fly Machines firecracker, etc.) where the
+    // container user is not the owner. Tolerate filesystem-permission
+    // errors only; surface anything else. Mirrors the precedent set by
+    // PR #73341 for `acpx`.
+    const code = (err as NodeJS.ErrnoException)?.code;
+    if (
+      code !== "EPERM" &&
+      code !== "EACCES" &&
+      code !== "EROFS" &&
+      code !== "ENOSYS" &&
+      code !== "ENOTSUP"
+    ) {
+      throw err;
+    }
   }
   return dir;
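
For readers who want to try the tolerant behaviour outside the repository, the diff above can be approximated as a standalone helper. This is only a sketch under assumptions: the name chmodBestEffort, the boolean return value, and the injectable chmod and platform parameters are illustrative and do not appear in the OpenClaw source. The tolerated error codes are the ones listed in the suggested fix.

```typescript
import * as fs from "node:fs";

// Error codes the suggested fix treats as non-fatal for a cosmetic chmod.
const TOLERATED_CHMOD_CODES = new Set(["EPERM", "EACCES", "EROFS", "ENOSYS", "ENOTSUP"]);

type ChmodFn = (path: string, mode: number) => void;

/**
 * Hypothetical helper: tries to chmod `dir`. Returns true when the mode was
 * applied, false when the failure is tolerated, and rethrows anything else.
 * `chmod` and `platform` are injectable only so the behaviour can be tested.
 */
function chmodBestEffort(
  dir: string,
  mode: number,
  chmod: ChmodFn = fs.chmodSync,
  platform: string = process.platform,
): boolean {
  try {
    chmod(dir, mode);
    return true;
  } catch (err) {
    if (platform === "win32") {
      return false; // Windows: chmod failures were already ignored.
    }
    const code = (err as NodeJS.ErrnoException)?.code;
    if (code !== undefined && TOLERATED_CHMOD_CODES.has(code)) {
      return false; // Runtime-managed mount (e.g. Kubernetes fsGroup).
    }
    throw err; // Unexpected failure: surface it.
  }
}
```

A caller would ignore the return value on the hot path and at most log it once, so that exec-approvals persistence proceeds even when the directory mode cannot be tightened.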

Downstream workaround (undesirable in production): adding CAP_FOWNER to the container security context lets chmod() succeed:

securityContext:
  capabilities:
    drop: ["ALL"]
    add: ["FOWNER"]

The PSA restricted profile requires dropping ALL capabilities and permits adding back only NET_BIND_SERVICE, so a pod that adds FOWNER no longer fits restricted namespaces; it also widens the capability surface inside the container. A code-level fix (whitelist / ownership-check / env-var opt-out) is the proper solution.
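
Of the code-level options, the ownership check is the one not spelled out elsewhere in this report. The following is a rough sketch of what it could look like, assuming the check compares the directory owner with the process uid; the names isOwnedBy and tightenIfOwned are hypothetical. It deliberately ignores capabilities, so a process holding CAP_FOWNER would skip a chmod that could have succeeded.

```typescript
import * as fs from "node:fs";

/** Hypothetical helper: true when the process uid is known and owns the path. */
function isOwnedBy(ownerUid: number, processUid: number | undefined): boolean {
  return processUid !== undefined && ownerUid === processUid;
}

/**
 * Hypothetical helper: chmods `dir` only when the current process owns it.
 * Returns false (without touching the directory) on mounts such as a
 * Kubernetes fsGroup volume whose root is owned by uid 0.
 */
function tightenIfOwned(dir: string, mode: number = 0o700): boolean {
  const ownerUid = fs.statSync(dir).uid;
  // process.getuid is undefined on Windows, hence the optional call.
  if (!isOwnedBy(ownerUid, process.getuid?.())) {
    return false;
  }
  fs.chmodSync(dir, mode);
  return true;
}

// Values from the failing pod: directory owned by uid 0, process uid 1000.
console.log(isOwnedBy(0, 1000)); // false
```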

Related (cross-link only):

  • #71205 — tightenStateDirPermissionsIfNeeded on src/config/io.ts; same anti-pattern, different subsystem (Fly Machines repro). Proposes an OPENCLAW_SKIP_STATE_DIR_HARDEN=1 env-var opt-out — extending the same flag to gate exec-approvals.ts:ensureDir would centralize the fix.
  • #66747 — EPERM chmod on ~/.openclaw/tasks and ~/.openclaw/flows during task/task-flow registry restore; same anti-pattern, different subsystem.
  • #73341 (merged) — acpx plugin chmod tolerant of EPERM/EACCES/EROFS/ENOSYS/ENOTSUP; precedent for the suggested fix above
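
If the env-var opt-out proposed in #71205 were extended to ensureDir, the gate could be as small as the sketch below. The variable name OPENCLAW_SKIP_STATE_DIR_HARDEN comes from that proposal; whether it exists in any release is not confirmed here, and the function shouldHardenStateDir is hypothetical.

```typescript
/**
 * Hypothetical gate: returns false when the operator has opted out of
 * state-dir permission hardening via OPENCLAW_SKIP_STATE_DIR_HARDEN=1.
 */
function shouldHardenStateDir(
  env: Record<string, string | undefined> = process.env,
): boolean {
  return env["OPENCLAW_SKIP_STATE_DIR_HARDEN"] !== "1";
}

console.log(shouldHardenStateDir({})); // true: harden by default
console.log(shouldHardenStateDir({ OPENCLAW_SKIP_STATE_DIR_HARDEN: "1" })); // false
```

Both the config writer and the exec-approvals writer could consult the same gate before calling chmod, which is the centralization the first related item suggests.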
