openclaw - Bug: Zombie processes accumulate in worker containers after child process exits

Bug Description

After running HiClaw workers for a period of time, zombie processes accumulate in each worker container. Each worker container typically has 2 zombie processes:

  1. [worker-entrypoi] - the worker entrypoint script
  2. [hiclaw] - the hiclaw CLI process

Root Cause Analysis

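The openclaw-gateway process inside worker containers spawns child processes (worker-entrypoint and hiclaw) but does not reap them after they exit. A terminated child stays in the process table as a zombie until its parent collects its exit status with wait() or waitpid(), typically in response to SIGCHLD, and openclaw-gateway does not appear to do this.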
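
To make the mechanism concrete, here is a minimal Python sketch of how an unreaped child becomes a zombie and how a single waitpid() call clears it. It is not taken from openclaw-gateway, whose source is not shown in this issue. The helper names (proc_state, make_zombie, reap) are hypothetical, and the sketch assumes Linux with /proc mounted and the default SIGCHLD disposition.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field is the first token after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie(timeout=2.0):
    """Fork a child that exits at once and deliberately do not wait for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately
    # Parent: poll until the kernel reports the child as a zombie ("Z").
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    return pid, state


def reap(pid):
    """Collect the child's exit status, which removes its process-table entry."""
    os.waitpid(pid, 0)
    return proc_state(pid)


if __name__ == "__main__":
    child, before = make_zombie()
    print("state before waitpid:", before)
    print("state after waitpid:", reap(child))
```

The child stays in state Z for as long as the parent is alive and has not waited for it, which matches the pattern in the evidence below.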

Evidence

# Zombie process distribution (15 workers x 2 zombies each + 3 from manager)
$ ps aux | awk '$8=="Z"' | wc -l
32

# Each worker has 2 zombies:
$ docker exec hiclaw-worker-data-engineer ps aux | awk '$8=="Z"'
PID   PPID  STAT  COMMAND
143   0     Z     [worker-entrypoi]
199   0     Z     [hiclaw]

# Parent processes are openclaw-gateway:
$ ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/ {print $2}' | sort | uniq -c
      2 163193  (openclaw-gateway)
      2 164766  (openclaw-gateway)
      ...

Impact

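  • Zombie processes consume no CPU and essentially no memory; each one retains only its process-table entry (PID and exit status)
  • They occupy entries in the process table (currently 32/4,194,304)
  • Long-term accumulation could eventually exhaust the process table, at which point new processes could no longer be created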
  • No functional impact at current levels
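
The ratio in the list above can be checked with a short Python sketch. It assumes the 4,194,304 figure is the host's kernel.pid_max value, which the issue does not state explicitly, and it counts zombies by reading the State: line of each /proc/<pid>/status file. All function names are hypothetical. A container may also be subject to a lower pids cgroup limit, which this sketch does not check.

```python
import os


def state_from_status(text):
    """Extract the one-letter state from the contents of /proc/<pid>/status."""
    for line in text.splitlines():
        if line.startswith("State:"):
            return line.split()[1]  # e.g. "State:\tZ (zombie)" -> "Z"
    return None


def count_zombies(proc_root="/proc"):
    """Count processes currently in state Z under proc_root."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "sys" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                if state_from_status(f.read()) == "Z":
                    count += 1
        except OSError:
            continue  # the process disappeared while we were scanning
    return count


def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Read the kernel's upper bound on PID values (assumed to be the limit cited above)."""
    with open(path) as f:
        return int(f.read())


def usage_percent(used, limit):
    """Share of the limit that is in use, as a percentage."""
    return 100.0 * used / limit


if __name__ == "__main__":
    zombies = count_zombies()
    limit = read_pid_max()
    print(f"{zombies} zombies, {usage_percent(zombies, limit):.5f}% of pid_max={limit}")
```

With the numbers reported here, usage_percent(32, 4194304) is roughly 0.00076%, which supports the assessment that there is no functional impact at current levels.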

Workaround

Restarting worker containers clears the zombie processes temporarily.

Environment

  • HiClaw Version: latest (hiclaw-worker:latest)
  • OS: Ubuntu 26.04 LTS (host)
  • Kernel: 7.0.0-15-generic

Suggested Fix

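The openclaw-gateway process should do one of the following:

  1. Install a SIGCHLD signal handler that calls waitpid(-1, NULL, WNOHANG) in a loop until no exited children remain. The loop matters because several child exits can be delivered as a single SIGCHLD.
  2. Use signal(SIGCHLD, SIG_IGN) to have the kernel reap children automatically. This discards their exit statuses, so it only suits children whose exit codes the gateway never inspects.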
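
The handler-plus-loop approach can be sketched as follows. This is a Python illustration of the same POSIX calls, not the gateway's actual code; the gateway's implementation language is not stated in this issue. The names reaped, reap_children, install_reaper, and spawn are hypothetical, and os.waitstatus_to_exitcode requires Python 3.9 or later.

```python
import os
import signal
import time

# Exit codes collected by the handler, keyed by child PID (illustrative bookkeeping).
reaped = {}


def reap_children(signum=None, frame=None):
    """Reap every child that has already exited, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # ECHILD: this process has no children at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped[pid] = os.waitstatus_to_exitcode(status)


def install_reaper():
    """Register the handler; in Python this must be called from the main thread."""
    signal.signal(signal.SIGCHLD, reap_children)


def spawn(exit_code):
    """Fork a child that exits immediately with exit_code (stand-in for real work)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    return pid


if __name__ == "__main__":
    install_reaper()
    pids = [spawn(code) for code in (0, 3, 7)]
    # Give the handler a moment to run for every child.
    deadline = time.monotonic() + 2.0
    while not all(p in reaped for p in pids) and time.monotonic() < deadline:
        time.sleep(0.01)
    print({pid: reaped.get(pid) for pid in pids})
```

One design caveat: waitpid(-1, ...) collects every child of the process, so any other code in the same process that expects to wait on its own children will find them already reaped. If that matters, the handler could instead wait only on PIDs it tracks itself.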

Reproduction Steps

  1. Start HiClaw with multiple workers
  2. Wait for workers to execute tasks (which spawn child processes)
  3. Observe zombie processes accumulating in worker containers
