hermes - ✅(Solved) Fix [Bug] Gateway accumulates zombie processes from MCP servers and subprocess calls [1 pull requests, 1 comments, 2 participants]

NousResearch/hermes-agent#15012 (fetched 2026-04-25 06:25:12)

The Hermes gateway (running as PID 1 in Docker) accumulates zombie processes over time from:

  • MCP server processes (gbrain, bun)
  • Git operations
  • Browser automation subprocesses
  • Shell pipe commands (head, etc.)

Root Cause

When running as PID 1, the gateway does not handle SIGCHLD signals to reap terminated child processes. In standard Unix, orphan processes are reparented to PID 1, which is expected to call wait() on them.

Fix Action

Fixed

PR fix notes

PR #15116: fix(docker): reap orphaned subprocesses via tini as PID 1

Description (problem / solution / changelog)

Closes #15012.

What this PR does

Installs tini in the container image and routes ENTRYPOINT through /usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers, git, bun, browser daemons) never get wait()-ed on and accumulate as zombies under PID 1. Long-running gateway containers eventually exhaust the PID table ("fork: cannot allocate memory").

Why tini specifically

  • Standard container init — same pattern Docker's --init flag and Kubernetes's pause container use
  • Shipped in Debian stable (tini 0.19.0-1)
  • Transparent to hermes: handles SIGCHLD, reaps orphans, forwards SIGTERM/SIGINT to the entrypoint so existing graceful-shutdown handlers still run
  • -g sends signals to the whole process group so docker stop cleanly terminates hermes AND its descendants

Validation

E2E with a minimal reproducer image:

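| Image                       | Spawn 5 orphans reparenting to PID 1 | ZOMBIE_COUNT after 1.5s |
|-----------------------------|--------------------------------------|-------------------------|
| Without tini (current main) | 5 orphans created                    | 5 zombies               |
| With tini -g -- (this PR)   | 5 orphans created                    | 0 zombies               |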
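
The reproducer image itself is not shown in the PR notes. As a rough sketch of how orphans like these could be produced, the snippet below double-forks: an intermediate process forks the grandchildren and exits first, so the kernel reparents them to PID 1 (or the nearest subreaper). The function name spawn_orphans and the 0.2 s delay are illustrative assumptions, not taken from the PR.

```python
import os
import time


def spawn_orphans(count: int = 5) -> list[int]:
    """Double-fork so `count` grandchildren outlive their parent (hypothetical helper)."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Intermediate process: fork the grandchildren, report their PIDs, exit.
        try:
            os.close(read_fd)
            pids = []
            for _ in range(count):
                pid = os.fork()
                if pid == 0:
                    os.close(write_fd)
                    time.sleep(0.2)  # outlive the intermediate process
                    os._exit(0)      # exits as an orphan; its new parent must reap it
                pids.append(pid)
            os.write(write_fd, " ".join(map(str, pids)).encode())
        finally:
            os._exit(0)
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        orphan_pids = [int(p) for p in pipe.read().split()]
    os.waitpid(middle, 0)  # reap the intermediate so only the orphans remain
    return orphan_pids
```

If the calling process is PID 1 in a container and never calls wait(), the exited grandchildren would be expected to stay in state Z; with an init such as tini as PID 1 they should be reaped.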

Unit tests (tests/tools/test_dockerfile_pid1_reaping.py): Two contract tests that assert the Dockerfile (a) installs some init (tini, dumb-init, or catatonit), and (b) the last ENTRYPOINT line routes through it. Accepts any standard init so it's not a change-detector on tini specifically. Verified both tests fail on pre-fix Dockerfile, pass on post-fix.
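
The contents of tests/tools/test_dockerfile_pid1_reaping.py are not reproduced on this page. The following is only a sketch of what the two contract checks described above might look like; the helper names and the parsing approach are assumptions.

```python
import re

# Any of these counts as a "standard init", per the PR description.
KNOWN_INITS = ("tini", "dumb-init", "catatonit")


def _instructions(dockerfile: str) -> list[str]:
    """Join backslash continuations and drop blank lines and comments."""
    joined = re.sub(r"\\\s*\n", " ", dockerfile)
    lines = (line.strip() for line in joined.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def _mentions_init(text: str) -> bool:
    return any(re.search(rf"\b{re.escape(name)}\b", text) for name in KNOWN_INITS)


def installs_init(dockerfile: str) -> bool:
    """(a) Some RUN instruction installs one of the known init binaries."""
    return any(
        ins.upper().startswith("RUN ") and _mentions_init(ins)
        for ins in _instructions(dockerfile)
    )


def last_entrypoint_uses_init(dockerfile: str) -> bool:
    """(b) The last ENTRYPOINT instruction routes through a known init."""
    entrypoints = [
        ins for ins in _instructions(dockerfile) if ins.upper().startswith("ENTRYPOINT")
    ]
    return bool(entrypoints) and _mentions_init(entrypoints[-1])
```

Matching on a set of init names rather than on tini alone mirrors the stated goal of not turning the test into a change-detector.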

What I considered and rejected

  • signal.signal(SIGCHLD, SIG_IGN) in Python — race-prone with Python's own subprocess wait(), doesn't cover shell entrypoint stage
  • Telling users to pass --init on docker run — requires user-side discipline, doesn't cover compose/k8s

Baking tini into the image is the right layer — applies to every deployment path automatically.

Changed files

  • Dockerfile (modified, +4/-2)
  • tests/tools/test_dockerfile_pid1_reaping.py (added, +78/-0)


Description

The Hermes gateway (running as PID 1 in Docker) accumulates zombie processes over time from:

  • MCP server processes (gbrain, bun)
  • Git operations
  • Browser automation subprocesses
  • Shell pipe commands (head, etc.)

Evidence

$ ps aux | awk '$8 ~ /Z/ {print}'
hermes      1689  0.0  0.0      0     0 ?        Zs   14:40   0:00 [agent-browser-l] <defunct>
hermes      1902  0.0  0.0      0     0 ?        Zs   14:50   0:00 [git] <defunct>
hermes      1984  0.0  0.0      0     0 ?        Zs   14:52   0:00 [git] <defunct>
hermes      1988  0.0  0.0      0     0 ?        Zs   14:52   0:00 [git] <defunct>
hermes      2861  0.0  0.0      0     0 ?        Z    14:59   0:00 [gbrain] <defunct>
hermes      2862  0.0  0.0      0     0 ?        Z    14:59   0:00 [head] <defunct>
hermes      2863  0.0  0.0      0     0 ?        Z    14:59   0:01 [bun] <defunct>
...

All zombies have PPID=1 (the gateway process), indicating the gateway is not reaping child processes.

Root Cause

When running as PID 1, the gateway does not handle SIGCHLD signals to reap terminated child processes. In standard Unix, orphan processes are reparented to PID 1, which is expected to call wait() on them.

Proposed Solutions

  1. In gateway code: Add signal.signal(signal.SIGCHLD, signal.SIG_IGN) to auto-reap zombies, or implement a proper SIGCHLD handler that calls waitpid(-1, WNOHANG).

  2. In Docker: Use --init flag or add tini as PID 1 to handle signal forwarding and zombie reaping.

  3. In Dockerfile: Consider using tini as the entrypoint:

    ENTRYPOINT ["/sbin/tini", "--"]
    CMD ["hermes", "gateway", "run"]
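
For reference, the handler variant of option 1 (calling waitpid(-1, WNOHANG) on SIGCHLD) could be sketched roughly as below. The function names are hypothetical, and the PR notes on this page report that in-process handling was rejected as race-prone with Python's own subprocess wait(), so treat this as an illustration of the mechanism rather than the adopted fix.

```python
import os
import signal


def reap_children() -> list[int]:
    """Reap every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def install_sigchld_reaper() -> None:
    """Call reap_children() whenever the kernel delivers SIGCHLD."""
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())
```

The loop matters because several children can exit before a single SIGCHLD is delivered, so one waitpid call per signal may leave zombies behind.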

Environment

  • Hermes version: v0.11.0
  • Deployment: Docker
  • OS: Linux (Debian-based)

Impact

Zombie processes don't consume CPU/memory, but they do occupy PID slots. Long-running containers may eventually exhaust available PIDs, causing "fork: cannot allocate memory" errors.

extent analysis

TL;DR

The fix (PR #15116) is to run an init such as tini as PID 1 in the container image so that signals are forwarded and zombies are reaped.

Guidance

  • Pass the --init flag to docker run so that Docker starts an init process as PID 1, which forwards signals and reaps zombies.
  • Alternatively, modify the Dockerfile to use tini as the entrypoint, as shown in the proposed solutions; this applies to every deployment path, not only docker run.
  • Verify the fix by checking that ps aux | awk '$8 ~ /Z/ {print}' prints no <defunct> entries after the container has been running for a while.
  • Ensure the Hermes gateway code does not interfere with the init system's signal handling.
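
The ps check in the guidance can also be scripted. The sketch below reads /proc directly on Linux and lists processes in state Z together with their parent PID; the function name list_zombies is an assumption made for illustration.

```python
import os
from pathlib import Path


def list_zombies(proc_root: str = "/proc") -> list[tuple[int, int, str]]:
    """Return (pid, ppid, name) for every process currently in state Z."""
    zombies = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            stat = (entry / "stat").read_text()
        except OSError:
            continue  # the process vanished or is not readable
        # Layout: "pid (comm) state ppid ..."; comm may contain spaces or ")".
        left, _, right = stat.rpartition(")")
        fields = right.split()
        if "(" not in left or len(fields) < 2:
            continue
        name = left.split("(", 1)[1]
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            zombies.append((int(entry.name), ppid, name))
    return zombies
```

Inside the container, entries whose ppid is 1 correspond to the <defunct> rows in the Evidence section; an empty list after the fix is the expected outcome.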

Example

ENTRYPOINT ["/sbin/tini", "--"]
CMD ["hermes", "gateway", "run"]

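This example demonstrates how to use tini as the entrypoint in a Dockerfile. The path to the binary depends on how tini is installed; PR #15116 routes ENTRYPOINT through /usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh.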

Notes

The proposed solutions assume a Debian-based Linux OS and Docker deployment. The effectiveness of these solutions may vary depending on the specific environment and Hermes version.

Recommendation

Apply the workaround by using tini as the entrypoint in the Dockerfile, as it provides a straightforward and effective solution for handling zombie processes.
