openclaw - [Bug]: `openclaw agent --timeout N` does not terminate warm sessions on timeout, causing a permanent gateway hang



Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

After upgrading from 2026.2.10 to 2026.4.25, openclaw agent invocations that use --timeout leave orphaned warm sessions on the gateway. The orphaned sessions accumulate file descriptors and eventually cause the gateway to hang permanently. Reproduced in a cron-scheduled production environment over 72 hours and confirmed via process inspection and FD counts.

Steps to reproduce

  1. Start the OpenClaw v2026.4.25 gateway (systemd user unit, openclaw.json with at least one agent).
  2. Run in a loop: openclaw agent --agent myagent --timeout 5 --message "ping" (30 iterations, with sleep 2 between them).
  3. After the loop completes, list the orphaned openclaw processes: ps -o pid,ppid,stat,etime,cmd -C openclaw
  4. Check the FD count on the gateway PID: ls /proc/<gateway_pid>/fd | wc -l (it grows monotonically).
  5. In production (cron */5, --timeout 60), the gateway becomes unresponsive after ~72h.
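
Steps 2 to 4 can be scripted roughly as follows. This is a minimal sketch, assuming a Linux host with /proc. The names `repro_loop`, `fd_count`, and `AGENT_CMD` are illustrative and not part of OpenClaw; the default agent command is the one from step 2.

```shell
#!/bin/sh
# Reproduction helper for steps 2-4 (illustrative names, not part of OpenClaw).
# AGENT_CMD defaults to the command from step 2; override it to dry-run the loop.
AGENT_CMD=${AGENT_CMD:-openclaw agent --agent myagent --timeout 5 --message ping}

# repro_loop [ITERATIONS] [PAUSE_SECONDS]: run the agent command repeatedly.
repro_loop() {
  iterations=${1:-30}
  pause=${2:-2}
  i=0
  while [ "$i" -lt "$iterations" ]; do
    # Timeouts and non-zero exits are expected here, so ignore the status.
    $AGENT_CMD >/dev/null 2>&1 || true
    sleep "$pause"
    i=$((i + 1))
  done
  echo "completed $iterations iterations"
}

# fd_count PID: number of open file descriptors for PID (Linux /proc only).
fd_count() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}
```

Record `fd_count <gateway_pid>` before and after calling `repro_loop`; on an affected version the report indicates the second number should be higher.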

Expected behavior

When --timeout N fires, the CLI process AND the warm session on the gateway should both be terminated. File descriptors should be released. No orphaned child processes should remain.

Actual behavior

The CLI process is killed, but the warm session persists indefinitely on the gateway. Orphaned processes (PPID=1, adopted by systemd-user) accumulate, each holding 15-25 open FDs that point to (deleted) inodes. After ~7 leaked sessions over 72h, the gateway hangs on all HTTP endpoints, including /health. Only a full restart recovers it. The gateway still reports hooks ready (5/5) while hung, so it does not diagnose the failure itself.

OpenClaw version

v2026.4.25

Operating system

Ubuntu 24.04 LTS

Install method

npm global

Model

openclaw/agent (multiple agents via openclaw.json)

Provider / routing chain

openclaw → gateway → local (systemd user unit, single instance, HTTP localhost:18791)

Additional provider/model setup details

Default route is openclaw gateway on localhost:18791. Six agents configured in openclaw.json (standard format, no custom extensions). Cron invokes openclaw agent CLI every 5 minutes with --timeout flag. Bug is in session lifecycle, not model/provider specific.

Logs, screenshots, and evidence

# Process tree at time of hang (anonymized)
$ ps -o pid,ppid,stat,etime,cmd -C openclaw
PID    PPID  STAT ELAPSED     CMD
12345  1     Ss   3-02:15:30  openclaw gateway
12400  1     S    2-18:44:12  openclaw agent --agent X (orphan)
12512  1     S    2-14:22:08  openclaw agent --agent Y (orphan)
12789  1     S    1-23:11:55  openclaw agent --agent X (orphan)

# FD count on gateway at time of hang
$ ls /proc/12345/fd | wc -l
347  # vs ~45 at clean startup

# Recovery requires full restart
$ systemctl --user restart openclaw-gateway
$ curl -sf http://127.0.0.1:18791/health
{"ok":true}
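
The manual checks above could be wrapped in a small script for periodic monitoring. This is a sketch under assumptions: it matches the command line `openclaw agent` exactly as it appears in the anonymized `ps` output (an npm global install may show a `node` command line instead), and it treats PPID 1 as the orphan marker, as in the report. All function names are illustrative.

```shell
#!/bin/sh
# filter_orphans: read "PID PPID ARGS..." lines on stdin and print the PIDs of
# `openclaw agent` processes whose parent is PID 1.
filter_orphans() {
  awk '$2 == 1 && $3 == "openclaw" && $4 == "agent" { print $1 }'
}

# list_orphans: apply the filter to the live process table.
# Separate -o options with empty headers keep the output header-free.
list_orphans() {
  ps -e -o pid= -o ppid= -o args= | filter_orphans
}

# deleted_fd_count PID: count descriptors whose target is marked "(deleted)".
# Prints 0 when the PID does not exist or /proc is unavailable.
deleted_fd_count() {
  ls -l "/proc/$1/fd" 2>/dev/null | grep -c '(deleted)'
}
```

Running `list_orphans` from cron and alerting when it prints anything would surface the leak well before the gateway stops answering.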

Impact and severity

High. Affects any deployment that uses cron + the openclaw agent CLI with --timeout, and causes production outages. Workaround: replace CLI invocations with a direct HTTP POST to /v1/responses (this avoids warm-session reuse and stops the leak). Evidence comes from a single production deployment (Ubuntu 24.04, 6 agents). Not affected: direct HTTP API usage, and the interactive CLI with a manual /exit.
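
The workaround could be wrapped along these lines. This is a hedged sketch: the report names only the endpoint (POST /v1/responses) and the gateway address, not the request body or any authentication, so the payload file is supplied by the caller. `post_response` and `GATEWAY_URL` are hypothetical names. curl's `--max-time` bounds the request on the client side, standing in for the CLI's `--timeout`.

```shell
#!/bin/sh
# Gateway address from the report; override for other deployments.
GATEWAY_URL=${GATEWAY_URL:-http://127.0.0.1:18791}

# post_response PAYLOAD_FILE [MAX_SECONDS]
# POST a caller-supplied JSON file to /v1/responses. The body schema is not
# given in the report, so it is deliberately left to the caller.
post_response() {
  payload_file=$1
  max_seconds=${2:-60}
  curl -sf --max-time "$max_seconds" \
    -H 'Content-Type: application/json' \
    --data-binary @"$payload_file" \
    "$GATEWAY_URL/v1/responses"
}
```

A cron entry would then call something like `post_response /path/to/payload.json 60` instead of the `openclaw agent` CLI.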

Additional information

Suggested fix (for consideration): (1) CLI-side: the --timeout handler could trigger the same cooperative session-close mechanism used by /exit, then hard-exit after a grace period. (2) Gateway-side: a session reaper that detects orphaned sessions (no heartbeat in N seconds), as defense-in-depth against SIGKILL, OOM kills, and network drops.
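
Both suggestions follow common patterns that can be sketched outside OpenClaw. The snippet below is illustrative only and does not reflect OpenClaw internals. `run_with_grace` shows the "ask first, then force" shape of suggestion (1) using the coreutils `timeout` command; SIGTERM stands in for the cooperative close, and whether the CLI closes its warm session on SIGTERM is not stated in the report. `stale_sessions` shows the heartbeat comparison a reaper like suggestion (2) might run over a hypothetical `session_id last_heartbeat_epoch` listing.

```shell
#!/bin/sh
# run_with_grace LIMIT GRACE CMD [ARGS...]
# Send SIGTERM after LIMIT seconds, then SIGKILL if CMD is still alive GRACE
# seconds later. Exit status is 124 when the limit was hit and SIGTERM sufficed.
run_with_grace() {
  limit=$1
  grace=$2
  shift 2
  timeout -s TERM -k "$grace" "$limit" "$@"
}

# stale_sessions NOW MAX_IDLE
# Read "session_id last_heartbeat_epoch" lines on stdin (hypothetical format)
# and print the ids whose last heartbeat is more than MAX_IDLE seconds old.
stale_sessions() {
  awk -v now="$1" -v max_idle="$2" 'NF >= 2 && now - $2 > max_idle { print $1 }'
}
```

The two pieces are complementary: the grace period handles the normal timeout path, while the heartbeat check covers clients that disappear without any chance to clean up.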

Related issues: #33947 (MCP process orphan cleanup), #33979 (session manager feature request), #33949 (SSE hang no timeout), #40667 (MCP process leak after termination).

Reported by Gustavo Câmara (Bricker), with technical investigation by Adam OS BCOS (Chief of Staff agent) and review by Raphael (AI Chief Engineering agent).
