[Bug]: Codex ACP session archive self-ingestion can bloat ACP history, OOM codex-acp, and crash the host

GitHub stats
openclaw/openclaw#58657 · Fetched 2026-04-08 01:59:40
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Codex ACP sessions can ingest enormous transcript payloads from OpenClaw's own archived session files, persist those payloads into ~/.acpx/sessions/*.stream*.ndjson, and then repeatedly replay/load the bloated session state until codex-acp consumes tens of GB of RSS and gets OOM-killed. Under enough pressure, this can destabilize or reboot the host.

This appears to be a session self-ingestion / replay-amplification failure mode, not just a generic memory leak.

The pattern I found is:

  1. A Codex ACP turn runs a broad recursive search over OpenClaw state, especially paths like /root/.openclaw or /root/.openclaw/agents/main/sessions
  2. That search matches archived OpenClaw transcript files containing prior tool results and large text blobs
  3. The huge search result is emitted as tool_call_update content and persisted into the ACP session stream under ~/.acpx/sessions/...stream*.ndjson
  4. Later session/load / replay of the same session forces codex-acp to reload or process an extremely large history
  5. codex-acp memory usage grows to tens of GB and is killed by the OOM killer; after repeated pressure, the whole machine may crash/reboot

I found multiple giant ACP session streams caused by this exact pattern, and the host I investigated also has journal evidence of repeated codex-acp OOM kills.


Bug type

Crash / resource-exhaustion bug

Steps to reproduce

I do not have a tiny synthetic repro yet, but this appears reproducible with a persistent Codex ACP session and a broad grep/rg over OpenClaw session history.

Probable repro shape:

  1. Run OpenClaw with ACP/Codex enabled
  2. Start a persistent Codex ACP session bound to a thread/topic
  3. From that session, run a broad recursive search that includes OpenClaw session archives, for example searching paths under:
    • /root/.openclaw
    • /root/.openclaw/agents/main/sessions
  4. Let that tool call return a large result set containing historical transcript/tool-result content
  5. Continue using the same ACP session or restart the gateway so the session is loaded again
  6. Observe ACP session artifacts grow rapidly and codex-acp memory climb

A stronger practical repro is to search for phrases known to appear in archived session history so that the search returns transcript blobs rather than just source hits.

Expected behavior

  • ACP sessions should not recursively ingest massive archived session/transcript payloads into their own durable history
  • Tool output should be aggressively bounded, summarized, truncated, or rejected when it comes from session archives/log stores
  • Re-loading a session should not replay enough historical content to drive codex-acp into runaway memory growth
  • A single bad search result should not be able to create a persistent OOM trap for future loads of the same session
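As a rough illustration of "aggressively bounded" tool output, the sketch below caps a result by UTF-8 byte size and keeps both the head and the tail. The function name boundToolOutput, the 64 KiB default, and the marker text are assumptions made for this example; nothing here is taken from the acpx or codex-acp code.

```javascript
// Hypothetical helper: bound tool output by UTF-8 byte size before it is persisted.
// The 64 KiB default is an illustrative placeholder, not a value from OpenClaw or acpx.
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

function boundToolOutput(text, maxBytes = 64 * 1024) {
  const bytes = encoder.encode(text);
  if (bytes.length <= maxBytes) {
    return { text, truncated: false, originalBytes: bytes.length };
  }
  // Keep the head and the tail so the start of the output and its last lines both survive.
  const half = Math.floor(maxBytes / 2);
  // A cut can land in the middle of a multi-byte character; decoding then yields
  // U+FFFD at the cut edge, which is stripped here.
  const head = decoder.decode(bytes.subarray(0, half)).replace(/\uFFFD+$/, "");
  const tail = decoder.decode(bytes.subarray(bytes.length - half)).replace(/^\uFFFD+/, "");
  const omitted = bytes.length - 2 * half; // approximate when a partial character was dropped
  return {
    // The marker line adds a few bytes on top of maxBytes.
    text: `${head}\n[... ${omitted} bytes omitted ...]\n${tail}`,
    truncated: true,
    originalBytes: bytes.length,
  };
}
```

Measuring bytes rather than string length matters here because the on-disk numbers in this report are byte counts, and transcript text is not guaranteed to be ASCII.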

Actual behavior

Observed on a live host:

  • Repeated journal evidence of codex-acp being OOM-killed inside openclaw-gateway.service
  • Severe pre-crash memory pressure (systemd-journald: Under memory pressure, flushing caches.)
  • Hard host reboot with no clean shutdown sequence
  • Several ACP sessions with abnormally large persistent stream logs
  • Very large tool_call_update payloads containing text from OpenClaw archived sessions

Representative on-disk evidence from ~/.acpx/sessions:

  • 019ceddb-8f50-7511-b52a-3d430ea6297f: 396261812 bytes of stream data
  • 019d4588-d5ea-7fe1-85cb-83af377caf39: 373457587 bytes of stream data
  • 019cf1d0-78e6-7810-bf88-ee56eba800ce: 123347001 bytes of stream data

Representative bad command pattern captured in ACP session history:

rg -n "memory-lancedb|vector ready|provider: \"lancedb\"|plugin registered|auto-captured|injecting .* memories|bundled LanceDB runtime unavailable" /tmp/openclaw /root/.openclaw -g '!**/*.sqlite*' -g '!**/node_modules/**'

That output included hits from archived OpenClaw session files like:

  • /root/.openclaw/agents/main/sessions/...topic-12088.jsonl...

and those results were persisted back into ACP session history as multi-megabyte tool_call_update chunks.

I also found another giant session caused by searching directly inside:

/root/.openclaw/agents/main/sessions

with similarly huge persisted output.
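One way to catch both command shapes above before they run is a path check on the search roots. This is only a sketch: ARCHIVE_ROOTS and searchReachesArchive are hypothetical names, the single deny-listed path is the one quoted in this report, and the check assumes absolute POSIX paths with no symlink resolution.

```javascript
// Hypothetical deny-list; the path is from this report, the names are illustrative.
const ARCHIVE_ROOTS = ["/root/.openclaw/agents/main/sessions"];

// Minimal POSIX normalisation: drops "." and empty segments, resolves "..".
// Input is treated as an absolute path.
function normalize(p) {
  const out = [];
  for (const part of p.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return "/" + out.join("/");
}

function isInside(child, parent) {
  return child === parent || child.startsWith(parent === "/" ? "/" : parent + "/");
}

// A recursive search rooted at searchPath reaches an archive when the search root
// lies inside the archive, or the archive lies inside the search root
// (as with /root/.openclaw).
function searchReachesArchive(searchPath, archiveRoots = ARCHIVE_ROOTS) {
  const target = normalize(searchPath);
  return archiveRoots.some((root) => {
    const r = normalize(root);
    return isInside(target, r) || isInside(r, target);
  });
}
```

With this check, /root/.openclaw and /root/.openclaw/agents/main/sessions are both flagged while /tmp/openclaw is not. A caller could then warn, require opt-in, or add a further exclude glob in the same -g '!…' style the captured rg command already uses.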

OpenClaw version

Observed on OpenClaw Gateway (v2026.3.28-beta.1)

Local source checkout used for investigation:

  • repo: openclaw/openclaw
  • commit checked out locally: fa339dbd92

Operating system

Ubuntu 25.10 (x86_64)

Kernel:

  • Linux netcup-clawd 6.17.0-14-generic

Install method

systemd user service, local source checkout / npm-style runtime

Model

Codex ACP (@zed-industries/codex-acp)

Provider / routing chain

Telegram / gateway -> OpenClaw ACP runtime (acpx) -> npx @zed-industries/codex-acp@^0.9.5 -> codex-acp

Representative live process chain:

  • openclaw-gateway
  • acpx ... codex prompt --session ...
  • acpx dist/cli.js __queue-owner
  • npm exec @zed-industries/codex-acp@^0.9.5
  • codex-acp

Config file / key location

No special config change seems required beyond using OpenClaw with ACP/Codex sessions.

The main issue seems to be broad searches over OpenClaw runtime/session storage paths combined with durable ACP session persistence.

Additional provider/model setup details

Representative queue-owner payload from a live session:

  • permissionMode: approve-all
  • nonInteractivePermissions: fail
  • ttlMs: 100
  • maxQueueDepth: 16
  • agent command: npx @zed-industries/codex-acp@^0.9.5

I do not think ttlMs=100 is the primary bug, but short queue-owner lifetimes may make session reload/replay churn more frequent.

Logs, screenshots, and evidence

Journal evidence of the same failure class on this host:

Mar 31 13:48:13 kernel: Out of memory: Killed process 3956731 (codex-acp) ... anon-rss:29226016kB
Mar 31 13:48:14 systemd[1463]: openclaw-gateway.service: Failed with result 'oom-kill'.

Mar 29 21:49:32 kernel: Out of memory: Killed process 1335063 (codex-acp) ... anon-rss:30094768kB
Mar 29 21:49:36 systemd[1463]: openclaw-gateway.service: Failed with result 'oom-kill'.

Mar 21 11:13:30 kernel: Out of memory: Killed process 357499 (codex-acp) ... anon-rss:30561300kB
Mar 21 11:13:31 systemd[1463]: openclaw-gateway.service: Failed with result 'oom-kill'.

Pre-reboot symptom tonight:

systemd-journald: Under memory pressure, flushing caches.

Then the machine rebooted without a clean shutdown sequence.

Representative giant persisted ACP output chunks were ~3.1 MB to ~3.4 MB each and were stored as session/update -> tool_call_update entries.

Representative huge session stats I measured:

  • 019d3af8-e0b2-78e2-a92f-0c70b575886e.stream.ndjson

    • 22,978,623 bytes
    • 25,918 lines
    • 24,315 agent_message_chunk updates
  • 019d4588-d5ea-7fe1-85cb-83af377caf39.stream*

    • 373,457,587 bytes total
    • contained repeated tool_call_update entries ~3.4 MB each
  • 019ceddb-8f50-7511-b52a-3d430ea6297f.stream*

    • 396,261,812 bytes total
    • contained repeated tool_call_update entries ~3.1 MB each
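Numbers like the ones above can be gathered with a small scanner over the NDJSON text. The sketch below is an assumption-heavy illustration: summarizeStream is a hypothetical name, the 1 MiB threshold is arbitrary, and the record shape it probes (params.update.sessionUpdate under a session/update method) is a guess at how acpx persists updates rather than a confirmed schema.

```javascript
// Summarise an NDJSON stream: bytes, line count, update kinds, and oversized records.
// The record shape probed below is an assumption about the persisted format.
function summarizeStream(ndjsonText, oversizedBytes = 1024 * 1024) {
  const summary = { bytes: 0, lines: 0, unparsable: 0, byKind: {}, oversized: [] };
  ndjsonText.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // ignore blank lines
    const size = Buffer.byteLength(line, "utf8"); // excludes the newline itself
    summary.bytes += size;
    summary.lines += 1;
    let kind = "unknown";
    try {
      const record = JSON.parse(line);
      kind = record?.params?.update?.sessionUpdate ?? record?.method ?? "unknown";
    } catch {
      summary.unparsable += 1;
    }
    summary.byKind[kind] = (summary.byKind[kind] ?? 0) + 1;
    if (size >= oversizedBytes) {
      summary.oversized.push({ line: index + 1, kind, size });
    }
  });
  return summary;
}
```

For streams of the sizes listed here, feeding lines one at a time from a streaming reader would be safer than loading the whole file into a single string; the per-line logic stays the same.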

Impact and severity

High / potentially critical for hosts running persistent ACP sessions.

Impact:

  • codex-acp can reach ~29-30 GB RSS and be OOM-killed
  • openclaw-gateway.service can fail with oom-kill
  • the host can become unstable or reboot under sustained memory pressure
  • once a session is poisoned with giant persisted output, future loads may keep re-triggering the problem

This is especially dangerous on long-lived Telegram/ACP thread workflows where users may naturally search logs, sessions, or workspace state.

Additional information

Why I think this is distinct from the already-open orphan/zombie ACP issues:

  • Related but different issue: #44790 is about orphaned ACP child processes / swap exhaustion
  • Related but different issue: #48573 is about embedded-run zombie session state

This report is specifically about:

  • giant ACP durable session artifacts
  • self-ingestion of OpenClaw archived session output
  • replay/load amplification through session/load
  • codex-acp memory blow-up due to persisted transcript/tool output volume

Most likely fix areas:

  1. Never allow broad log/session archive results to be persisted verbatim into ACP durable session history beyond a strict cap
  2. Add hard truncation / summarization for tool_call_update payloads before they hit ~/.acpx/sessions
  3. Exclude OpenClaw session archives/log stores from broad ACP search defaults, or warn/require explicit opt-in
  4. Add defensive max-bytes / max-events limits on ACP session replay/load
  5. Detect pathological session artifacts and refuse to load them without compaction / repair
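For fix area 4, a replay budget could look roughly like the following. Everything here is hypothetical: the function name selectReplayWindow, the limit values, and the newest-first policy are assumptions for illustration, and whether session/load can safely drop older events depends on codex-acp semantics not covered in this report.

```javascript
// Hypothetical limits; the numbers are placeholders, not values from acpx or codex-acp.
const DEFAULT_LIMITS = {
  maxBytes: 8 * 1024 * 1024,
  maxEvents: 5000,
  maxEventBytes: 256 * 1024,
};

// Walk persisted events newest-first and keep as many as fit in the budget.
// Any single event larger than maxEventBytes is skipped outright.
// Returns the kept events in their original order plus counters for reporting.
function selectReplayWindow(lines, limits = DEFAULT_LIMITS) {
  const kept = [];
  let bytes = 0;
  let skippedOversized = 0;
  let droppedOlder = 0;
  for (let i = lines.length - 1; i >= 0; i--) {
    const size = Buffer.byteLength(lines[i], "utf8");
    if (size > limits.maxEventBytes) {
      skippedOversized += 1;
      continue;
    }
    if (kept.length >= limits.maxEvents || bytes + size > limits.maxBytes) {
      droppedOlder = i + 1; // this event and everything older is left out
      break;
    }
    kept.push(lines[i]);
    bytes += size;
  }
  kept.reverse();
  return { kept, bytes, skippedOversized, droppedOlder };
}
```

The counters make it possible to tell the user that a session was loaded partially instead of failing silently or running out of memory.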

If useful, I can provide a follow-up issue comment with more exact session IDs, offending commands, and example raw snippets from the oversized tool_call_update records.

extent analysis

TL;DR

To fix the crash/resource-exhaustion bug, limit the size of tool_call_update payloads persisted in ACP session history and implement defensive measures against replay/load amplification.

Guidance

  1. Implement payload size limits: Enforce a strict cap on the size of tool_call_update payloads before they are persisted into ~/.acpx/sessions.
  2. Truncate or summarize large payloads: Automatically truncate or summarize large tool_call_update payloads to prevent them from causing memory blow-up.
  3. Exclude archived sessions from searches: Modify ACP search defaults to exclude OpenClaw session archives/log stores, or require explicit opt-in to search these areas.
  4. Add defensive limits on session replay/load: Introduce max-bytes or max-events limits on ACP session replay/load to prevent pathological session artifacts from causing issues.
  5. Detect and refuse pathological artifacts: Develop a mechanism to detect oversized session artifacts and refuse to load them without compaction or repair.
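For sessions that are already poisoned, item 5 implies some form of compaction. The sketch below is one schema-agnostic possibility: it rewrites an oversized NDJSON line by capping every string inside it, so it does not need to know where tool_call_update keeps its text. The names capStrings and compactLine and both thresholds are assumptions for illustration only.

```javascript
// Recursively cap every string inside a parsed JSON value.
function capStrings(value, maxChars) {
  if (typeof value === "string") {
    return value.length > maxChars
      ? value.slice(0, maxChars) + `[truncated ${value.length - maxChars} chars]`
      : value;
  }
  if (Array.isArray(value)) {
    return value.map((item) => capStrings(item, maxChars));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, capStrings(item, maxChars)])
    );
  }
  return value; // numbers, booleans, null
}

// Compact one NDJSON line. Small lines pass through untouched. Oversized lines that
// are not valid JSON are also returned unchanged, so a caller would still need to
// decide whether to drop them.
function compactLine(line, maxLineChars = 64 * 1024, maxStringChars = 4 * 1024) {
  if (line.length <= maxLineChars) return line;
  try {
    return JSON.stringify(capStrings(JSON.parse(line), maxStringChars));
  } catch {
    return line;
  }
}
```

Because only string values are shortened, keys, identifiers, and the overall record structure stay intact, which should make the compacted stream more likely to remain loadable. That is untested against codex-acp.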

Example

A possible implementation could involve modifying the tool_call_update handling code to truncate payloads exceeding a certain size (e.g., 1 MB) before persisting them:

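// ~1 MB for ASCII text; String.length counts UTF-16 code units, not bytes
const MAX_PAYLOAD_CHARS = 1024 * 1024;

// Returns the text to persist: the payload itself, or a truncated copy with a marker.
function capPayload(payload) {
  if (payload.length <= MAX_PAYLOAD_CHARS) return payload;
  const omitted = payload.length - MAX_PAYLOAD_CHARS;
  return payload.slice(0, MAX_PAYLOAD_CHARS) + `\n[truncated ${omitted} characters]`;
}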

Notes

The provided solution focuses on limiting the size of persisted payloads and implementing defensive measures. However, a more comprehensive fix might require additional changes, such as optimizing ACP session management or improving memory handling in codex-acp.

Recommendation

Apply a workaround by implementing payload size limits and defensive measures against replay/load amplification. This approach can help mitigate the issue until a more comprehensive fix is available.
