openclaw - ✅(Solved) Fix Memory leak: qmd embed spawns 6 parallel agent embeds at boot, each loading 314MB GGUF model on CPU [1 pull requests, 4 comments, 3 participants]

Galaxy-Chen · 2026-04-26T11:55:45Z



GitHub stats

openclaw/openclaw#72144 · Fetched 2026-04-27 05:34:14

Timeline (top): commented ×4, cross-referenced ×2, closed ×1

Gateway suffers severe memory leak (2.7GB in 4 minutes after restart) caused by the qmd embed boot process spawning parallel embed runs for all agents, each loading a full GGUF embedding model into memory.

Error Message

Gateway process 4 minutes after boot:

$ ps aux | grep openclaw-gateway
RSS: 2711 MB, CPU: 103%

qmd embed log (repeats every ~3 minutes):

[memory] qmd embed failed (boot): Error: qmd embed timed out after 120000ms; backing off for 60s

Manual embed (standalone, no parallel), instant:

$ time qmd embed --workspace ~/.openclaw/agents/main-controller
✓ All content hashes already have embeddings.
real    0m0.664s


PR fix notes

PR #604: fix: lazy load node-llama-cpp for no-op embed

Repository: tobi/qmd
Author: aceclaw826
State: open | merged: False
Link: https://github.com/tobi/qmd/pull/604

Description (problem / solution / changelog)

Summary

Avoid importing node-llama-cpp until QMD actually needs a local LLM/model operation.

This keeps no-op embed/status-style paths lightweight. In particular, store.embed() already checks for pending documents before calling into the LLM, but the previous static import meant the native node-llama-cpp module was still loaded as soon as llm.ts was imported.

This matters for OpenClaw multi-agent gateways, where boot/update cycles may call qmd embed frequently and most runs have no missing embeddings.

Changes

Convert node-llama-cpp value imports to type-only imports.
Add a memoized dynamic loader for node-llama-cpp.
Load getLlama, resolveModelFile, LlamaLogLevel, and LlamaChatSession only from code paths that actually need them.
Add a regression test proving a no-op embed does not import node-llama-cpp.
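
The memoized dynamic loader described in the changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `memoizeLoader`, `loadHeavyModule`, `embedPending`, and `getLoadCount` are hypothetical names, and a small stand-in object replaces node-llama-cpp so the snippet runs without the native module installed.

```typescript
// Generic memoized loader: `load` runs at most once, and only when the
// first caller actually needs the module.
function memoizeLoader<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load().catch((err) => {
        cached = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}

// Hypothetical wiring for the real module (an assumption, not PR code):
//   const loadLlamaCpp = memoizeLoader(() => import("node-llama-cpp"));
//   const { getLlama } = await loadLlamaCpp();

// Stand-in for the heavy native module, with a counter to observe loads.
let loadCount = 0;
const loadHeavyModule = memoizeLoader(async () => {
  loadCount += 1;
  return { embed: (text: string): number => text.length };
});

// No-op path: when nothing is pending, the heavy module is never loaded.
async function embedPending(pending: string[]): Promise<number[]> {
  if (pending.length === 0) return [];
  const mod = await loadHeavyModule();
  return pending.map((text) => mod.embed(text));
}

function getLoadCount(): number {
  return loadCount;
}
```

Calling `embedPending([])` returns without touching the loader, while repeated calls with pending items share a single load. Resetting the cache on failure is a design choice of this sketch so that a transient load error does not poison later calls.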

Related

This addresses part of openclaw/openclaw#72144: avoiding heavy native/LLM initialization when all content hashes already have embeddings.

Tests

Passed:

corepack pnpm exec vitest run test/llm-lazy-load.test.ts test/sdk.test.ts -t "lazy node-llama-cpp loading|store.embed rejects invalid batch limits|store.embed forwards batch limit options" --reporter=verbose

Also checked:

corepack pnpm exec tsc -p tsconfig.build.json --noEmit

That currently fails on main with an unrelated existing error:

src/store.ts(2142,22): error TS2339: Property 'transaction' does not exist on type 'Database'.

Changed files

src/llm.ts (modified, +19/-9)
test/llm-lazy-load.test.ts (added, +42/-0)


Summary

Environment

OpenClaw: 2026.4.11 → 2026.4.23
OS: WSL2 Linux (x64), no GPU
Agents: 6 (main-controller, code-specialist, data-processor, investment-analyst, kb-researcher, office-assistant)
Embedding model: hf_ggml-org_embeddinggemma-300M-Q8_0.gguf (314MB per copy)
Total memory: 16GB system

Problem

1. No skip when embeddings are already up-to-date

When all content hashes already have embeddings (confirmed by running qmd embed manually — completes in 0.66s), the Gateway boot embed still:

Loads the 314MB GGUF model
Initializes the embedding pipeline
Only then discovers nothing needs to be done

This is a waste of both time and memory on every boot.

2. All agents embed in parallel

The Gateway starts embed for all 6 agents simultaneously at boot. Each one:

Loads a separate 314MB GGUF model into memory (1.88GB total for models alone)
Competes for CPU (no GPU acceleration)
Often times out (120s default) because all 6 are fighting for resources

The timeouts trigger a retry loop: timeout → backoff 60s → retry → timeout → ... This loop leaks ~100-200MB per cycle in the Gateway process (RSS grows from 1.3GB to 2.7GB in ~15 minutes and keeps climbing).

3. Duplicate GGUF model files

Each agent stores its own copy of the GGUF model at ~/.openclaw/agents/<agent>/qmd/xdg-cache/qmd/models/hf_ggml-org_embeddinggemma-300M-Q8_0.gguf. For 6 agents, that is 6 × 314MB = 1.88GB of duplicate files.

Evidence

# Gateway process 4 minutes after boot
$ ps aux | grep openclaw-gateway
RSS: 2711 MB, CPU: 103%

# qmd embed log — repeats every ~3 minutes
[memory] qmd embed failed (boot): Error: qmd embed timed out after 120000ms; backing off for 60s

# Manual embed (standalone, no parallel) — instant
$ time qmd embed --workspace ~/.openclaw/agents/main-controller
✓ All content hashes already have embeddings.
real    0m0.664s

Suggested Fixes

1. Pre-check before loading model: Before spawning the embed process, check if any content hashes need embedding. If all are current, skip entirely (return immediately like the manual run does).
2. Serialize agent embeds: Process agents sequentially (or with limited concurrency, e.g., 1-2 at a time) instead of launching all 6 in parallel. This reduces peak memory from ~1.88GB to ~314MB.
3. Share the GGUF model file: Store one copy of the model in a shared location and symlink or reference it from each agent's qmd cache. (We worked around this manually with symlinks to ~/.openclaw/shared/qmd-models/.)
4. Fix memory leak on timeout: When embed times out and the child process is killed, ensure all allocated memory (buffers, model weights) is fully released back to the OS, not retained in the Gateway process.
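
The "serialize agent embeds" suggestion could look something like the sketch below. It is only an illustration under stated assumptions: `mapWithConcurrency` and `fakeEmbed` are hypothetical names, and `fakeEmbed` is a timer-based stand-in for spawning `qmd embed`, not the Gateway's actual code.

```typescript
// Runs `task` over `items` with at most `limit` tasks in flight at once.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each worker pulls the next unclaimed index until the list is drained.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index]);
    }
  }
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Agent names taken from the issue's Environment section.
const agents = [
  "main-controller",
  "code-specialist",
  "data-processor",
  "investment-analyst",
  "kb-researcher",
  "office-assistant",
];

// Stand-in embed task that records how many runs overlap.
let inFlight = 0;
let peak = 0;
async function fakeEmbed(agent: string): Promise<string> {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 5));
  inFlight -= 1;
  return `${agent}: ok`;
}
```

With `mapWithConcurrency(agents, 1, fakeEmbed)` the agents run strictly one after another; a limit of 2 allows two model loads at a time. Results keep the input order regardless of which run finishes first. In this sketch a rejected task rejects the whole call while already started tasks finish in the background, so a real implementation would need its own error policy.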

Workaround (current)

// openclaw.json
{
  "memory": {
    "qmd": {
      "update": {
        "onBoot": false,
        "embedInterval": "4h"
      }
    }
  }
}

This disables boot-time embed and runs it every 4 hours instead. Memory stays stable at ~900MB instead of growing to 2.7GB+.

Related Files

dist/qmd-manager-*.js — runUpdate(), shouldRunEmbed(), withQmdEmbedLock()
dist/dreaming-B8_cmGiF.js — DEFAULT_MEMORY_DREAMING_FREQUENCY

extent analysis

TL;DR

Implementing a pre-check to skip loading the model when all content hashes are up-to-date and serializing agent embeds can significantly reduce memory usage and prevent the leak.

Guidance

Pre-check before loading model: Modify the shouldRunEmbed() function in qmd-manager-*.js to check if any content hashes need embedding before spawning the embed process.
Serialize agent embeds: Update the runUpdate() function to process agents sequentially or with limited concurrency, reducing peak memory usage.
Verify memory leak fix: After implementing the above changes, monitor the Gateway process's memory usage to ensure it no longer grows indefinitely.
Consider sharing the GGUF model file: Store a single copy of the model in a shared location and symlink or reference it from each agent's qmd cache to reduce disk space usage.

Example

No code example is provided as the necessary changes depend on the specific implementation details of the qmd-manager-*.js and dreaming-B8_cmGiF.js files.
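
Independently of those files, the pre-check idea can still be sketched in isolation. The snippet below is a hedged illustration only: `AgentEmbedState`, `pendingHashes`, and `agentsNeedingEmbed` are hypothetical names, and the data shape is an assumption rather than qmd's real storage format.

```typescript
// Hypothetical per-agent view of what is already embedded.
interface AgentEmbedState {
  agent: string;
  contentHashes: string[]; // hashes of the content currently in the workspace
  embeddedHashes: Set<string>; // hashes that already have a stored embedding
}

// Hashes that still need an embedding for one agent.
function pendingHashes(state: AgentEmbedState): string[] {
  return state.contentHashes.filter((hash) => !state.embeddedHashes.has(hash));
}

// Only agents with pending hashes should trigger a model load at boot.
function agentsNeedingEmbed(states: AgentEmbedState[]): string[] {
  return states
    .filter((state) => pendingHashes(state).length > 0)
    .map((state) => state.agent);
}
```

If `agentsNeedingEmbed` returns an empty list, the boot path can skip spawning embed processes entirely, which mirrors the instant return seen in the manual run.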

Notes

The provided workaround in openclaw.json can be used as a temporary solution to prevent the memory leak, but it may not be desirable to disable boot-time embed entirely.

Recommendation

Apply the suggested fixes, specifically implementing a pre-check and serializing agent embeds, to address the memory leak and reduce peak memory usage. This approach directly targets the root causes of the issue, providing a more comprehensive solution than the current workaround.


#conversation history #tool integration #LLM response #prompt template #memory leak
