openclaw - ✅(Solved) Fix Memory leak: qmd embed spawns 6 parallel agent embeds at boot, each loading 314MB GGUF model on CPU [1 pull requests, 4 comments, 3 participants]

GitHub stats
openclaw/openclaw#72144 · Fetched 2026-04-27 05:34:14
Comments: 4 · Participants: 3 · Timeline events: 7 · Reactions: 0
Timeline (top): commented ×4, cross-referenced ×2, closed ×1

Gateway suffers severe memory leak (2.7GB in 4 minutes after restart) caused by the qmd embed boot process spawning parallel embed runs for all agents, each loading a full GGUF embedding model into memory.

Error Message

# Gateway process 4 minutes after boot
$ ps aux | grep openclaw-gateway
RSS: 2711 MB, CPU: 103%

# qmd embed log, repeats every ~3 minutes
[memory] qmd embed failed (boot): Error: qmd embed timed out after 120000ms; backing off for 60s

# Manual embed (standalone, no parallel), instant
$ time qmd embed --workspace ~/.openclaw/agents/main-controller
✓ All content hashes already have embeddings.
real    0m0.664s

Root Cause

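At boot, the Gateway starts a qmd embed run for all 6 agents at once. Each run loads its own copy of the 314MB GGUF embedding model on CPU, even when every content hash already has an embedding. The runs compete for resources and often hit the 120s timeout, which triggers a timeout → backoff → retry loop that leaks ~100-200MB per cycle in the Gateway process.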

Fix Action

PR fix notes

PR #604: fix: lazy load node-llama-cpp for no-op embed

Description (problem / solution / changelog)

Summary

Avoid importing node-llama-cpp until QMD actually needs a local LLM/model operation.

This keeps no-op embed/status-style paths lightweight. In particular, store.embed() already checks for pending documents before calling into the LLM, but the previous static import meant the native node-llama-cpp module was still loaded as soon as llm.ts was imported.

This matters for OpenClaw multi-agent gateways, where boot/update cycles may call qmd embed frequently and most runs have no missing embeddings.

Changes

  • Convert node-llama-cpp value imports to type-only imports.
  • Add a memoized dynamic loader for node-llama-cpp.
  • Load getLlama, resolveModelFile, LlamaLogLevel, and LlamaChatSession only from code paths that actually need them.
  • Add a regression test proving a no-op embed does not import node-llama-cpp.
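The "memoized dynamic loader" idea can be illustrated with a minimal sketch. This is not the code from the PR: `memoizeAsync` and `loadLlamaCpp` are hypothetical names, and the usage comment only assumes that the module can be loaded with a dynamic `import()`.

```typescript
// Generic memoized async loader: the factory runs at most once,
// and only when the returned function is first called.
function memoizeAsync<T>(factory: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = factory().catch((err) => {
        cached = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}

// Hypothetical usage in a module such as src/llm.ts (names are assumptions):
//   import type { Llama } from "node-llama-cpp";   // type-only, erased at compile time
//   const loadLlamaCpp = memoizeAsync(() => import("node-llama-cpp"));
//   const { getLlama } = await loadLlamaCpp();     // native module loads here, on first use
```

Because the static value import is gone, a code path that never calls the loader never pays for the native module, which is the property the regression test in the PR is described as checking.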

Related

This addresses part of openclaw/openclaw#72144: avoiding heavy native/LLM initialization when all content hashes already have embeddings.

Tests

Passed:

corepack pnpm exec vitest run test/llm-lazy-load.test.ts test/sdk.test.ts -t "lazy node-llama-cpp loading|store.embed rejects invalid batch limits|store.embed forwards batch limit options" --reporter=verbose

Also checked:

corepack pnpm exec tsc -p tsconfig.build.json --noEmit

That currently fails on main with an unrelated existing error:

src/store.ts(2142,22): error TS2339: Property 'transaction' does not exist on type 'Database'.

Changed files

  • src/llm.ts (modified, +19/-9)
  • test/llm-lazy-load.test.ts (added, +42/-0)


Summary

Gateway suffers severe memory leak (2.7GB in 4 minutes after restart) caused by the qmd embed boot process spawning parallel embed runs for all agents, each loading a full GGUF embedding model into memory.

Environment

  • OpenClaw: 2026.4.11 → 2026.4.23
  • OS: WSL2 Linux (x64), no GPU
  • Agents: 6 (main-controller, code-specialist, data-processor, investment-analyst, kb-researcher, office-assistant)
  • Embedding model: hf_ggml-org_embeddinggemma-300M-Q8_0.gguf (314MB per copy)
  • Total memory: 16GB system

Problem

1. No skip when embeddings are already up-to-date

When all content hashes already have embeddings (confirmed by running qmd embed manually — completes in 0.66s), the Gateway boot embed still:

  • Loads the 314MB GGUF model
  • Initializes the embedding pipeline
  • Only then discovers nothing needs to be done

This is a waste of both time and memory on every boot.
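A pre-check of this kind could look roughly like the sketch below. All names (`pendingHashes`, `embedIfNeeded`, `loadModel`) are hypothetical and do not come from qmd or OpenClaw; the point is only that the model loader is called after the pending list is known to be non-empty.

```typescript
// Return the content hashes that still lack an embedding.
function pendingHashes(contentHashes: string[], embedded: Set<string>): string[] {
  return contentHashes.filter((hash) => !embedded.has(hash));
}

// Embed only what is missing. `loadModel` stands in for the expensive step
// (loading the GGUF model) and is invoked only when there is work to do.
async function embedIfNeeded(
  contentHashes: string[],
  embedded: Set<string>,
  loadModel: () => Promise<(hash: string) => Promise<void>>,
): Promise<number> {
  const pending = pendingHashes(contentHashes, embedded);
  if (pending.length === 0) return 0; // nothing to do: the model is never loaded
  const embedOne = await loadModel();
  for (const hash of pending) {
    await embedOne(hash);
    embedded.add(hash);
  }
  return pending.length; // number of hashes embedded in this run
}
```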

2. All agents embed in parallel

The Gateway starts embed for all 6 agents simultaneously at boot. Each one:

  • Loads a separate 314MB GGUF model into memory (1.88GB total for models alone)
  • Competes for CPU (no GPU acceleration)
  • Often times out (120s default) because all 6 are fighting for resources

The timeouts trigger a retry loop: timeout → backoff 60s → retry → timeout → ... This loop leaks ~100-200MB per cycle in the Gateway process (RSS grows from 1.3GB to 2.7GB in ~15 minutes and keeps climbing).
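The figures above can be cross-checked with a short calculation. It assumes that each of the 6 embed runs holds its own copy of the model and that a retry starts as soon as the backoff ends; both are assumptions drawn from the report, not from the Gateway source.

```typescript
const AGENTS = 6;
const MODEL_MB = 314;      // size of one GGUF model copy
const TIMEOUT_MS = 120000; // embed timeout from the log line
const BACKOFF_MS = 60000;  // backoff from the log line

// Peak memory for model weights alone when all agents embed at once.
const peakModelMb = AGENTS * MODEL_MB; // 1884 MB, about 1.88GB

// Length of one timeout + backoff cycle.
const cycleSeconds = (TIMEOUT_MS + BACKOFF_MS) / 1000; // 180 s, about 3 minutes

console.log(`peak model memory: ${peakModelMb} MB, retry cycle: ${cycleSeconds} s`);
```

The 180-second cycle is consistent with the embed failure log repeating every ~3 minutes.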

3. Duplicate GGUF model files

Each agent stores its own copy of the GGUF model at ~/.openclaw/agents/<agent>/qmd/xdg-cache/qmd/models/hf_ggml-org_embeddinggemma-300M-Q8_0.gguf. For 6 agents, that is 6 × 314MB = 1.88GB of duplicate files.

Evidence

# Gateway process 4 minutes after boot
$ ps aux | grep openclaw-gateway
RSS: 2711 MB, CPU: 103%

# qmd embed log — repeats every ~3 minutes
[memory] qmd embed failed (boot): Error: qmd embed timed out after 120000ms; backing off for 60s

# Manual embed (standalone, no parallel) — instant
$ time qmd embed --workspace ~/.openclaw/agents/main-controller
✓ All content hashes already have embeddings.
real    0m0.664s

Suggested Fixes

  1. Pre-check before loading model: Before spawning the embed process, check if any content hashes need embedding. If all are current, skip entirely (return immediately like the manual run does).

  2. Serialize agent embeds: Process agents sequentially (or with limited concurrency, e.g., 1-2 at a time) instead of launching all 6 in parallel. This reduces peak memory from ~1.88GB to ~314MB.

  3. Share the GGUF model file: Store one copy of the model in a shared location and symlink or reference it from each agent's qmd cache. (We worked around this manually with symlinks to ~/.openclaw/shared/qmd-models/.)

  4. Fix memory leak on timeout: When embed times out and the child process is killed, ensure all allocated memory (buffers, model weights) is fully released back to the OS, not retained in the Gateway process.
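Fix 2 (limited concurrency) can be sketched as a small worker-pool helper. `runWithConcurrency` and `embedAgent` are hypothetical names for illustration only and are not part of the Gateway or qmd code.

```typescript
// Run async tasks with at most `limit` in flight; results keep the input order.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: T[] = new Array(tasks.length);
  let next = 0;
  // Each worker pulls the next unclaimed task until none remain.
  const worker = async (): Promise<void> => {
    while (next < tasks.length) {
      const index = next++; // no race: nothing awaits between the check and the increment
      results[index] = await tasks[index]();
    }
  };
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Hypothetical usage: embedAgent is a placeholder for whatever starts one agent's embed run.
//   const agents = ["main-controller", "code-specialist", "data-processor"];
//   await runWithConcurrency(agents.map((agent) => () => embedAgent(agent)), 1);
```

With a limit of 1 the agents are processed strictly one after another, so at most one model copy would be resident at a time. In this sketch a rejected task stops the overall run; a real implementation would likely catch per-agent failures so one timeout does not block the remaining agents.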

Workaround (current)

// openclaw.json
{
  "memory": {
    "qmd": {
      "update": {
        "onBoot": false,
        "embedInterval": "4h"
      }
    }
  }
}

This disables boot-time embed and runs it every 4 hours instead. Memory stays stable at ~900MB instead of growing to 2.7GB+.

Related Files

  • dist/qmd-manager-*.js: runUpdate(), shouldRunEmbed(), withQmdEmbedLock()
  • dist/dreaming-B8_cmGiF.js: DEFAULT_MEMORY_DREAMING_FREQUENCY

extent analysis

TL;DR

Implementing a pre-check to skip loading the model when all content hashes are up-to-date and serializing agent embeds can significantly reduce memory usage and prevent the leak.

Guidance

  • Pre-check before loading model: Modify the shouldRunEmbed() function in qmd-manager-*.js to check if any content hashes need embedding before spawning the embed process.
  • Serialize agent embeds: Update the runUpdate() function to process agents sequentially or with limited concurrency, reducing peak memory usage.
  • Verify memory leak fix: After implementing the above changes, monitor the Gateway process's memory usage to ensure it no longer grows indefinitely.
  • Consider sharing the GGUF model file: Store a single copy of the model in a shared location and symlink or reference it from each agent's qmd cache to reduce disk space usage.

Example

No code example is provided as the necessary changes depend on the specific implementation details of the qmd-manager-*.js and dreaming-B8_cmGiF.js files.

Notes

The provided workaround in openclaw.json can be used as a temporary solution to prevent the memory leak, but it may not be desirable to disable boot-time embed entirely.

Recommendation

Apply the suggested fixes, specifically implementing a pre-check and serializing agent embeds, to address the memory leak and reduce peak memory usage. This approach directly targets the root causes of the issue, providing a more comprehensive solution than the current workaround.
