openclaw: [Feature] Bundled openai-compatible embedding provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI) (solved; 1 pull request, 1 comment, 2 participants)

GitHub stats
openclaw/openclaw#80476 · Fetched 2026-05-11 03:14:14
Comments: 1 · Participants: 2 · Timeline events: 4 · Reactions: 2
Timeline (top): cross-referenced ×3, commented ×1

Add a bundled memory embedding provider adapter named openai-compatible that targets any local OpenAI-compatible HTTP embedding server (llama.cpp's llama-server, Ollama via its /v1 surface, vLLM, TGI, LocalAI, llamafile, or any reverse-proxied internal instance), without any vendor-specific warmup probe and without inheriting from any global models.providers.* config.

Error Message

  • Fails fast with a clear error message when embedding.baseUrl or embedding.model is missing.
  • Gateway log warnings seen when the lmstudio adapter is pointed at a generic local server:
    WARN liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu eventLoopDelayMaxMs=29091.7
    WARN lmstudio embeddings warmup failed; continuing without preload

Root Cause

  • Provider id: openai-compatible. Matches the term llama.cpp, Ollama, vLLM, TGI, LocalAI, and llamafile all use to describe their HTTP API.
  • transport: "remote". Routed through the same SSRF + remote-fetch path as the cloud adapters.
  • No autoSelectPriority. Operator must opt in explicitly via embedding.provider: "openai-compatible". We do not want auto-selection, because every operator with another adapter's credentials configured would otherwise route embeddings to the cloud the moment they enabled memory-lancedb.
  • No authProviderId. There is no centralized auth flow for arbitrary local servers; the optional apiKey lives directly in the per-plugin embedding config block.
  • No warmup, preload, or model-load probe. The first /v1/embeddings call loads the model lazily, which every server in this family already does.
  • Reads only from the per-plugin embedding config block. Does not consult any global models.providers.* block. Cannot accidentally route to a vendor cloud.
  • Fails fast with a clear error message when embedding.baseUrl or embedding.model is missing.
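
The opt-in posture described in these bullets can be illustrated with a minimal sketch. The AdapterPosture interface, the selectAdapter function, the "example-cloud" adapter, and the "lower number wins" priority rule are all assumptions made for illustration; they are not the actual Plugin SDK types or selection logic. The sketch only shows why an adapter without autoSelectPriority can be reached solely through an explicit embedding.provider value.

```typescript
// Hypothetical posture shape; field names mirror the design bullets, not the real SDK.
interface AdapterPosture {
  id: string;
  transport: "remote" | "local";
  autoSelectPriority?: number; // absent means "never auto-selected"
  authProviderId?: string;
}

// Posture of the proposed adapter: no autoSelectPriority, no authProviderId.
const openAiCompatiblePosture: AdapterPosture = {
  id: "openai-compatible",
  transport: "remote",
};

// Made-up adapter that does take part in auto-selection, for contrast only.
const exampleCloudPosture: AdapterPosture = {
  id: "example-cloud",
  transport: "remote",
  autoSelectPriority: 10,
  authProviderId: "example-cloud",
};

// Assumed selection rule: an explicit provider id always wins; otherwise only
// adapters that declare a priority are candidates (lower number first).
function selectAdapter(
  adapters: AdapterPosture[],
  explicitProvider?: string,
): AdapterPosture | undefined {
  if (explicitProvider !== undefined) {
    return adapters.find((adapter) => adapter.id === explicitProvider);
  }
  const candidates = adapters.filter(
    (adapter) => adapter.autoSelectPriority !== undefined,
  );
  candidates.sort(
    (a, b) => (a.autoSelectPriority ?? 0) - (b.autoSelectPriority ?? 0),
  );
  return candidates[0];
}
```

Under these assumptions, selectAdapter(adapters) never returns the openai-compatible posture, while selectAdapter(adapters, "openai-compatible") does.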

Fix Action

Fix / Workaround

  1. Document the existing workaround more prominently (set provider: "openai" plus embedding.baseUrl). The docs already mention this today. The trap is silent: if the per-plugin baseUrl is removed during a config edit, traffic goes to api.openai.com without any warning. A safer adapter that fails fast on a missing baseUrl is preferable to documentation that depends on operator vigilance.

PR fix notes

PR #80479: feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)

Description (problem / solution / changelog)

Summary

  • Problem: operators running a self-hosted OpenAI-compatible embeddings server (llama.cpp's llama-server, Ollama via its /v1 surface, vLLM, TGI, LocalAI, llamafile, or any reverse-proxied internal instance) have no clean adapter for it. Pointing the bundled lmstudio adapter at it triggers an LMStudio-only "load model" warmup that hangs against generic servers and stalls the gateway event loop for ~30 seconds per memory-lancedb embedding-provider rebuild. Pointing the bundled openai adapter at it works, but inherits global OpenAI headers/attribution/api-key resolution, and a removed embedding.baseUrl line silently falls back to api.openai.com which leaks embedded text to the cloud.
  • Why it matters: the symptom is gateway freezes that show up as multi-second sessions.list backlogs and a flooded gateway log. Operators spend hours diagnosing what is actually a UX gap: the bundled adapters do not include a generic local-server option, and the existing in-process local adapter (node-llama-cpp on a .gguf file) does not cover operators who run their embeddings server as a separate HTTP process.
  • What changes: adds a new bundled extension extensions/openai-compatible-embeddings/ that registers an openai-compatible memory embedding provider. The adapter has no warmup and no global config inheritance, fails fast on a missing embedding.baseUrl or embedding.model, and does not auto-select (operator must explicitly opt in with embedding.provider: "openai-compatible").
  • What did NOT change (scope boundary): no existing adapter touched. lmstudio, openai, mistral, gemini, voyage, bedrock, deepinfra, ollama, in-process local adapters all behave byte-identically. The Plugin SDK surface is unchanged; the new adapter consumes the same public exports the other bundled adapters do. No protocol change, no schema change, no migration, no telemetry. The existing in-process local adapter stays as-is for operators who load .gguf files in-process via node-llama-cpp; the two adapters are complementary, not redundant.

Change Type

  • Feature
  • Docs

Scope

  • Memory / storage
  • Integrations

Linked Issue/PR

  • Closes #80476
  • Related #72875 (operator confusion: provider: "local" actually means in-process node-llama-cpp, not HTTP)
  • Related #72937 (open PR fixing #72875's registration timing for the in-process local adapter)
  • Related #66163 (closed: Unknown memory embedding provider: ollama, the precedent that led to the bundled ollama adapter; this PR follows the same pattern, generalized)
  • Related #74204 (open: memory.qmd.update.embedTimeoutMs too low for local GGUF; same operator profile)
  • Related #74761 (open: Document oMLX as a memorySearch embedding provider; oMLX exposes an OpenAI-compatible API and would just work through this adapter without further plugin code)
  • Related #60994 (closed: cannot reliably connect to remote Ollama / LM Studio via LAN; adjacent operator pain in the same ecosystem)
  • Related #42270 (closed: LM Studio backend regression; same lmstudio-adapter brittleness)
  • This PR adds a new feature

Real behavior proof

  • Behavior or issue addressed: an operator running llama-server (llama.cpp) with the BGE-M3 embedding model on http://localhost:8081/v1 had memory-lancedb captures triggering ~30-second event-loop stalls every time the embedding provider rebuilt, because the lmstudio adapter's ensureLmstudioModelLoaded warmup hangs against llama.cpp's OpenAI-compatible server (which does not expose LMStudio's load-model endpoint). The new openai-compatible adapter routes through the same generic createRemoteEmbeddingProvider factory the other adapters use, just without the warmup phase. Embeddings work end-to-end on the first call, no preload required.

  • Real environment tested: macOS 26.4.1 on Apple Silicon (arm64). llama-server from llama.cpp serving bge-m3-Q8_0.gguf (605 MB, 1024 dimensions) on http://127.0.0.1:8081, with --ngl 24 -c 32768 -np 4 -b 512 -ub 512 --mmap --mlock --cont-batching --api-key <set>. Live ~/.openclaw/ with memory-lancedb enabled.

  • Exact steps or command run after this patch: ran pnpm test extensions/openai-compatible-embeddings to validate the adapter posture and the no-warmup invariant. Then invoked the new factory directly through node --import tsx against the live llama-server, capturing the round-trip latency for both embedQuery and embedBatch. Independently verified the same llama-server endpoint with curl -H "Authorization: Bearer ..." http://localhost:8081/v1/embeddings returns 1024-dim vectors with the same model name.

  • Evidence after fix:

    Live invocation of the new adapter from a small Node script (node --import tsx):

    $ node --import tsx /tmp/proof-openai-compatible-embeddings.mjs
    [proof] target  : http://localhost:8081/v1
    [proof] model   : text-embedding-bge-m3
    [proof] apiKey  : <set>
    [proof] factory : 1ms (no warmup, just client construction)
    [proof] client  : baseUrl=http://localhost:8081/v1 model=text-embedding-bge-m3
    [proof] embed   : 124ms, dims=1024, head=[-0.0392, 0.0370, -0.0289, 0.0161, ...]
    [proof] batch   : 25ms, count=4, dims=1024
    [proof] OK. openai-compatible embeddings adapter wired end-to-end against llama.cpp.

    Notice the factory took 1 ms (the lmstudio adapter would have taken up to 120 s here against the same server), and the actual embedding round-trip is 124 ms with the expected 1024-dim BGE-M3 output.

    Independent confirmation of the same endpoint via curl, showing the local server answers OpenAI-shaped requests without any vendor-specific preamble:

    $ curl -sS -m 5 -H "Authorization: Bearer ..." -H "Content-Type: application/json" \
        -d '{"model":"text-embedding-bge-m3","input":"hello"}' \
        -w "\nHTTP %{http_code} time=%{time_total}s\n" \
        http://localhost:8081/v1/embeddings | tail -c 200
    ...,0.05932944267988205],"index":0,"object":"embedding"}]}
    HTTP 200 time=0.069077s

    Targeted regression test for the adapter posture and the no-warmup invariant:

    $ pnpm test extensions/openai-compatible-embeddings
     Test Files  1 passed (1)
          Tests  4 passed (4)
       Duration  4.52s
  • Observed result after fix: provider construction takes 1 ms (no warmup network call). The adapter holds the per-plugin baseUrl/model exactly as configured, with no fallback to any global config block. Embeddings round-trip in well under 200 ms against the live local server. Existing adapters (openai, lmstudio, mistral, gemini, voyage, bedrock, deepinfra, ollama, in-process local) are untouched.

  • What was not tested: did not run the new adapter inside an actual openclaw gateway process end-to-end, because the dist bundle does not include the new extension yet (the source-only invocation above is the closest equivalent without a release-tagged build). Did not run pnpm check:changed in Testbox; targeted pnpm test extensions/openai-compatible-embeddings plus targeted npx oxlint extensions/openai-compatible-embeddings/ plus targeted pnpm tsgo:prod are all clean on the touched files.

  • Before evidence: not applicable for a feature add. The "before" is "this provider did not exist," and the operator pain it addresses is documented in the Summary above and in linked issue #80476.
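
For readers who prefer code to curl, the request shown in the evidence above can be sketched in TypeScript. The buildEmbeddingsRequest helper and its option names are hypothetical and are not taken from the PR; the URL path, the JSON body fields (model, input), and the Bearer header follow the curl invocation in the proof. Custom headers are merged in alongside Authorization, as the edge-case notes describe.

```typescript
// Hypothetical options; names are illustrative, not the adapter's real config type.
interface EmbeddingRequestOptions {
  baseUrl: string; // e.g. "http://localhost:8081/v1"
  model: string;
  input: string | string[];
  apiKey?: string;
  headers?: Record<string, string>;
}

interface BuiltEmbeddingsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds an OpenAI-shaped POST <baseUrl>/embeddings request without sending it.
function buildEmbeddingsRequest(
  opts: EmbeddingRequestOptions,
): BuiltEmbeddingsRequest {
  // Drop trailing slashes so "…/v1/" and "…/v1" produce the same URL.
  const url = `${opts.baseUrl.replace(/\/+$/, "")}/embeddings`;
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    ...(opts.headers ?? {}),
  };
  if (opts.apiKey) {
    headers["Authorization"] = `Bearer ${opts.apiKey}`;
  }
  const body = JSON.stringify({ model: opts.model, input: opts.input });
  return { url, init: { method: "POST", headers, body } };
}
```

The result can be passed to fetch(url, init) in any runtime that provides fetch; keeping construction separate from sending makes the request shape testable without a live server.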

Root Cause

N/A. Feature addition, not a regression fix. (For the underlying operator pain that motivated the addition, see linked issue #80476.)

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: extensions/openai-compatible-embeddings/memory-embedding-adapter.test.ts (new), with the factory in extensions/openai-compatible-embeddings/embedding-provider.ts.
  • Scenario the test should lock in:
    • The adapter's posture: id: "openai-compatible", transport: "remote", no autoSelectPriority, no authProviderId, allowExplicitWhenConfiguredAuto: true, no shouldContinueAutoSelection. This is what stops the adapter from accidentally being auto-selected over an unrelated cloud provider whose key happens to be configured.
    • No warmup or preload during create. The adapter must produce exactly one factory invocation per create call; nothing else.
    • The cache key includes the per-plugin baseUrl and model exactly as supplied, so two different local servers do not share a cache entry.
    • The Authorization header is stripped from the cache key so a rotated bearer does not invalidate cached embeddings.
  • Why this is the smallest reliable guardrail: the adapter is a thin facade over the generic createRemoteEmbeddingProvider. The risk surface is the posture (auto-select / auth / fallback) and the absence of any pre-call side effect. Both are testable in pure-TS with a mocked factory; no live server needed for the unit tests.
  • Existing test that already covers this (if any): no. No bundled adapter today behaves the way openai-compatible needs to (no auth provider, no auto-select, fully self-contained config, no vendor-specific warmup).
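
The two cache-key scenarios above can be sketched as follows. This is a minimal illustration under stated assumptions: buildCacheKey and its input shape are hypothetical, and the real adapter may derive its key differently (for example by hashing). The sketch keeps baseUrl and model exactly as supplied and drops the Authorization header, so rotating a bearer token leaves the key unchanged.

```typescript
// Hypothetical input shape for illustration only.
interface CacheKeyInput {
  baseUrl: string;
  model: string;
  headers?: Record<string, string>;
}

// Builds a deterministic key from baseUrl, model, and non-auth headers.
function buildCacheKey(input: CacheKeyInput): string {
  const headers = Object.entries(input.headers ?? {})
    // Header names are case-insensitive, so compare in lower case.
    .filter(([name]) => name.toLowerCase() !== "authorization")
    .map(([name, value]): [string, string] => [name.toLowerCase(), value])
    // Sort so that header insertion order does not change the key.
    .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return JSON.stringify({
    baseUrl: input.baseUrl,
    model: input.model,
    headers,
  });
}
```

Two servers on different ports produce different keys, while the same server queried with an old and a new bearer token produces the same key.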

User-visible / Behavior Changes

Operators who configure embedding.provider: "openai-compatible" plus embedding.baseUrl and embedding.model under plugins.entries.memory-lancedb.config.embedding get a working embeddings flow against any OpenAI-compatible local server. No behavior change for any operator who has not opted in. Existing lmstudio/openai/local/etc. adapters keep doing exactly what they do today.

Diagram

Before:
  memory-lancedb capture
    -> embedding provider rebuild
    -> lmstudio adapter create()
       -> ensureLmstudioModelLoaded(timeoutMs: 120_000)
          -> POST <local-server>/api/v0/load-model  (LMStudio-only)
          -> server replies 404 / hangs / returns unexpected shape
          -> ~30s event-loop stall before failure log
    -> /v1/embeddings call finally fires (works)

After (with embedding.provider: "openai-compatible"):
  memory-lancedb capture
    -> embedding provider rebuild
    -> openai-compatible adapter create()
       -> client construction only (~1ms)
    -> /v1/embeddings call fires immediately
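
The "After" flow can be sketched as a provider whose construction performs no network I/O. This is not the PR's createRemoteEmbeddingProvider; createLazyProvider, FetchLike, and LazyProvider are hypothetical names, and the fetch implementation is injected so the no-warmup property can be checked with a fake.

```typescript
// Minimal fetch-like signature so a fake can be injected in tests (assumption).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface LazyProvider {
  embedQuery(text: string): Promise<number[]>;
}

// Construction stores configuration only; the first request happens in embedQuery.
function createLazyProvider(
  baseUrl: string,
  model: string,
  fetchImpl: FetchLike,
): LazyProvider {
  return {
    async embedQuery(text: string): Promise<number[]> {
      const res = await fetchImpl(`${baseUrl}/embeddings`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model, input: text }),
      });
      if (!res.ok) {
        throw new Error(`embeddings request failed: HTTP ${res.status}`);
      }
      const body = (await res.json()) as { data: { embedding: number[] }[] };
      const first = body.data[0];
      if (!first) {
        throw new Error("embeddings response contained no vectors");
      }
      return first.embedding;
    },
  };
}
```

Because nothing runs at construction time, a server that is slow to load its model delays only the first embedding call rather than the provider rebuild itself.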

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No. The optional apiKey is a per-plugin config field, treated identically to existing adapters' apiKey handling. Cache key strips the Authorization header.
  • New/changed network calls? Only when explicitly configured by the operator. No call goes out at startup.
  • Command/tool execution surface changed? No
  • Data access scope changed? No. The adapter is fully self-contained and does not consult any global models.providers.* block, so it cannot leak embedded text to a cloud provider on a stale config.

Repro + Verification

Environment

  • OS: macOS 26.4.1 (arm64)
  • Runtime/container: Node 26.0.0
  • Model/provider: BGE-M3 (Q8_0 GGUF) via llama.cpp llama-server on localhost:8081
  • Integration/channel (if any): N/A (memory plugin)
  • Relevant config (redacted): plugins.entries.memory-lancedb.config.embedding.provider: "openai-compatible", baseUrl: "http://localhost:8081/v1", model: "text-embedding-bge-m3", apiKey: "<bearer>", dimensions: 1024

Steps

  1. Start any OpenAI-compatible local embedding server. For llama.cpp: llama-server -m <bge-m3.gguf> -a text-embedding-bge-m3 --embedding --host 127.0.0.1 --port 8081 --api-key <bearer>.
  2. In ~/.openclaw/openclaw.json set memory-lancedb's embedding block to provider: "openai-compatible" plus baseUrl, model, optional apiKey/headers.
  3. Restart the gateway. memory-lancedb captures and recalls now go through the local server with no warmup stall.

Expected

provider.embedQuery("hello") returns a 1024-dim vector in well under 200 ms. No event-loop stalls. No warmup warnings in the gateway log.
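
A dimensionality check like the one implied here could look as follows. The response shape (data[].embedding, index, object) matches the tail of the curl output in the proof; the extractVectors helper itself is a hypothetical sketch, not part of the adapter.

```typescript
// OpenAI-shaped embeddings response, reduced to the fields used here.
interface EmbeddingsResponse {
  data: { embedding: number[]; index: number; object: string }[];
}

// Returns vectors in input order and rejects any with an unexpected dimensionality.
function extractVectors(
  response: EmbeddingsResponse,
  expectedDims: number,
): number[][] {
  // Sort a copy by index so the caller's response object is not mutated.
  const ordered = [...response.data].sort((a, b) => a.index - b.index);
  return ordered.map((item) => {
    if (item.embedding.length !== expectedDims) {
      throw new Error(
        `expected ${expectedDims} dimensions, got ${item.embedding.length} at index ${item.index}`,
      );
    }
    return item.embedding;
  });
}
```

With BGE-M3 as configured in this report, expectedDims would be 1024.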

Actual

Matches expected. Verified end-to-end against llama.cpp serving BGE-M3 (terminal output included in Real behavior proof).

Evidence

  • Failing test/log before + passing after (terminal output in Real behavior proof above; the "before" is "this provider did not exist")
  • Trace/log snippets (proof script output, curl output, vitest output)

Human Verification

  • Verified scenarios: ran the new factory against live llama.cpp serving BGE-M3 on macOS 26.4.1 / Apple Silicon. Confirmed embedQuery and embedBatch return correct-dimensionality vectors. Confirmed factory construction completes in 1 ms with no network call (vs lmstudio adapter's ~30s warmup hang against the same server). Verified the cache key contains the per-plugin baseUrl and model, with Authorization stripped. Verified pnpm test extensions/openai-compatible-embeddings (4/4 pass), npx oxlint extensions/openai-compatible-embeddings/ (0 errors), and pnpm tsgo:prod (clean on touched files).
  • Edge cases checked: missing baseUrl throws a clear error rather than silently falling back. Missing model does the same. Adapter has no autoSelectPriority, so it cannot be picked automatically when the operator has another adapter's credentials configured. Headers passed through embedding.headers get attached to every request alongside the Authorization Bearer.
  • What you did not verify: did not run the new adapter inside an actual openclaw gateway process end-to-end (dist bundle does not include the new extension yet). Did not run pnpm check:changed in Testbox.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes. Pure addition.
  • Config/env changes? No required changes. Operators who want the new provider opt in by setting embedding.provider: "openai-compatible" and baseUrl/model.
  • Migration needed? No. Operators who currently work around the gap by pointing lmstudio or openai at a local server can switch when convenient. Their existing setup keeps working.

Risks and Mitigations

  • Risk: an operator confuses the new HTTP-based openai-compatible provider with the existing in-process local provider.
    • Mitigation: the docs example calls out the distinction explicitly, listing the deployment shapes each one targets. The provider id openai-compatible reads as "any server that speaks the OpenAI HTTP API," which is the term llama.cpp / Ollama / vLLM / TGI / LocalAI all use to describe themselves; the existing local id keeps the semantic of "local in-process model file."
  • Risk: an operator removes their embedding.baseUrl line by mistake while the openai-compatible provider is configured.
    • Mitigation: the adapter throws a clear error at create time pointing to the missing field. No fallback to a default URL.
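
The second mitigation can be sketched as a strict resolver with no default URL. The function name, the config type, and the exact error wording are assumptions made for illustration; the source only states that the adapter throws a clear error at create time and never falls back.

```typescript
// Hypothetical per-plugin embedding config; only the fields discussed here.
interface OpenAiCompatibleEmbeddingConfig {
  provider: string;
  baseUrl?: string;
  model?: string;
  apiKey?: string;
}

// Resolves the required fields or throws; deliberately has no fallback URL.
function resolveStrictEmbeddingConfig(
  config: OpenAiCompatibleEmbeddingConfig,
): { baseUrl: string; model: string } {
  const baseUrl = config.baseUrl?.trim();
  const model = config.model?.trim();
  if (!baseUrl) {
    throw new Error(
      "openai-compatible embeddings: embedding.baseUrl is required (no default URL is used)",
    );
  }
  if (!model) {
    throw new Error(
      "openai-compatible embeddings: embedding.model is required",
    );
  }
  return { baseUrl, model };
}
```

The design choice is that a missing field surfaces as an immediate, named error instead of a request to an unintended host.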

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/plugins/memory-lancedb.md (modified, +47/-4)
  • extensions/openai-compatible-embeddings/embedding-provider.ts (added, +112/-0)
  • extensions/openai-compatible-embeddings/index.ts (added, +12/-0)
  • extensions/openai-compatible-embeddings/memory-embedding-adapter.test.ts (added, +100/-0)
  • extensions/openai-compatible-embeddings/memory-embedding-adapter.ts (added, +48/-0)
  • extensions/openai-compatible-embeddings/openclaw.plugin.json (added, +15/-0)
  • extensions/openai-compatible-embeddings/package.json (added, +15/-0)
  • extensions/openai-compatible-embeddings/tsconfig.json (added, +16/-0)
  • pnpm-lock.yaml (modified, +6/-0)
  • test/vitest/vitest.extension-openai-compatible-embeddings.config.ts (added, +9/-0)

Code Example

{
  plugins: {
    entries: {
      "memory-lancedb": {
        enabled: true,
        config: {
          embedding: {
            provider: "openai-compatible",
            baseUrl: "http://localhost:8081/v1",
            model: "text-embedding-bge-m3",
            apiKey: "${LLAMA_API_TOKEN}",
            dimensions: 1024,
          },
        },
      },
    },
  },
}

Original issue body

Summary

Add a bundled memory embedding provider adapter named openai-compatible that targets any local OpenAI-compatible HTTP embedding server (llama.cpp's llama-server, Ollama via its /v1 surface, vLLM, TGI, LocalAI, llamafile, or any reverse-proxied internal instance), without any vendor-specific warmup probe and without inheriting from any global models.providers.* config.

Problem to solve

Operators running a self-hosted OpenAI-compatible embeddings server today have two unsatisfying choices, both of which produce real operator pain.

  1. Point the bundled lmstudio adapter at the local server. The /v1/embeddings call works fine, but the adapter's ensureLmstudioModelLoaded warmup calls an LMStudio-only "load model" endpoint that hangs against generic servers. On my machine running llama.cpp's llama-server with BGE-M3 on localhost:8081, this hang blocks the gateway event loop for ~30 seconds per memory-lancedb embedding-provider rebuild. The gateway's own liveness diagnostic reports it as event_loop_delay = 29,091 ms, and queued sessions.list / config.get / cron.list responses balloon to 40-60 second response times during the freeze. The gateway log floods with lmstudio embeddings warmup failed; continuing without preload warnings with no operator-friendly indication that the actual cause is a vendor-specific preload endpoint mismatched against a perfectly good local server.

  2. Point the bundled openai adapter at the local server. This works (the per-plugin embedding.baseUrl overrides the global models.providers.openai.baseUrl, and the openai adapter has no warmup), but it inherits the global openai config block's headers, attribution, and api-key resolution. If the per-plugin embedding.baseUrl line ever gets removed by mistake during a config edit, embedding requests silently fall back to api.openai.com, leaking embedded text to a cloud provider the operator may not have intended for memory.

Neither option does what it says on the tin. Operators searching for "how do I use my local embedding server with openclaw" end up confused and sometimes file follow-up issues like #72875 (Unknown memory embedding provider: local), thinking the existing local adapter is what they want. In fact, the existing local adapter is for in-process node-llama-cpp on a .gguf file, not for HTTP-based local servers.

Proposed solution

Add a new bundled extension extensions/openai-compatible-embeddings/ that registers an openai-compatible memory embedding provider adapter.

Design:

  • Provider id: openai-compatible. Matches the term llama.cpp, Ollama, vLLM, TGI, LocalAI, and llamafile all use to describe their HTTP API.
  • transport: "remote". Routed through the same SSRF + remote-fetch path as the cloud adapters.
  • No autoSelectPriority. Operator must opt in explicitly via embedding.provider: "openai-compatible". We do not want auto-selection, because every operator with another adapter's credentials configured would otherwise route embeddings to the cloud the moment they enabled memory-lancedb.
  • No authProviderId. There is no centralized auth flow for arbitrary local servers; the optional apiKey lives directly in the per-plugin embedding config block.
  • No warmup, preload, or model-load probe. The first /v1/embeddings call loads the model lazily, which every server in this family already does.
  • Reads only from the per-plugin embedding config block. Does not consult any global models.providers.* block. Cannot accidentally route to a vendor cloud.
  • Fails fast with a clear error message when embedding.baseUrl or embedding.model is missing.

Config:

{
  plugins: {
    entries: {
      "memory-lancedb": {
        enabled: true,
        config: {
          embedding: {
            provider: "openai-compatible",
            baseUrl: "http://localhost:8081/v1",
            model: "text-embedding-bge-m3",
            apiKey: "${LLAMA_API_TOKEN}",
            dimensions: 1024,
          },
        },
      },
    },
  },
}

Distinct from the existing in-process local adapter (extensions/memory-core/src/memory/provider-adapters.ts), which loads a .gguf file via node-llama-cpp inside the gateway process. See "Alternatives considered" below for the full breakdown of why the two are complementary rather than redundant.

Alternatives considered

Considered four other approaches.

  1. Use the existing local adapter (in-process node-llama-cpp). The natural first question. The existing local adapter loads a .gguf file directly into the gateway Node process via node-llama-cpp; my proposed adapter talks HTTP to a separately-running server. They are not interchangeable.

    Aspect                                               | Existing local                           | Proposed openai-compatible
    Where the model lives                                | inside the gateway process               | separate HTTP server
    Wire                                                 | in-process Node bindings                 | HTTP /v1/embeddings
    Reload model                                         | gateway restart                          | server restart only
    Share with other clients                             | no, gateway owns the model               | yes, any HTTP client
    GPU tuning surface                                   | node-llama-cpp options                   | the server's own CLI flags (e.g. llama-server -ngl ...)
    Works with Ollama / vLLM / TGI / LocalAI / llamafile | no (not Node libs)                       | yes (they all speak OpenAI /v1)
    Operator's existing tuned setup                      | must be ported to node-llama-cpp options | unchanged

    Operators running a separately-managed embedding server (which is the common shape on Apple Silicon, on machines with a dedicated GPU, or on shared infrastructure) cannot use the existing local adapter without abandoning their existing tuned setup. And operators on Ollama / vLLM / TGI / LocalAI / llamafile cannot use it at all because those projects are not Node libraries. Both adapters stay supported; they target different deployment shapes.

  2. Fix the lmstudio adapter so its warmup gracefully no-ops against non-LMStudio servers. Doable, but the operator is still using provider: "lmstudio" against an Ollama or llama.cpp server, which is semantically misleading and easy to mis-document. The fix lands the same wire behavior under a wrong name.

  3. Document the existing workaround more prominently (set provider: "openai" plus embedding.baseUrl). The docs already mention this today. The trap is silent: if the per-plugin baseUrl is removed during a config edit, traffic goes to api.openai.com without any warning. A safer adapter that fails fast on a missing baseUrl is preferable to documentation that depends on operator vigilance.

  4. Run a small reverse-proxy in front of the local server that stubs the LMStudio-specific endpoints and forwards /v1/embeddings. Adds infra to a memory plugin, doesn't generalize across deployments, and still leaves the misleading provider: "lmstudio" in operator config.

The proposed bundled adapter is the simplest path that solves all the failure modes above: explicit name that matches what the upstream projects call themselves, no warmup, no global config inheritance, and complementary to the existing in-process local adapter without redundancy.

Impact

  • Affected: any operator running a self-hosted OpenAI-compatible embeddings server for memory-lancedb. The local-embeddings server ecosystem includes llama.cpp's llama-server, Ollama (via its /v1 surface), vLLM, TGI, LocalAI, and llamafile, all of which are popular alternatives to cloud embeddings for privacy, cost, or offline reasons. The user base overlaps heavily with operators of self-hosted openclaw stacks.
  • Severity: high for operators in the trap. The lmstudio-warmup-against-non-LMStudio path actively stalls the gateway for ~30 seconds per memory-lancedb embedding-provider rebuild, which fires roughly every 24-30 minutes of channel activity. The dashboard goes unresponsive during the freeze, queued WebSocket calls back up, and operators spend hours diagnosing what is actually a missing-adapter UX gap. The openai-with-baseUrl-override path is medium severity: works correctly until a config edit accidentally removes the override.
  • Frequency: every memory-lancedb embedding-provider rebuild for affected operators. On my machine (single openclaw instance, two channels, normal usage), the rebuild fires several times per hour.
  • Consequence: silent gateway stalls (lmstudio path), or silent leak of embedded chat content to a cloud provider (openai path). Both are operator-trust-eroding outcomes. The proposed adapter eliminates both.

Evidence / examples

Live evidence from my machine running this exact setup (llama.cpp llama-server serving bge-m3-Q8_0.gguf on http://localhost:8081/v1).

Before (with provider: "lmstudio" and the same baseUrl), gateway log during a single memory-lancedb embedding-provider rebuild:

2026-05-11T05:05:50  ⇄ res ✓ sessions.list 22416ms
2026-05-11T05:05:50  ⇄ res ✓ config.get   61301ms
2026-05-11T05:05:50  ⇄ res ✓ config.get   59181ms
WARN  liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu  eventLoopDelayMaxMs=29091.7
WARN  lmstudio embeddings warmup failed; continuing without preload

988 channels/imessage events queued behind the warmup hang in a 7-minute window when the issue compounded with another stall.

After (with my prototype openai-compatible adapter), live invocation against the same llama.cpp server:

[proof] target  : http://localhost:8081/v1
[proof] model   : text-embedding-bge-m3
[proof] factory : 1ms (no warmup, just client construction)
[proof] embed   : 124ms, dims=1024
[proof] batch   : 25ms, count=4, dims=1024
[proof] OK. openai-compatible embeddings adapter wired end-to-end against llama.cpp.

The factory: 1ms line is the key evidence. The lmstudio adapter takes up to 120s on the same input.

Prior art:

  • The existing ollama adapter (extensions/ollama/src/memory-embedding-adapter.ts) follows the same general shape: vendor-specific id, no warmup, self-contained config. It was added to fix the operator pain previously raised in #66163.
  • The proposed openai-compatible adapter is the same pattern, generalized for the broader local-server ecosystem rather than scoped to one vendor's native API.

Additional information

Backward compatible. Pure addition. Existing adapters (openai, lmstudio, mistral, gemini, voyage, bedrock, deepinfra, ollama, in-process local) all keep working unchanged. Operators currently working around the gap by misusing lmstudio or openai can switch to openai-compatible when convenient; their existing config keeps working in the meantime.

Accompanying PR drafted on branch feat/openai-compatible-embeddings-provider (will link the actual PR number once filed).

Related:

  • #72875 (open). provider: "local" fails with "Unknown memory embedding provider: local". Operators land here after misunderstanding which adapter to use for HTTP-based local servers; the existing local adapter is for in-process node-llama-cpp, not HTTP. The new openai-compatible adapter gives them the right name.
  • #72937 (open PR). fix for #72875's registration timing. Adjacent.
  • #66163 (closed). Unknown memory embedding provider: ollama, which led to the bundled ollama adapter. The proposed openai-compatible adapter follows the same pattern, generalized.
  • #74204 (open). memory.qmd.update.embedTimeoutMs too low for local GGUF. Same operator profile (running local embedding server), different timeout surface.
  • #74761 (open). Document oMLX (Apple Silicon MLX) as a memorySearch embedding provider. Same family of "add a local-server adapter" requests; oMLX exposes an OpenAI-compatible API and would work through the proposed openai-compatible adapter without further plugin code.
  • #60994 (closed). Cannot reliably connect to remote Ollama / LM Studio instances via LAN IP. Adjacent operator pain in the same ecosystem.
  • #42270 (closed). LM Studio backend regression. Related (lmstudio-adapter brittleness).
