openclaw - ✅(Solved) Fix [Feature]: bundled openai-compatible embedding provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI) [1 pull requests, 1 comments, 2 participants]

openclaw 2026-05-11 00:50:43


openclaw/openclaw#80476•Fetched 2026-05-11 03:14:14


Author

yaanfpv

Participants

clawsweeper[bot]

yaanfpv

Timeline (top)

cross-referenced ×3, commented ×1

Add a bundled memory embedding provider adapter named openai-compatible that targets any local OpenAI-compatible HTTP embedding server (llama.cpp's llama-server, Ollama via its /v1 surface, vLLM, TGI, LocalAI, llamafile, or any reverse-proxied internal instance), without any vendor-specific warmup probe and without inheriting from any global models.providers.* config.

Error Message

The adapter fails fast with a clear error message when embedding.baseUrl or embedding.model is missing. Gateway log warnings observed with the lmstudio adapter:
WARN liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu eventLoopDelayMaxMs=29091.7
WARN lmstudio embeddings warmup failed; continuing without preload

Root Cause

Provider id: openai-compatible. Matches the term llama.cpp, Ollama, vLLM, TGI, LocalAI, and llamafile all use to describe their HTTP API.
transport: "remote". Routed through the same SSRF + remote-fetch path as the cloud adapters.
No autoSelectPriority. Operator must opt in explicitly via embedding.provider: "openai-compatible". We do not want auto-selection, because every operator with another adapter's credentials configured would otherwise route embeddings to the cloud the moment they enabled memory-lancedb.
No authProviderId. There is no centralized auth flow for arbitrary local servers; the optional apiKey lives directly in the per-plugin embedding config block.
No warmup, preload, or model-load probe. The first /v1/embeddings call loads the model lazily, which every server in this family already does.
Reads only from the per-plugin embedding config block. Does not consult any global models.providers.* block. Cannot accidentally route to a vendor cloud.
Fails fast with a clear error message when embedding.baseUrl or embedding.model is missing.

Fix Action

Fix / Workaround

Document the existing workaround harder (set provider: "openai" plus embedding.baseUrl). Today the docs already mention this. The trap is silent: if the per-plugin baseUrl is removed during a config edit, traffic silently goes to api.openai.com. A safer adapter that fails fast on a missing baseUrl is preferable to documentation that depends on operator vigilance.

PR fix notes

PR #80479: feat(memory/embeddings): add openai-compatible provider for self-hosted servers (llama.cpp, Ollama, vLLM, TGI, LocalAI)

Repository: openclaw/openclaw
Author: yaanfpv
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/80479

Description (problem / solution / changelog)

Summary

Problem: operators running a self-hosted OpenAI-compatible embeddings server (llama.cpp's llama-server, Ollama via its /v1 surface, vLLM, TGI, LocalAI, llamafile, or any reverse-proxied internal instance) have no clean adapter for it. Pointing the bundled lmstudio adapter at it triggers an LMStudio-only "load model" warmup that hangs against generic servers and stalls the gateway event loop for ~30 seconds per memory-lancedb embedding-provider rebuild. Pointing the bundled openai adapter at it works, but the adapter inherits global OpenAI headers, attribution, and api-key resolution, and if the embedding.baseUrl line is removed, requests silently fall back to api.openai.com, leaking embedded text to the cloud.
Why it matters: the symptom is gateway freezes that show up as multi-second sessions.list backlogs and a flooded gateway log. Operators spend hours diagnosing what is actually a UX gap: the bundled adapters do not include a generic local-server option, and the existing in-process local adapter (node-llama-cpp on a .gguf file) does not cover operators who run their embeddings server as a separate HTTP process.
What changes: adds a new bundled extension extensions/openai-compatible-embeddings/ that registers an openai-compatible memory embedding provider. The adapter has no warmup and no global config inheritance, fails fast on a missing embedding.baseUrl or embedding.model, and does not auto-select (the operator must explicitly opt in with embedding.provider: "openai-compatible").
What did NOT change (scope boundary): no existing adapter touched. lmstudio, openai, mistral, gemini, voyage, bedrock, deepinfra, ollama, in-process local adapters all behave byte-identically. The Plugin SDK surface is unchanged; the new adapter consumes the same public exports the other bundled adapters do. No protocol change, no schema change, no migration, no telemetry. The existing in-process local adapter stays as-is for operators who load .gguf files in-process via node-llama-cpp; the two adapters are complementary, not redundant.

Change Type

Feature
Docs

Scope

Memory / storage
Integrations

Linked Issue/PR

Closes #80476
Related #72875 (operator confusion: provider: "local" actually means in-process node-llama-cpp, not HTTP)
Related #72937 (open PR fixing #72875's registration timing for the in-process local adapter)
Related #66163 (closed: Unknown memory embedding provider: ollama, the precedent that led to the bundled ollama adapter; this PR follows the same pattern, generalized)
Related #74204 (open: memory.qmd.update.embedTimeoutMs too low for local GGUF; same operator profile)
Related #74761 (open: Document oMLX as a memorySearch embedding provider; oMLX exposes an OpenAI-compatible API and would just work through this adapter without further plugin code)
Related #60994 (closed: cannot reliably connect to remote Ollama / LM Studio via LAN; adjacent operator pain in the same ecosystem)
Related #42270 (closed: LM Studio backend regression; same lmstudio-adapter brittleness)
This PR adds a new feature

Real behavior proof

Behavior or issue addressed: an operator running llama-server (llama.cpp) with the BGE-M3 embedding model on http://localhost:8081/v1 had memory-lancedb captures triggering ~30-second event-loop stalls every time the embedding provider rebuilt, because the lmstudio adapter's ensureLmstudioModelLoaded warmup hangs against llama.cpp's OpenAI-compatible server (which does not expose LMStudio's load-model endpoint). The new openai-compatible adapter routes through the same generic createRemoteEmbeddingProvider factory the other adapters use, just without the warmup phase. Embeddings work end-to-end on the first call, no preload required.
Real environment tested: macOS 26.4.1 on Apple Silicon (arm64). llama-server from llama.cpp serving bge-m3-Q8_0.gguf (605 MB, 1024 dimensions) on http://127.0.0.1:8081, with --ngl 24 -c 32768 -np 4 -b 512 -ub 512 --mmap --mlock --cont-batching --api-key <set>. Live ~/.openclaw/ with memory-lancedb enabled.
Exact steps or command run after this patch: ran pnpm test extensions/openai-compatible-embeddings to validate the adapter posture and the no-warmup invariant. Then invoked the new factory directly through node --import tsx against the live llama-server, capturing the round-trip latency for both embedQuery and embedBatch. Independently verified the same llama-server endpoint with curl -H "Authorization: Bearer ..." http://localhost:8081/v1/embeddings returns 1024-dim vectors with the same model name.

Evidence after fix:

Live invocation of the new adapter from a small Node script (node --import tsx):

$ node --import tsx /tmp/proof-openai-compatible-embeddings.mjs
[proof] target  : http://localhost:8081/v1
[proof] model   : text-embedding-bge-m3
[proof] apiKey  : <set>
[proof] factory : 1ms (no warmup, just client construction)
[proof] client  : baseUrl=http://localhost:8081/v1 model=text-embedding-bge-m3
[proof] embed   : 124ms, dims=1024, head=[-0.0392, 0.0370, -0.0289, 0.0161, ...]
[proof] batch   : 25ms, count=4, dims=1024
[proof] OK. openai-compatible embeddings adapter wired end-to-end against llama.cpp.

Notice the factory took 1 ms (the lmstudio adapter would have taken up to 120 s here against the same server), and the actual embedding round-trip is 124 ms with the expected 1024-dim BGE-M3 output.

Independent confirmation of the same endpoint via curl, showing the local server answers OpenAI-shaped requests without any vendor-specific preamble:

$ curl -sS -m 5 -H "Authorization: Bearer ..." -H "Content-Type: application/json" \
    -d '{"model":"text-embedding-bge-m3","input":"hello"}' \
    -w "\nHTTP %{http_code} time=%{time_total}s\n" \
    http://localhost:8081/v1/embeddings | tail -c 200
...,0.05932944267988205],"index":0,"object":"embedding"}]}
HTTP 200 time=0.069077s
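The same OpenAI-shaped request can be sketched in TypeScript. This is a minimal illustration, not the adapter's actual code: the `EmbeddingConfig` shape and the `buildEmbeddingsRequest` helper are hypothetical names that mirror the per-plugin embedding block, and the token value is a placeholder.

```typescript
// Hypothetical config shape mirroring the per-plugin embedding block.
interface EmbeddingConfig {
  baseUrl: string;
  model: string;
  apiKey?: string;
  headers?: Record<string, string>;
}

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the URL, headers, and JSON body for POST <baseUrl>/embeddings.
// No network call happens here; the caller decides when to send it.
function buildEmbeddingsRequest(
  cfg: EmbeddingConfig,
  input: string | string[],
): PreparedRequest {
  // Trim trailing slashes so "http://host/v1/" and "http://host/v1" behave alike.
  const url = `${cfg.baseUrl.replace(/\/+$/, "")}/embeddings`;
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    ...(cfg.headers ?? {}),
  };
  // The bearer token is optional; many local servers run without one.
  if (cfg.apiKey) headers["Authorization"] = `Bearer ${cfg.apiKey}`;
  return {
    url,
    init: {
      method: "POST",
      headers,
      body: JSON.stringify({ model: cfg.model, input }),
    },
  };
}

const req = buildEmbeddingsRequest(
  {
    baseUrl: "http://localhost:8081/v1",
    model: "text-embedding-bge-m3",
    apiKey: "example-token", // placeholder, not a real credential
  },
  "hello",
);
```

Passing `req.url` and `req.init` to `fetch` would send the same request as the curl command above, assuming the server is listening on that port.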

Targeted regression test for the adapter posture and the no-warmup invariant:

$ pnpm test extensions/openai-compatible-embeddings
 Test Files  1 passed (1)
      Tests  4 passed (4)
   Duration  4.52s

Observed result after fix: provider construction takes 1 ms (no warmup network call). The adapter holds the per-plugin baseUrl/model exactly as configured, with no fallback to any global config block. Embeddings round-trip in well under 200 ms against the live local server. Existing adapters (openai, lmstudio, mistral, gemini, voyage, bedrock, deepinfra, ollama, in-process local) are untouched.
What was not tested: did not run the new adapter inside an actual openclaw gateway process end-to-end, because the dist bundle does not include the new extension yet (the source-only invocation above is the closest equivalent without a release-tagged build). Did not run pnpm check:changed in Testbox; targeted pnpm test extensions/openai-compatible-embeddings plus targeted npx oxlint extensions/openai-compatible-embeddings/ plus targeted pnpm tsgo:prod are all clean on the touched files.
Before evidence: not applicable for a feature add. The "before" is "this provider did not exist," and the operator pain it addresses is documented in the Summary above and in linked issue #80476.

Root Cause

N/A. Feature addition, not a regression fix. (For the underlying operator pain that motivated the addition, see linked issue #80476.)

Regression Test Plan

Coverage level that should have caught this:
- Unit test
Target test or file: extensions/openai-compatible-embeddings/memory-embedding-adapter.test.ts (new), with the factory in extensions/openai-compatible-embeddings/embedding-provider.ts.
Scenario the test should lock in:
- The adapter's posture: id: "openai-compatible", transport: "remote", no autoSelectPriority, no authProviderId, allowExplicitWhenConfiguredAuto: true, no shouldContinueAutoSelection. This is what stops the adapter from accidentally being auto-selected over an unrelated cloud provider whose key happens to be configured.
- No warmup or preload during create. The adapter must produce exactly one factory invocation per create call; nothing else.
- The cache key includes the per-plugin baseUrl and model exactly as supplied, so two different local servers do not share a cache entry.
- The Authorization header is stripped from the cache key so a rotated bearer does not invalidate cached embeddings.
Why this is the smallest reliable guardrail: the adapter is a thin facade over the generic createRemoteEmbeddingProvider. The risk surface is the posture (auto-select / auth / fallback) and the absence of any pre-call side effect. Both are testable in pure-TS with a mocked factory; no live server needed for the unit tests.
Existing test that already covers this (if any): no. No bundled adapter today behaves the way openai-compatible needs to (no auth provider, no auto-select, fully self-contained config, no vendor-specific warmup).
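The two cache-key scenarios above could be expressed roughly as follows. This is a sketch under stated assumptions: `buildCacheKey` and its input shape are hypothetical, and the real adapter may derive its key differently. It only illustrates the invariant that baseUrl and model are part of the key while the Authorization header is not.

```typescript
// Hypothetical input to the cache-key function.
interface CacheKeyInput {
  baseUrl: string;
  model: string;
  headers?: Record<string, string>;
}

// Serializes baseUrl, model, and non-secret headers into a stable key.
// The Authorization header is dropped so rotating a bearer token does not
// invalidate cached embeddings.
function buildCacheKey(input: CacheKeyInput): string {
  const safeHeaders = Object.entries(input.headers ?? {})
    .filter(([name]) => name.toLowerCase() !== "authorization")
    .map(([name, value]): [string, string] => [name.toLowerCase(), value])
    // Sort by header name so insertion order does not change the key.
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify({
    baseUrl: input.baseUrl,
    model: input.model,
    headers: safeHeaders,
  });
}

const keyOldToken = buildCacheKey({
  baseUrl: "http://localhost:8081/v1",
  model: "text-embedding-bge-m3",
  headers: { Authorization: "Bearer old", "X-Team": "a" },
});
const keyNewToken = buildCacheKey({
  baseUrl: "http://localhost:8081/v1",
  model: "text-embedding-bge-m3",
  headers: { "X-Team": "a", Authorization: "Bearer new" },
});
const keyOtherServer = buildCacheKey({
  baseUrl: "http://localhost:9090/v1",
  model: "text-embedding-bge-m3",
});
```

Here `keyOldToken` and `keyNewToken` are equal, while `keyOtherServer` differs, which matches the "two different local servers do not share a cache entry" requirement.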

User-visible / Behavior Changes

Operators who configure embedding.provider: "openai-compatible" plus embedding.baseUrl and embedding.model under plugins.entries.memory-lancedb.config.embedding get a working embeddings flow against any OpenAI-compatible local server. No behavior change for any operator who has not opted in. Existing lmstudio/openai/local/etc. adapters keep doing exactly what they do today.

Diagram

Before:
  memory-lancedb capture
    -> embedding provider rebuild
    -> lmstudio adapter create()
       -> ensureLmstudioModelLoaded(timeoutMs: 120_000)
          -> POST <local-server>/api/v0/load-model  (LMStudio-only)
          -> server replies 404 / hangs / returns unexpected shape
          -> ~30s event-loop stall before failure log
    -> /v1/embeddings call finally fires (works)

After (with embedding.provider: "openai-compatible"):
  memory-lancedb capture
    -> embedding provider rebuild
    -> openai-compatible adapter create()
       -> client construction only (~1ms)
    -> /v1/embeddings call fires immediately
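The "After" path can be illustrated with a small sketch in which construction performs no I/O and the first network call only happens inside `embedQuery`. Everything here is hypothetical (`createProvider`, `FetchLike`, and the stubbed fetch are stand-ins, not the real createRemoteEmbeddingProvider factory); the point is the no-warmup invariant.

```typescript
// Minimal fetch-like signature so the sketch can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface Provider {
  embedQuery(text: string): Promise<number[]>;
}

// Construction only captures config: no warmup, preload, or probe request.
function createProvider(
  cfg: { baseUrl: string; model: string },
  fetchImpl: FetchLike,
): Provider {
  const url = `${cfg.baseUrl.replace(/\/+$/, "")}/embeddings`;
  return {
    async embedQuery(text: string): Promise<number[]> {
      // The first and only network call happens here, on demand.
      const res = await fetchImpl(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ model: cfg.model, input: text }),
      });
      if (!res.ok) throw new Error(`embeddings request failed: HTTP ${res.status}`);
      const body = (await res.json()) as { data: { embedding: number[] }[] };
      const first = body.data[0];
      if (!first) throw new Error("embeddings response contained no vectors");
      return first.embedding;
    },
  };
}

// Stub that counts calls instead of touching the network.
let fetchCalls = 0;
const stubFetch: FetchLike = async () => {
  fetchCalls += 1;
  return {
    ok: true,
    status: 200,
    json: async () => ({ data: [{ embedding: [0.1, 0.2], index: 0, object: "embedding" }] }),
  };
};

const provider = createProvider(
  { baseUrl: "http://localhost:8081/v1", model: "text-embedding-bge-m3" },
  stubFetch,
);
const callsAfterCreate = fetchCalls; // still zero: nothing was sent during create
```

A unit test along these lines needs no live server, which is consistent with the mocked-factory approach described in the Regression Test Plan.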

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No. The optional apiKey is a per-plugin config field, treated identically to existing adapters' apiKey handling. Cache key strips the Authorization header.
New/changed network calls? Only when explicitly configured by the operator. No call goes out at startup.
Command/tool execution surface changed? No
Data access scope changed? No. The adapter is fully self-contained and does not consult any global models.providers.* block, so it cannot leak embedded text to a cloud provider on a stale config.

Repro + Verification

Environment

OS: macOS 26.4.1 (arm64)
Runtime/container: Node 26.0.0
Model/provider: BGE-M3 (Q8_0 GGUF) via llama.cpp llama-server on localhost:8081
Integration/channel (if any): N/A (memory plugin)
Relevant config (redacted): plugins.entries.memory-lancedb.config.embedding.provider: "openai-compatible", baseUrl: "http://localhost:8081/v1", model: "text-embedding-bge-m3", apiKey: "<bearer>", dimensions: 1024

Steps

Start any OpenAI-compatible local embedding server. For llama.cpp: llama-server -m <bge-m3.gguf> -a text-embedding-bge-m3 --embedding --host 127.0.0.1 --port 8081 --api-key <bearer>.
In ~/.openclaw/openclaw.json set memory-lancedb's embedding block to provider: "openai-compatible" plus baseUrl, model, optional apiKey/headers.
Restart the gateway. memory-lancedb captures and recalls now go through the local server with no warmup stall.

Expected

provider.embedQuery("hello") returns a 1024-dim vector in well under 200 ms. No event-loop stalls. No warmup warnings in the gateway log.
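A caller that wants to enforce the expected dimensionality could guard the response as sketched below. The response shape is assumed from the OpenAI-style output visible in the curl evidence (`"index":0,"object":"embedding"`); `extractVectors` is a hypothetical helper, and the 4-dimensional sample is illustrative only (BGE-M3 returns 1024 dimensions).

```typescript
// Assumed OpenAI-style embeddings response body.
interface EmbeddingsResponse {
  data: { embedding: number[]; index: number; object: string }[];
}

// Returns vectors ordered by their index field and rejects any vector whose
// length differs from the configured dimensions.
function extractVectors(body: EmbeddingsResponse, expectedDims: number): number[][] {
  const ordered = [...body.data].sort((a, b) => a.index - b.index);
  return ordered.map((item) => {
    if (item.embedding.length !== expectedDims) {
      throw new Error(
        `expected ${expectedDims}-dim vector, got ${item.embedding.length} at index ${item.index}`,
      );
    }
    return item.embedding;
  });
}

// Illustrative sample with 4 dimensions, delivered out of order.
const sample: EmbeddingsResponse = {
  data: [
    { embedding: [0.5, 0.6, 0.7, 0.8], index: 1, object: "embedding" },
    { embedding: [0.1, 0.2, 0.3, 0.4], index: 0, object: "embedding" },
  ],
};
const vectors = extractVectors(sample, 4);
```

With a real server the second argument would be the configured `dimensions` value, for example 1024 for the setup described here.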

Actual

Matches expected. Verified end-to-end against llama.cpp serving BGE-M3 (terminal output included in Real behavior proof).

Evidence

Failing test/log before + passing after (terminal output in Real behavior proof above; the "before" is "this provider did not exist")
Trace/log snippets (proof script output, curl output, vitest output)

Human Verification

Verified scenarios: ran the new factory against live llama.cpp serving BGE-M3 on macOS 26.4.1 / Apple Silicon. Confirmed embedQuery and embedBatch return correct-dimensionality vectors. Confirmed factory construction completes in 1 ms with no network call (vs lmstudio adapter's ~30s warmup hang against the same server). Verified the cache key contains the per-plugin baseUrl and model, with Authorization stripped. Verified pnpm test extensions/openai-compatible-embeddings (4/4 pass), npx oxlint extensions/openai-compatible-embeddings/ (0 errors), and pnpm tsgo:prod (clean on touched files).
Edge cases checked: missing baseUrl throws a clear error rather than silently falling back. Missing model does the same. Adapter has no autoSelectPriority, so it cannot be picked automatically when the operator has another adapter's credentials configured. Headers passed through embedding.headers get attached to every request alongside the Authorization Bearer.
What you did not verify: did not run the new adapter inside an actual openclaw gateway process end-to-end (dist bundle does not include the new extension yet). Did not run pnpm check:changed in Testbox.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes. Pure addition.
Config/env changes? No required changes. Operators who want the new provider opt in by setting embedding.provider: "openai-compatible" and baseUrl/model.
Migration needed? No. Operators who currently work around the gap by pointing lmstudio or openai at a local server can switch when convenient. Their existing setup keeps working.

Risks and Mitigations

Risk: an operator confuses the new HTTP-based openai-compatible provider with the existing in-process local provider.
- Mitigation: the docs example calls out the distinction explicitly, listing the deployment shapes each one targets. The provider id openai-compatible reads as "any server that speaks the OpenAI HTTP API," which is the term llama.cpp / Ollama / vLLM / TGI / LocalAI all use to describe themselves; the existing local id keeps the semantic of "local in-process model file."
Risk: an operator removes their embedding.baseUrl line by mistake while the openai-compatible provider is configured.
- Mitigation: the adapter throws a clear error at create time pointing to the missing field. No fallback to a default URL.
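The fail-fast mitigation could look roughly like the sketch below. The type names, the `resolveEmbeddingConfig` helper, and the error wording are all hypothetical; the only behavior taken from the description is that a missing baseUrl or model throws at create time and no default URL is substituted.

```typescript
// Hypothetical raw shape of the per-plugin embedding block before validation.
interface RawEmbeddingConfig {
  provider?: string;
  baseUrl?: string;
  model?: string;
  apiKey?: string;
  dimensions?: number;
}

interface ResolvedEmbeddingConfig {
  baseUrl: string;
  model: string;
  apiKey?: string;
  dimensions?: number;
}

// Validates required fields and throws instead of falling back to a default.
function resolveEmbeddingConfig(raw: RawEmbeddingConfig): ResolvedEmbeddingConfig {
  const baseUrl = raw.baseUrl?.trim();
  const model = raw.model?.trim();
  if (!baseUrl) {
    throw new Error(
      "openai-compatible embeddings: embedding.baseUrl is required (no default URL is used)",
    );
  }
  if (!model) {
    throw new Error("openai-compatible embeddings: embedding.model is required");
  }
  return { baseUrl, model, apiKey: raw.apiKey, dimensions: raw.dimensions };
}

// A config edit that dropped baseUrl surfaces immediately as an error.
let missingBaseUrlMessage = "";
try {
  resolveEmbeddingConfig({ provider: "openai-compatible", model: "text-embedding-bge-m3" });
} catch (err) {
  missingBaseUrlMessage = (err as Error).message;
}
```

The design choice is that an explicit error at construction is easier to diagnose than requests quietly going to a different host.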

Changed files

CHANGELOG.md (modified, +1/-0)
docs/plugins/memory-lancedb.md (modified, +47/-4)
extensions/openai-compatible-embeddings/embedding-provider.ts (added, +112/-0)
extensions/openai-compatible-embeddings/index.ts (added, +12/-0)
extensions/openai-compatible-embeddings/memory-embedding-adapter.test.ts (added, +100/-0)
extensions/openai-compatible-embeddings/memory-embedding-adapter.ts (added, +48/-0)
extensions/openai-compatible-embeddings/openclaw.plugin.json (added, +15/-0)
extensions/openai-compatible-embeddings/package.json (added, +15/-0)
extensions/openai-compatible-embeddings/tsconfig.json (added, +16/-0)
pnpm-lock.yaml (modified, +6/-0)
test/vitest/vitest.extension-openai-compatible-embeddings.config.ts (added, +9/-0)

Code Example

{
  plugins: {
    entries: {
      "memory-lancedb": {
        enabled: true,
        config: {
          embedding: {
            provider: "openai-compatible",
            baseUrl: "http://localhost:8081/v1",
            model: "text-embedding-bge-m3",
            apiKey: "${LLAMA_API_TOKEN}",
            dimensions: 1024,
          },
        },
      },
    },
  },
}

---

2026-05-11T05:05:50  ⇄ res ✓ sessions.list 22416ms
2026-05-11T05:05:50  ⇄ res ✓ config.get   61301ms
2026-05-11T05:05:50  ⇄ res ✓ config.get   59181ms
WARN  liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu  eventLoopDelayMaxMs=29091.7
WARN  lmstudio embeddings warmup failed; continuing without preload

---

[proof] target  : http://localhost:8081/v1
[proof] model   : text-embedding-bge-m3
[proof] factory : 1ms (no warmup, just client construction)
[proof] embed   : 124ms, dims=1024
[proof] batch   : 25ms, count=4, dims=1024
[proof] OK. openai-compatible embeddings adapter wired end-to-end against llama.cpp.


Summary

Problem to solve

Operators running a self-hosted OpenAI-compatible embeddings server today have two unsatisfying choices, both of which produce real operator pain.

Point the bundled lmstudio adapter at the local server. The /v1/embeddings call works fine, but the adapter's ensureLmstudioModelLoaded warmup calls an LMStudio-only "load model" endpoint that hangs against generic servers. On my machine running llama.cpp's llama-server with BGE-M3 on localhost:8081, this hang blocks the gateway event loop for ~30 seconds per memory-lancedb embedding-provider rebuild. The gateway's own liveness diagnostic reports it as event_loop_delay = 29,091 ms, and queued sessions.list / config.get / cron.list responses balloon to 40-60 second response times during the freeze. The gateway log floods with lmstudio embeddings warmup failed; continuing without preload warnings with no operator-friendly indication that the actual cause is a vendor-specific preload endpoint mismatched against a perfectly good local server.
Point the bundled openai adapter at the local server. This works (the per-plugin embedding.baseUrl overrides the global models.providers.openai.baseUrl, and the openai adapter has no warmup), but it inherits the global openai config block's headers, attribution, and api-key resolution. If the per-plugin embedding.baseUrl line ever gets removed by mistake during a config edit, embedding requests silently fall back to api.openai.com, leaking embedded text to a cloud provider the operator may not have intended for memory.

Neither option's name says what it actually does. Operators searching for "how do I use my local embedding server with openclaw" end up confused, sometimes filing follow-up issues like #72875 (Unknown memory embedding provider: local) in the belief that the existing local adapter is what they want, when in fact it is for in-process node-llama-cpp on a .gguf file, not for HTTP-based local servers.

Proposed solution

Add a new bundled extension extensions/openai-compatible-embeddings/ that registers an openai-compatible memory embedding provider adapter.

Design:

Provider id: openai-compatible. Matches the term llama.cpp, Ollama, vLLM, TGI, LocalAI, and llamafile all use to describe their HTTP API.
transport: "remote". Routed through the same SSRF + remote-fetch path as the cloud adapters.
No autoSelectPriority. Operator must opt in explicitly via embedding.provider: "openai-compatible". We do not want auto-selection, because every operator with another adapter's credentials configured would otherwise route embeddings to the cloud the moment they enabled memory-lancedb.
No authProviderId. There is no centralized auth flow for arbitrary local servers; the optional apiKey lives directly in the per-plugin embedding config block.
No warmup, preload, or model-load probe. The first /v1/embeddings call loads the model lazily, which every server in this family already does.
Reads only from the per-plugin embedding config block. Does not consult any global models.providers.* block. Cannot accidentally route to a vendor cloud.
Fails fast with a clear error message when embedding.baseUrl or embedding.model is missing.
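The opt-in posture in the design list can be illustrated with a toy registry. This is a sketch, not openclaw's actual selection logic: `AdapterDescriptor`, `selectAdapter`, the "cloud-a" and "cloud-b" entries, and their priority values are all hypothetical, and the sketch assumes that only adapters declaring an autoSelectPriority are auto-selection candidates.

```typescript
// Hypothetical descriptor carrying the posture fields named in the design.
interface AdapterDescriptor {
  id: string;
  transport: "remote" | "local";
  autoSelectPriority?: number;
  authProviderId?: string;
}

const adapters: AdapterDescriptor[] = [
  // Stand-ins for cloud adapters that participate in auto-selection.
  { id: "cloud-a", transport: "remote", autoSelectPriority: 10, authProviderId: "cloud-a" },
  { id: "cloud-b", transport: "remote", autoSelectPriority: 20, authProviderId: "cloud-b" },
  // The proposed adapter: no autoSelectPriority and no authProviderId.
  { id: "openai-compatible", transport: "remote" },
];

// Explicit ids win; otherwise only adapters with a priority are candidates.
// In this sketch the lowest priority number is picked first (an assumption).
function selectAdapter(
  requested: string | undefined,
  all: AdapterDescriptor[],
): AdapterDescriptor | undefined {
  if (requested && requested !== "auto") {
    return all.find((adapter) => adapter.id === requested);
  }
  return all
    .filter((adapter) => adapter.autoSelectPriority !== undefined)
    .sort((a, b) => (a.autoSelectPriority ?? 0) - (b.autoSelectPriority ?? 0))[0];
}

const autoPick = selectAdapter(undefined, adapters);
const explicitPick = selectAdapter("openai-compatible", adapters);
```

Under these assumptions the local-server adapter is never returned by auto-selection and is only reachable through an explicit embedding.provider value.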

Config:

{
  plugins: {
    entries: {
      "memory-lancedb": {
        enabled: true,
        config: {
          embedding: {
            provider: "openai-compatible",
            baseUrl: "http://localhost:8081/v1",
            model: "text-embedding-bge-m3",
            apiKey: "${LLAMA_API_TOKEN}",
            dimensions: 1024,
          },
        },
      },
    },
  },
}

Distinct from the existing in-process local adapter (extensions/memory-core/src/memory/provider-adapters.ts), which loads a .gguf file via node-llama-cpp inside the gateway process. See "Alternatives considered" below for the full breakdown of why the two are complementary rather than redundant.

Alternatives considered

Considered four other approaches.

Use the existing local adapter (in-process node-llama-cpp). The natural first question. The existing local adapter loads a .gguf file directly into the gateway Node process via node-llama-cpp; my proposed adapter talks HTTP to a separately-running server. They are not interchangeable.

	Existing `local`	Proposed `openai-compatible`
Where the model lives	inside the gateway process	separate HTTP server
Wire	in-process Node bindings	HTTP /v1/embeddings
Reload model	gateway restart	server restart only
Share with other clients	no, gateway owns the model	yes, any HTTP client
GPU tuning surface	node-llama-cpp options	the server's own CLI flags (e.g. `llama-server -ngl ...`)
Works with Ollama / vLLM / TGI / LocalAI / llamafile	no (not Node libs)	yes (they all speak OpenAI /v1)
Operator's existing tuned setup	must be ported to node-llama-cpp options	unchanged

Operators running a separately-managed embedding server (which is the common shape on Apple Silicon, on machines with a dedicated GPU, or on shared infrastructure) cannot use the existing local adapter without abandoning their existing tuned setup. And operators on Ollama / vLLM / TGI / LocalAI / llamafile cannot use it at all because those projects are not Node libraries. Both adapters stay supported; they target different deployment shapes.

Fix the lmstudio adapter so its warmup gracefully no-ops against non-LMStudio servers. Doable, but the operator is still using provider: "lmstudio" against an Ollama or llama.cpp server, which is semantically misleading and easy to mis-document. The fix lands the same wire behavior under a wrong name.
Document the existing workaround harder (set provider: "openai" plus embedding.baseUrl). Today the docs already mention this. The trap is silent: if the per-plugin baseUrl is removed during a config edit, traffic silently goes to api.openai.com. A safer adapter that fails fast on a missing baseUrl is preferable to documentation that depends on operator vigilance.
Run a small reverse-proxy in front of the local server that stubs the LMStudio-specific endpoints and forwards /v1/embeddings. Adds infra to a memory plugin, doesn't generalize across deployments, and still leaves the misleading provider: "lmstudio" in operator config.

The proposed bundled adapter is the simplest path that solves all the failure modes above: explicit name that matches what the upstream projects call themselves, no warmup, no global config inheritance, and complementary to the existing in-process local adapter without redundancy.

Impact

Affected: any operator running a self-hosted OpenAI-compatible embeddings server for memory-lancedb. The local-embeddings server ecosystem includes llama.cpp's llama-server, Ollama (via its /v1 surface), vLLM, TGI, LocalAI, and llamafile, all of which are popular alternatives to cloud embeddings for privacy, cost, or offline reasons. The user base overlaps heavily with operators of self-hosted openclaw stacks.
Severity: high for operators in the trap. The lmstudio-warmup-against-non-LMStudio path actively stalls the gateway for ~30 seconds per memory-lancedb embedding-provider rebuild, which fires roughly every 24-30 minutes of channel activity. The dashboard goes unresponsive during the freeze, queued WebSocket calls back up, and operators spend hours diagnosing what is actually a missing-adapter UX gap. The openai-with-baseUrl-override path is medium severity: works correctly until a config edit accidentally removes the override.
Frequency: every memory-lancedb embedding-provider rebuild for affected operators. On my machine (single openclaw instance, two channels, normal usage), the rebuild fires several times per hour.
Consequence: silent gateway stalls (lmstudio path), or silent leak of embedded chat content to a cloud provider (openai path). Both are operator-trust-eroding outcomes. The proposed adapter eliminates both.

Evidence / examples

Live evidence from my machine running this exact setup (llama.cpp llama-server serving bge-m3-Q8_0.gguf on http://localhost:8081/v1).

Before (with provider: "lmstudio" and the same baseUrl), gateway log during a single memory-lancedb embedding-provider rebuild:

2026-05-11T05:05:50  ⇄ res ✓ sessions.list 22416ms
2026-05-11T05:05:50  ⇄ res ✓ config.get   61301ms
2026-05-11T05:05:50  ⇄ res ✓ config.get   59181ms
WARN  liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu  eventLoopDelayMaxMs=29091.7
WARN  lmstudio embeddings warmup failed; continuing without preload

988 channels/imessage events queued behind the warmup hang in a 7-minute window when the issue compounded with another stall.

After (with my prototype openai-compatible adapter), live invocation against the same llama.cpp server:

[proof] target  : http://localhost:8081/v1
[proof] model   : text-embedding-bge-m3
[proof] factory : 1ms (no warmup, just client construction)
[proof] embed   : 124ms, dims=1024
[proof] batch   : 25ms, count=4, dims=1024
[proof] OK. openai-compatible embeddings adapter wired end-to-end against llama.cpp.

The factory: 1ms line is the key evidence. The lmstudio adapter takes up to 120s on the same input.

Prior art:

The existing ollama adapter (extensions/ollama/src/memory-embedding-adapter.ts) follows the same general shape: vendor-specific id, no warmup, self-contained config. It was added to fix the operator pain previously raised in #66163.
The proposed openai-compatible adapter is the same pattern, generalized for the broader local-server ecosystem rather than scoped to one vendor's native API.

Additional information

Backward compatible. Pure addition. Existing adapters (openai, lmstudio, mistral, gemini, voyage, bedrock, deepinfra, ollama, in-process local) all keep working unchanged. Operators currently working around the gap by misusing lmstudio or openai can switch to openai-compatible when convenient; their existing config keeps working in the meantime.

Accompanying PR drafted on branch feat/openai-compatible-embeddings-provider (will link the actual PR number once filed).

Related:

#72875 (open). provider: "local" fails with "Unknown memory embedding provider: local". Operators land here after misunderstanding which adapter to use for HTTP-based local servers; the existing local adapter is for in-process node-llama-cpp, not HTTP. The new openai-compatible adapter gives them the right name.
#72937 (open PR). fix for #72875's registration timing. Adjacent.
#66163 (closed). Unknown memory embedding provider: ollama, which led to the bundled ollama adapter. The proposed openai-compatible adapter follows the same pattern, generalized.
#74204 (open). memory.qmd.update.embedTimeoutMs too low for local GGUF. Same operator profile (running local embedding server), different timeout surface.
#74761 (open). Document oMLX (Apple Silicon MLX) as a memorySearch embedding provider. Same family of "add a local-server adapter" requests; oMLX exposes an OpenAI-compatible API and would work through the proposed openai-compatible adapter without further plugin code.
#60994 (closed). Cannot reliably connect to remote Ollama / LM Studio instances via LAN IP. Adjacent operator pain in the same ecosystem.
#42270 (closed). LM Studio backend regression. Related (lmstudio-adapter brittleness).


#api #ssr #dependency error #configuration error #environment variable
