openclaw - ✅(Solved) Fix feat: allow prompt cache key forwarding for custom/local OpenAI-compatible providers [1 pull requests, 1 participants]

eloe · 2026-04-06T04:03:19Z

[openclaw] We've built an OpenAI Responses API-compliant local server mlx-vlm fork that supports: - prompt cache key for per-session KV cache routing - input t… We've built an OpenAI Responses API-compliant local server (`mlx-vlm` fork) that supports: - `prompt_cache_key` for per-session KV cache routing - `input_tokens_details.cached_tokens` in usage response - TurboQuant-compatible prefix reuse - Probe request filtering (short requests don't evict cached system prompt) The server-side caching works — the only missing piece is OC forwarding the cache key. # PR #63543: feat: allow prompt cache key forwarding for custom/local providers - Repository: openclaw/openclaw - Author: eloe - State: closed | merged: False - Link: https://github.com/openclaw/openclaw/pull/63543 ## Description (problem / solution / changelog) ## Summary Adds a `promptCache` option to `ModelProviderConfig` that, when `true`, forwards `prompt_cache_key` and `prompt_cache_retention` to local/custom OpenAI-compatible servers instead of stripping them. Fixes #61671 ### Problem The `shouldStripResponsesPromptCache` logic strips prompt cache fields from all non-native-OpenAI endpoints, including local servers that implement their own prefix caching. This prevents cross-request KV cache reuse on servers like mlx-vlm, vLLM, and others that support it. ### Impact With prompt cache forwarding enabled on a local mlx-vlm server (Apple Silicon): - **Cold turn**: ~12s (14K token system prompt prefill) - **Warm turn**: ~3s (KV cache reused — **3-4x faster**) ### Changes | File | Change | |---|---| | `src/config/types.models.ts` | Add `promptCache?: boolean` to `ModelProviderConfig` | | `src/agents/provider-attribution.ts` | Add `promptCacheSupported?: boolean` to capabilities input; skip stripping when `true` | | `src/agents/provider-request-config.ts` | Thread `promptCacheSupported` through to capabilities resolution | | `src/agents/provider-attribution.test.ts` | 3 test cases: default strips, `true` forwards, `false` strips | ### Usage ```json { "models": { "providers": { "local-mlx": { "api": "openai-responses", "baseUrl": "http://localhost:8080/v1", "promptCache": true, "models": [...] } } } } ``` ### Tests ``` npx vitest run src/agents/provider-attribution.test.ts # 26 passed ``` Backward compatible — defaults to current stripping behavior when `promptCache` is not set. ## Changed files - `src/agents/openai-responses-payload-policy.ts` (modified, +2/-0) - `src/agents/openai-transport-stream.ts` (modified, +5/-3) - `src/agents/pi-embedded-runner/model.ts` (modified, +1/-0) - `src/agents/pi-embedded-runner/openai-stream-wrappers.ts` (modified, +10/-6) - `src/agents/provider-attribution.test.ts` (modified, +46/-0) - `src/agents/provider-attribution.ts` (modified, +6/-1) - `src/agents/provider-request-config.ts` (modified, +2/-0) - `src/config/types.models.ts` (modified, +6/-0) ## Workaround ```bash sed -i '' 's/shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint/shouldStripResponsesPromptCache: false/' \ /opt/homebrew/lib/node_modules/openclaw/dist/provider-attribution-BrdhzyJa.js ``` ## Problem When using a local OpenAI-compatible server (e.g., `mlx-vlm` on Apple Silicon via `baseUrl: "http://localhost:8080/v1"`), OpenClaw strips `prompt_cache_key` and `prompt_cache_retention` from the Responses API request because the endpoint is classified as `"local"` rather than `"openai-public"`. The stripping logic in `provider-attribution` is: ```javascript shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint ``` This means **any self-hosted server that implements OpenAI-compatible prefix caching cannot benefit from OpenClaw's cache key routing**, even when the server fully supports it. ## Impact With `prompt_cache_key` forwarded to a local mlx-vlm server: - **Cold turn**: 12.2s (14K token system prompt prefill) - **Warm turn**: 3.1s (**3.9x faster** — KV cache reused) Without it (current behavior): every turn is 12-16s because the cache key is stripped and the server can't route to the correct cached KV state. ## Proposed Solution Add a per-provider config option to opt-in to prompt cache forwarding: ```json { "models": { "providers": { "my-local-server": { "api": "openai-responses", "baseUrl": "http://localhost:8080/v1", "promptCache": true, "models": [...] } } } } ``` One-line change in `resolveProviderRequestCapabilities()`: ```javascript // Before: shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint // After: shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint && input.promptCache !== true ``` ## Context We've built an OpenAI Responses API-compliant local server (`mlx-vlm` fork) that supports: - `prompt_cache_key` for per-session KV cache routing - `input_tokens_details.cached_to

openclaw2026-04-06 04:03:19

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#61671•Fetched 2026-04-08 02:56:08

View on GitHub

Comments

Participants

Timeline

Reactions

Author

eloe

Participants

eloe

We've built an OpenAI Responses API-compliant local server (mlx-vlm fork) that supports:

prompt_cache_key for per-session KV cache routing
input_tokens_details.cached_tokens in usage response
TurboQuant-compatible prefix reuse
Probe request filtering (short requests don't evict cached system prompt)

The server-side caching works — the only missing piece is OC forwarding the cache key.

Root Cause

When using a local OpenAI-compatible server (e.g., mlx-vlm on Apple Silicon via baseUrl: "http://localhost:8080/v1"), OpenClaw strips prompt_cache_key and prompt_cache_retention from the Responses API request because the endpoint is classified as "local" rather than "openai-public".

Fix Action

Workaround

sed -i '' 's/shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint/shouldStripResponsesPromptCache: false/' \
  /opt/homebrew/lib/node_modules/openclaw/dist/provider-attribution-BrdhzyJa.js

PR fix notes

PR #63543: feat: allow prompt cache key forwarding for custom/local providers

Repository: openclaw/openclaw
Author: eloe
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/63543

Description (problem / solution / changelog)

Summary

Adds a promptCache option to ModelProviderConfig that, when true, forwards prompt_cache_key and prompt_cache_retention to local/custom OpenAI-compatible servers instead of stripping them.

Fixes #61671

Problem

The shouldStripResponsesPromptCache logic strips prompt cache fields from all non-native-OpenAI endpoints, including local servers that implement their own prefix caching. This prevents cross-request KV cache reuse on servers like mlx-vlm, vLLM, and others that support it.

Impact

With prompt cache forwarding enabled on a local mlx-vlm server (Apple Silicon):

Cold turn: ~12s (14K token system prompt prefill)
Warm turn: ~3s (KV cache reused — 3-4x faster)

Changes

File	Change
`src/config/types.models.ts`	Add `promptCache?: boolean` to `ModelProviderConfig`
`src/agents/provider-attribution.ts`	Add `promptCacheSupported?: boolean` to capabilities input; skip stripping when `true`
`src/agents/provider-request-config.ts`	Thread `promptCacheSupported` through to capabilities resolution
`src/agents/provider-attribution.test.ts`	3 test cases: default strips, `true` forwards, `false` strips

Usage

{
  "models": {
    "providers": {
      "local-mlx": {
        "api": "openai-responses",
        "baseUrl": "http://localhost:8080/v1",
        "promptCache": true,
        "models": [...]
      }
    }
  }
}

Tests

npx vitest run src/agents/provider-attribution.test.ts
# 26 passed

Backward compatible — defaults to current stripping behavior when promptCache is not set.

Changed files

src/agents/openai-responses-payload-policy.ts (modified, +2/-0)
src/agents/openai-transport-stream.ts (modified, +5/-3)
src/agents/pi-embedded-runner/model.ts (modified, +1/-0)
src/agents/pi-embedded-runner/openai-stream-wrappers.ts (modified, +10/-6)
src/agents/provider-attribution.test.ts (modified, +46/-0)
src/agents/provider-attribution.ts (modified, +6/-1)
src/agents/provider-request-config.ts (modified, +2/-0)
src/config/types.models.ts (modified, +6/-0)

Code Example

shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint

---

{
  "models": {
    "providers": {
      "my-local-server": {
        "api": "openai-responses",
        "baseUrl": "http://localhost:8080/v1",
        "promptCache": true,
        "models": [...]
      }
    }
  }
}

---

// Before:
shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint

// After:
shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint && input.promptCache !== true

---

sed -i '' 's/shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint/shouldStripResponsesPromptCache: false/' \
  /opt/homebrew/lib/node_modules/openclaw/dist/provider-attribution-BrdhzyJa.js

RAW_BUFFERClick to expand / collapse

Problem

The stripping logic in provider-attribution is:

shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint

This means any self-hosted server that implements OpenAI-compatible prefix caching cannot benefit from OpenClaw's cache key routing, even when the server fully supports it.

Impact

With prompt_cache_key forwarded to a local mlx-vlm server:

Cold turn: 12.2s (14K token system prompt prefill)
Warm turn: 3.1s (3.9x faster — KV cache reused)

Without it (current behavior): every turn is 12-16s because the cache key is stripped and the server can't route to the correct cached KV state.

Proposed Solution

Add a per-provider config option to opt-in to prompt cache forwarding:

{
  "models": {
    "providers": {
      "my-local-server": {
        "api": "openai-responses",
        "baseUrl": "http://localhost:8080/v1",
        "promptCache": true,
        "models": [...]
      }
    }
  }
}

One-line change in resolveProviderRequestCapabilities():

// Before:
shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint

// After:
shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint && input.promptCache !== true

Context

We've built an OpenAI Responses API-compliant local server (mlx-vlm fork) that supports:

prompt_cache_key for per-session KV cache routing
input_tokens_details.cached_tokens in usage response
TurboQuant-compatible prefix reuse
Probe request filtering (short requests don't evict cached system prompt)

The server-side caching works — the only missing piece is OC forwarding the cache key.

Workaround

sed -i '' 's/shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint/shouldStripResponsesPromptCache: false/' \
  /opt/homebrew/lib/node_modules/openclaw/dist/provider-attribution-BrdhzyJa.js

Environment

OpenClaw 2026.4.2
mlx-vlm 0.4.4 (fork: github.com/eloe/mlx-vlm, branch: feature/combined-bastion)
macOS, Apple M4 Max, Qwen3.5-35B-A3B-4bit

extent analysis

TL;DR

To fix the issue, add a per-provider config option to opt-in to prompt cache forwarding by setting "promptCache": true in the provider configuration.

Guidance

Update the provider configuration to include the "promptCache": true option, allowing OpenClaw to forward the prompt_cache_key to the local server.
Modify the shouldStripResponsesPromptCache logic to check for the input.promptCache flag, as shown in the proposed solution.
Verify that the cache key is being forwarded correctly by checking the request payload sent to the local server.
Consider applying the provided workaround as a temporary fix, but be aware that it involves modifying the OpenClaw library code directly.

Example

// Example provider configuration with prompt cache enabled
{
  "models": {
    "providers": {
      "my-local-server": {
        "api": "openai-responses",
        "baseUrl": "http://localhost:8080/v1",
        "promptCache": true,
        "models": [...]
      }
    }
  }
}

Notes

The proposed solution requires modifying the OpenClaw library code, which may have implications for future updates or compatibility. The workaround provided is a temporary fix and should be used with caution.

Recommendation

Apply the workaround by setting "promptCache": true in the provider configuration, as this allows for opt-in prompt cache forwarding without modifying the OpenClaw library code directly.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #prompt issue #agent setup #task chaining #parallel task

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - ✅(Solved) Fix feat: allow prompt cache key forwarding for custom/local OpenAI-compatible providers [1 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Workaround

PR fix notes

PR #63543: feat: allow prompt cache key forwarding for custom/local providers

Description (problem / solution / changelog)

Summary

Problem

Impact

Changes

Usage

Tests

Changed files

Code Example

Problem

Impact

Proposed Solution

Context

Workaround

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING