openclaw - ✅(Solved) Fix feat: allow prompt cache key forwarding for custom/local OpenAI-compatible providers [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#61671 (fetched 2026-04-08 02:56:08)
Comments: 0, Participants: 1, Timeline: 0, Reactions: 0

Root Cause

When using a local OpenAI-compatible server (e.g., mlx-vlm on Apple Silicon via baseUrl: "http://localhost:8080/v1"), OpenClaw strips prompt_cache_key and prompt_cache_retention from the Responses API request because the endpoint is classified as "local" rather than "openai-public".
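To make the effect concrete, here is a minimal sketch of what stripping does to a Responses API request body. The `ResponsesRequestBody` type and the `stripPromptCacheFields` helper are hypothetical illustrations, not OpenClaw's code; only the two field names come from the issue, and the retention value is illustrative.

```typescript
// Hypothetical, simplified shape of an OpenAI Responses API request body.
interface ResponsesRequestBody {
  model: string;
  input: unknown;
  prompt_cache_key?: string;
  prompt_cache_retention?: string;
  [extra: string]: unknown;
}

// Hypothetical helper: returns a copy of the body without the two prompt-cache
// fields, which is the effect the issue describes for "local" endpoints.
function stripPromptCacheFields(body: ResponsesRequestBody): ResponsesRequestBody {
  const copy: ResponsesRequestBody = { ...body };
  delete copy.prompt_cache_key;
  delete copy.prompt_cache_retention;
  return copy;
}

const original: ResponsesRequestBody = {
  model: "local-model", // placeholder model name
  input: "Hello",
  prompt_cache_key: "session-123", // placeholder session key
  prompt_cache_retention: "24h", // illustrative value
};

const sent = stripPromptCacheFields(original);
console.log("prompt_cache_key" in sent); // false: the server never sees the key
console.log(sent.model); // local-model: everything else is unchanged
```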

PR fix notes

PR #63543: feat: allow prompt cache key forwarding for custom/local providers

Description (problem / solution / changelog)

Summary

Adds a promptCache option to ModelProviderConfig that, when true, forwards prompt_cache_key and prompt_cache_retention to local/custom OpenAI-compatible servers instead of stripping them.

Fixes #61671

Problem

The shouldStripResponsesPromptCache logic strips prompt cache fields from all non-native-OpenAI endpoints, including local servers that implement their own prefix caching. This prevents cross-request KV cache reuse on servers like mlx-vlm, vLLM, and others that support it.

Impact

With prompt cache forwarding enabled on a local mlx-vlm server (Apple Silicon):

  • Cold turn: ~12s (prefill of a 14K-token system prompt)
  • Warm turn: ~3s (KV cache reused; roughly 3-4x faster)

Changes

  • src/config/types.models.ts: Add promptCache?: boolean to ModelProviderConfig
  • src/agents/provider-attribution.ts: Add promptCacheSupported?: boolean to the capabilities input; skip stripping when true
  • src/agents/provider-request-config.ts: Thread promptCacheSupported through to capabilities resolution
  • src/agents/provider-attribution.test.ts: 3 test cases (default strips, true forwards, false strips)
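The change list describes a two-step flow: the provider-level flag is threaded into the capabilities input, and the stripping predicate checks it. A minimal sketch of that flow, assuming heavily simplified types: `toCapabilitiesInput` and `stripsFor` are hypothetical names, the endpoint classification is passed in as a plain boolean, and it is assumed that `"openai-responses"` is a member of `OPENAI_RESPONSES_APIS`. Only `promptCache`, `promptCacheSupported`, `shouldStripResponsesPromptCache`, and `OPENAI_RESPONSES_APIS` come from the PR and issue.

```typescript
// Hypothetical, pared-down versions of the types named in the PR's change list.
interface ModelProviderConfig {
  api?: string;
  baseUrl?: string;
  promptCache?: boolean; // opt-in flag added by PR #63543
}

interface CapabilitiesInput {
  api?: string;
  // Stands in for policy.usesExplicitProxyLikeEndpoint, which OpenClaw derives itself.
  usesExplicitProxyLikeEndpoint: boolean;
  promptCacheSupported?: boolean;
}

// Assumption: "openai-responses" is a member of OPENAI_RESPONSES_APIS.
const OPENAI_RESPONSES_APIS = new Set<string>(["openai-responses"]);

// Step 1 (cf. provider-request-config.ts): copy the provider flag into the capabilities input.
function toCapabilitiesInput(
  config: ModelProviderConfig,
  usesExplicitProxyLikeEndpoint: boolean,
): CapabilitiesInput {
  return { api: config.api, usesExplicitProxyLikeEndpoint, promptCacheSupported: config.promptCache };
}

// Step 2 (cf. provider-attribution.ts): keep stripping unless the provider opted in.
function shouldStripResponsesPromptCache(input: CapabilitiesInput): boolean {
  return (
    input.api !== undefined &&
    OPENAI_RESPONSES_APIS.has(input.api) &&
    input.usesExplicitProxyLikeEndpoint &&
    input.promptCacheSupported !== true
  );
}

// The three cases the PR's tests cover, for a local endpoint treated as proxy-like.
const stripsFor = (promptCache?: boolean): boolean =>
  shouldStripResponsesPromptCache(
    toCapabilitiesInput({ api: "openai-responses", baseUrl: "http://localhost:8080/v1", promptCache }, true),
  );

console.log(stripsFor(), stripsFor(true), stripsFor(false)); // true false true
```

Leaving the flag unset behaves exactly like setting it to false, which is why existing configurations are unaffected.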

Usage

{
  "models": {
    "providers": {
      "local-mlx": {
        "api": "openai-responses",
        "baseUrl": "http://localhost:8080/v1",
        "promptCache": true,
        "models": [...]
      }
    }
  }
}

Tests

npx vitest run src/agents/provider-attribution.test.ts
# 26 passed

Backward compatible: when promptCache is not set, the current stripping behavior is kept.

Changed files

  • src/agents/openai-responses-payload-policy.ts (modified, +2/-0)
  • src/agents/openai-transport-stream.ts (modified, +5/-3)
  • src/agents/pi-embedded-runner/model.ts (modified, +1/-0)
  • src/agents/pi-embedded-runner/openai-stream-wrappers.ts (modified, +10/-6)
  • src/agents/provider-attribution.test.ts (modified, +46/-0)
  • src/agents/provider-attribution.ts (modified, +6/-1)
  • src/agents/provider-request-config.ts (modified, +2/-0)
  • src/config/types.models.ts (modified, +6/-0)

Original issue #61671

Problem

When using a local OpenAI-compatible server (e.g., mlx-vlm on Apple Silicon via baseUrl: "http://localhost:8080/v1"), OpenClaw strips prompt_cache_key and prompt_cache_retention from the Responses API request because the endpoint is classified as "local" rather than "openai-public".

The stripping logic in provider-attribution is:

shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint

This means any self-hosted server that implements OpenAI-compatible prefix caching cannot benefit from OpenClaw's cache key routing, even when the server fully supports it.

Impact

With prompt_cache_key forwarded to a local mlx-vlm server:

  • Cold turn: 12.2s (prefill of a 14K-token system prompt)
  • Warm turn: 3.1s (KV cache reused; 3.9x faster)

Without it (current behavior): every turn is 12-16s because the cache key is stripped and the server can't route to the correct cached KV state.

Proposed Solution

Add a per-provider config option to opt in to prompt cache forwarding:

{
  "models": {
    "providers": {
      "my-local-server": {
        "api": "openai-responses",
        "baseUrl": "http://localhost:8080/v1",
        "promptCache": true,
        "models": [...]
      }
    }
  }
}

One-line change in resolveProviderRequestCapabilities():

// Before:
shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint

// After:
shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint && input.promptCache !== true
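The one-line change can be checked with a small truth table. This sketch reimplements both versions of the predicate over a hypothetical `Input` shape that mirrors the names in the snippet (`api`, `policy.usesExplicitProxyLikeEndpoint`, `input.promptCache`); `void 0` in the compiled code is simply `undefined`. It assumes `"openai-responses"` is one of the `OPENAI_RESPONSES_APIS`.

```typescript
// Hypothetical input shape mirroring the names used in the snippet.
interface Input {
  api?: string;
  policy: { usesExplicitProxyLikeEndpoint: boolean };
  promptCache?: boolean;
}

// Assumption: "openai-responses" is one of the Responses API identifiers.
const OPENAI_RESPONSES_APIS = new Set<string>(["openai-responses"]);

// Before: every Responses API request to a proxy-like endpoint is stripped.
function stripBefore(input: Input): boolean {
  const { api, policy } = input;
  return api !== undefined && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint;
}

// After: the same condition, unless the provider explicitly opted in.
function stripAfter(input: Input): boolean {
  return stripBefore(input) && input.promptCache !== true;
}

// Truth table for a local endpoint that is classified as proxy-like.
for (const promptCache of [undefined, false, true]) {
  const input: Input = { api: "openai-responses", policy: { usesExplicitProxyLikeEndpoint: true }, promptCache };
  console.log(`promptCache=${String(promptCache)}: before=${stripBefore(input)} after=${stripAfter(input)}`);
}
// promptCache=undefined: before=true after=true
// promptCache=false: before=true after=true
// promptCache=true: before=true after=false
```

Only the explicit opt-in row changes; every other combination keeps today's behavior.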

Context

We've built an OpenAI Responses API-compliant local server (mlx-vlm fork) that supports:

  • prompt_cache_key for per-session KV cache routing
  • input_tokens_details.cached_tokens in usage response
  • TurboQuant-compatible prefix reuse
  • Probe request filtering (short requests don't evict cached system prompt)

The server-side caching works; the only missing piece is OpenClaw forwarding the cache key.
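Why the key matters on the server side can be sketched as follows. This is not code from the mlx-vlm fork (which is Python); it is a TypeScript illustration that assumes a simple in-memory map from `prompt_cache_key` to the last cached token prefix. `PrefixCacheRouter` and its probe threshold are hypothetical. In this sketch a request without a key has nothing to route on, which is the situation the issue describes when OpenClaw strips the field.

```typescript
// Hypothetical per-session prefix cache keyed by prompt_cache_key. A real server
// stores KV tensors; this sketch only keeps token ids to show the routing logic.
class PrefixCacheRouter {
  private readonly entries = new Map<string, number[]>();
  private readonly minCacheableTokens: number;

  // minCacheableTokens is an assumed cutoff below which a request counts as a probe.
  constructor(minCacheableTokens: number) {
    this.minCacheableTokens = minCacheableTokens;
  }

  // Returns the number of leading tokens that match the cached prefix for this key,
  // i.e. what the server could report as input_tokens_details.cached_tokens.
  handle(promptCacheKey: string | undefined, tokens: number[]): number {
    if (promptCacheKey === undefined) return 0; // key stripped: nothing to route on
    const cached = this.entries.get(promptCacheKey) ?? [];
    let shared = 0;
    while (shared < cached.length && shared < tokens.length && cached[shared] === tokens[shared]) {
      shared++;
    }
    // Probe filtering: short requests do not replace the cached prefix.
    if (tokens.length >= this.minCacheableTokens) {
      this.entries.set(promptCacheKey, [...tokens]);
    }
    return shared;
  }
}

const router = new PrefixCacheRouter(4);
const systemPrompt = [1, 2, 3, 4, 5]; // stand-in token ids
console.log(router.handle("session-a", [...systemPrompt, 10])); // 0: cold turn
console.log(router.handle("session-a", [9])); // 0: probe, cache left intact
console.log(router.handle("session-a", [...systemPrompt, 11])); // 5: warm turn reuses the prefix
console.log(router.handle(undefined, [...systemPrompt, 11])); // 0: no key, no routing
```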

Workaround

sed -i '' 's/shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint/shouldStripResponsesPromptCache: false/' \
  /opt/homebrew/lib/node_modules/openclaw/dist/provider-attribution-BrdhzyJa.js
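Because the workaround edits a bundled file whose name carries what looks like a build hash, it is worth checking after each upgrade whether the patch is still in place. A small Node script can report this, assuming the dist layout from the issue; `checkDist` is a hypothetical helper, and "original" only means the expression the sed command targets is still present. Note that `sed -i ''` is the BSD/macOS form; GNU sed on Linux expects `-i` without the separate empty argument.

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// The expression the sed workaround replaces, and its replacement (both from the issue).
const ORIGINAL_EXPR =
  "shouldStripResponsesPromptCache: api !== void 0 && OPENAI_RESPONSES_APIS.has(api) && policy.usesExplicitProxyLikeEndpoint";
const PATCHED_EXPR = "shouldStripResponsesPromptCache: false";

type PatchState = "original" | "patched" | "unknown";

// Report the state of every provider-attribution-*.js bundle in a dist directory.
// Assumption: the suffix (BrdhzyJa in the issue) can change between releases.
function checkDist(distDir: string): Map<string, PatchState> {
  const states = new Map<string, PatchState>();
  for (const name of readdirSync(distDir)) {
    if (!name.startsWith("provider-attribution-") || !name.endsWith(".js")) continue;
    const source = readFileSync(join(distDir, name), "utf8");
    let state: PatchState = "unknown";
    if (source.includes(ORIGINAL_EXPR)) state = "original";
    else if (source.includes(PATCHED_EXPR)) state = "patched";
    states.set(name, state);
  }
  return states;
}

// Default path from the issue (Homebrew-managed global npm modules on macOS).
const distDir = process.argv[2] ?? "/opt/homebrew/lib/node_modules/openclaw/dist";
if (existsSync(distDir)) {
  for (const [file, state] of checkDist(distDir)) console.log(`${file}: ${state}`);
} else {
  console.log(`dist directory not found: ${distDir}`);
}
```

Run it with a TypeScript runner such as tsx, optionally passing a different dist path as the first argument.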

Environment

  • OpenClaw 2026.4.2
  • mlx-vlm 0.4.4 (fork: github.com/eloe/mlx-vlm, branch: feature/combined-bastion)
  • macOS, Apple M4 Max, Qwen3.5-35B-A3B-4bit

Extent analysis

TL;DR

To fix the issue, opt in to prompt cache forwarding per provider by setting "promptCache": true in the provider configuration. This option only exists in OpenClaw builds that include PR #63543.

Guidance

  • Update the provider configuration to include the "promptCache": true option, allowing OpenClaw to forward the prompt_cache_key to the local server.
  • Modify the shouldStripResponsesPromptCache logic to check for the input.promptCache flag, as shown in the proposed solution.
  • Verify that the cache key is being forwarded correctly by checking the request payload sent to the local server.
  • Consider applying the provided workaround as a temporary fix, but be aware that it involves modifying the OpenClaw library code directly.
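For the verification step, one option is to point a provider's baseUrl temporarily at a stub that logs whether the cache fields arrive. The sketch below uses Node's built-in `http` module; `inspectBody` and `startInspector` are hypothetical helpers, and the stub answers every request with 501 instead of implementing the Responses API, so the OpenClaw turn itself will fail while you read the log.

```typescript
import { createServer, type Server } from "node:http";

interface CacheFieldReport {
  hasPromptCacheKey: boolean;
  hasPromptCacheRetention: boolean;
}

// Inspect a raw JSON request body for the two prompt-cache fields.
function inspectBody(raw: string): CacheFieldReport {
  const body = JSON.parse(raw) as Record<string, unknown>;
  return {
    hasPromptCacheKey: typeof body.prompt_cache_key === "string",
    hasPromptCacheRetention: body.prompt_cache_retention !== undefined,
  };
}

// Hypothetical debugging stub: logs what arrives, then replies 501.
// It does not implement the Responses API.
function startInspector(port: number): Server {
  const server = createServer((req, res) => {
    let raw = "";
    req.setEncoding("utf8");
    req.on("data", (chunk: string) => {
      raw += chunk;
    });
    req.on("end", () => {
      try {
        console.log(req.method, req.url, inspectBody(raw));
      } catch {
        console.log(req.method, req.url, "(non-JSON body)");
      }
      res.writeHead(501, { "content-type": "application/json" });
      res.end(JSON.stringify({ error: "inspector only" }));
    });
  });
  server.listen(port);
  return server;
}
```

For example, call `startInspector(8081)` and set `"baseUrl": "http://localhost:8081/v1"` on the provider; when forwarding works, the logged report shows `hasPromptCacheKey: true`.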

Example

Provider configuration with prompt cache enabled:
{
  "models": {
    "providers": {
      "my-local-server": {
        "api": "openai-responses",
        "baseUrl": "http://localhost:8080/v1",
        "promptCache": true,
        "models": [...]
      }
    }
  }
}

Notes

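The proposed solution is a change to OpenClaw's source (PR #63543), so the "promptCache" option is only available in builds that include it. The sed workaround instead edits the installed build output: it is a temporary fix, it is likely to be lost when OpenClaw is upgraded or reinstalled, and the hashed file name (provider-attribution-BrdhzyJa.js) may differ between versions. Use it with caution.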

Recommendation

If your OpenClaw version includes PR #63543, set "promptCache": true in the provider configuration; this enables opt-in prompt cache forwarding without modifying OpenClaw's code. On older versions, the sed workaround is the only option described here.
