openclaw - ✅(Solved) Fix [Bug]: OpenAI-completions prompt_cache_key regression — caching worked in 2026.3.x, broken in 2026.5.x [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#81281 (fetched 2026-05-14 03:33:44)
Comments: 1
Participants: 2
Timeline events: 5
Reactions: 2
Timeline (top): cross-referenced ×3, commented ×1, labeled ×1

prompt_cache_key is no longer emitted on openai-completions requests in 2026.5.x; caching worked correctly on 2026.3.2 (confirmed on same hardware with same config).

PR fix notes

PR #81342: fix(agents): restore prompt_cache_key emission in buildOpenAICompletionsParams

Description (problem / solution / changelog)

Problem

Closes #81281

The prompt_cache_key field stopped being sent for openai-completions providers in 2026.5.x. Users relying on prefix caching (oMLX, llama.cpp, etc.) saw cached_tokens: 0 on every request and ~20x latency regression. Downgrading to 2026.3.2 immediately restored caching.

Root cause

A refactor between 2026.3.2 and 2026.5.x dropped two things from buildOpenAICompletionsParams in src/agents/openai-transport-stream.ts:

  1. supportsPromptCacheKey was removed from the getCompat() return type and its corresponding assignment in the return object
  2. The cacheRetention resolution and the prompt_cache_key assignment block were omitted from buildOpenAICompletionsParams

Because the field was silently absent, no runtime error was thrown — requests just never carried the key, so the backend could never match a prefix.

Fix

  • Restore supportsPromptCacheKey: boolean to getCompat()'s return type and implementation (compat.supportsPromptCacheKey === true)
  • Restore const cacheRetention = resolveCacheRetention(options?.cacheRetention) at the top of buildOpenAICompletionsParams
  • Restore the conditional: if (compat.supportsPromptCacheKey && cacheRetention !== "none" && options?.sessionId) { params.prompt_cache_key = options.sessionId; }
  • Add three regression tests covering: emit when flag set + cacheRetention=long, omit when cacheRetention=none, omit when flag not set
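
The restored logic and the three regression cases can be sketched as follows. This is a simplified illustration, not the actual contents of src/agents/openai-transport-stream.ts: the types, the function signatures, the "short" retention value, and the default chosen in resolveCacheRetention are all assumptions made so the sketch is self-contained. Only the conditional itself is taken from the PR notes.

```typescript
// Hypothetical, simplified types; the real ones are not shown in the PR notes.
type CacheRetention = "none" | "short" | "long"; // "short" is an assumed value

interface ModelCompat {
  supportsPromptCacheKey?: boolean;
}

interface BuildOptions {
  cacheRetention?: CacheRetention;
  sessionId?: string;
}

interface CompletionsParams {
  model: string;
  stream: boolean;
  prompt_cache_key?: string;
}

// Stand-in for getCompat(): only an explicit `true` enables the flag.
function getCompat(compat?: ModelCompat): { supportsPromptCacheKey: boolean } {
  return { supportsPromptCacheKey: compat?.supportsPromptCacheKey === true };
}

// Stand-in for resolveCacheRetention(); the default here is an assumption.
function resolveCacheRetention(value?: CacheRetention): CacheRetention {
  return value ?? "short";
}

function buildOpenAICompletionsParams(
  model: string,
  modelCompat?: ModelCompat,
  options?: BuildOptions,
): CompletionsParams {
  const compat = getCompat(modelCompat);
  const cacheRetention = resolveCacheRetention(options?.cacheRetention);
  const params: CompletionsParams = { model, stream: true };

  // The conditional restored by the PR: all three conditions must hold.
  if (compat.supportsPromptCacheKey && cacheRetention !== "none" && options?.sessionId) {
    params.prompt_cache_key = options.sessionId;
  }
  return params;
}

// The three regression cases described in the PR.
const emitted = buildOpenAICompletionsParams(
  "local_model",
  { supportsPromptCacheKey: true },
  { cacheRetention: "long", sessionId: "session-1" },
);
const retentionNone = buildOpenAICompletionsParams(
  "local_model",
  { supportsPromptCacheKey: true },
  { cacheRetention: "none", sessionId: "session-1" },
);
const flagUnset = buildOpenAICompletionsParams(
  "local_model",
  {},
  { cacheRetention: "long", sessionId: "session-1" },
);

console.log(emitted.prompt_cache_key);           // "session-1"
console.log("prompt_cache_key" in retentionNone); // false
console.log("prompt_cache_key" in flagUnset);     // false
```

Because the key is simply left off the params object when any condition fails, a dropped flag or a dropped assignment produces no error, which matches the silent failure described above.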

Testing

tsc --noEmit passes clean. Three new unit tests directly cover the regression path.

Notes

The Responses transport (buildOpenAIResponsesParams) was unaffected — it retained its prompt_cache_key logic throughout.

Changed files

  • src/agents/openai-transport-stream.test.ts (modified, +62/-0)
  • src/agents/openai-transport-stream.ts (modified, +6/-0)

Original issue report

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Steps to reproduce

  1. Deploy OpenClaw with an openai-completions provider configured with compat.supportsPromptCacheKey: true and cacheRetention: "long".
  2. Send repeated identical prompts through OpenClaw → oMLX (or any completions backend with prefix caching).
  3. Observe cached_tokens in backend responses — always 0 on 2026.5.x.
  4. Downgrade to 2026.3.2 with identical config and hardware.
  5. Repeat same prompts — cached_tokens correctly populate from request 2 onward.
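
Steps 3 and 5 can be checked programmatically by reading the cached-token count from each response. A minimal sketch, assuming the backend reports it as usage.prompt_tokens_details.cached_tokens (the OpenAI Chat Completions shape; oMLX or other backends may expose it elsewhere). The helper names cachedTokens, cacheIsWorking, and withCached are hypothetical.

```typescript
// Assumed response shape (OpenAI-style usage block); adjust for your backend.
interface CompletionResponse {
  usage?: {
    prompt_tokens?: number;
    prompt_tokens_details?: { cached_tokens?: number };
  };
}

// Returns the reported cached-token count, or 0 when the field is absent.
function cachedTokens(response: CompletionResponse): number {
  return response.usage?.prompt_tokens_details?.cached_tokens ?? 0;
}

// For a run of identical prompts, caching is treated as working when every
// response after the first reports at least one cached token.
function cacheIsWorking(responses: CompletionResponse[]): boolean {
  if (responses.length < 2) return false;
  return responses.slice(1).every((r) => cachedTokens(r) > 0);
}

// Builds a stub response carrying only the cached-token count.
function withCached(n: number): CompletionResponse {
  return { usage: { prompt_tokens_details: { cached_tokens: n } } };
}

// Counts from the issue: direct requests to the backend vs. via OpenClaw 2026.5.x.
const direct = [0, 71680, 71680].map(withCached);
const viaOpenClaw = [0, 0, 0].map(withCached);

console.log(cacheIsWorking(direct));      // true
console.log(cacheIsWorking(viaOpenClaw)); // false
```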

Expected behavior

As observed in 2026.3.2: outgoing openai-completions requests include prompt_cache_key, and the backend reports cached_tokens > 0 on repeated requests with identical prefixes.

Actual behavior

On 2026.5.x: prompt_cache_key is absent from outgoing requests. Backend reports cached_tokens: 0 on every request. Downgrading to 2026.3.2 restores caching immediately with no config changes.

OpenClaw version

2026.5.7

Operating system

Linux (Docker container/portainer on UGREEN NAS)

Install method

docker

Model

omlx/local_model (Qwen3.6-35B-A3B-RotorQuant-MLX-4bit via oMLX)

Provider / routing chain

openclaw → oMLX (openai-completions, http://cerebro-mac:8080/v1)

Additional provider/model setup details

Provider api: openai-completions
compat.supportsPromptCacheKey: true
cacheRetention: "long" (set at defaults, model, and per-model override levels)
contextInjection: "continuation-skip"
API keys and URLs redacted.

Same config file used on both 2026.3.2 (working) and 2026.5.7 (broken). oMLX backend confirmed working: direct repeated requests to the same endpoint produce cached_tokens > 0 from request 2 onward, bypassing OpenClaw entirely.

Logs, screenshots, and evidence

oMLX dashboard on 2026.5.7:
  Total Prefill Tokens: 748,811
  Cached Tokens: 0
  Cache Efficiency: 0.0%

Direct requests to same backend (bypassing OpenClaw):
  Request 1: cached_tokens = 0
  Request 2: cached_tokens = 71680
  Request 3: cached_tokens = 71680

Outgoing request keys observed from OpenClaw on 2026.5.7:
  model, messages, stream, max_completion_tokens, tools, reasoning_effort, metadata
  (prompt_cache_key absent)

2026.3.2 tested on same UGREEN NAS hardware: caching works correctly.
No config changes between versions.
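
The key listing above can be reproduced by inspecting a captured request body. A minimal sketch, assuming the outgoing body is available as a JSON string (for example from a logging proxy in front of the backend); carriesPromptCacheKey is a hypothetical helper and the field values below are placeholders, only the key names come from the issue.

```typescript
// Returns true when a captured JSON request body carries prompt_cache_key.
function carriesPromptCacheKey(rawBody: string): boolean {
  const body: unknown = JSON.parse(rawBody);
  return typeof body === "object" && body !== null && "prompt_cache_key" in body;
}

// Body with the keys observed on 2026.5.7 (placeholder values).
const observed = JSON.stringify({
  model: "local_model",
  messages: [],
  stream: true,
  max_completion_tokens: 1024,
  tools: [],
  reasoning_effort: "low",
  metadata: {},
});

// Same body with the key that a working build would add (placeholder value).
const expected = JSON.stringify({
  ...JSON.parse(observed),
  prompt_cache_key: "session-1",
});

console.log(carriesPromptCacheKey(observed)); // false
console.log(carriesPromptCacheKey(expected)); // true
```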

Impact and severity

Affected: any user of openai-completions with a prefix-caching-capable backend (oMLX, llama.cpp, etc.)
Severity: high; defeats prefix caching entirely, causing full prefill on every request
Frequency: 100% reproducible on 2026.5.7, never occurs on 2026.3.2
Consequence: significantly increased latency and compute cost per request; on local hardware this is the difference between ~3s and ~60s TTFT for long contexts.

Additional information

Last known good version: 2026.3.2
First known bad version: 2026.5.x (exact first-bad version not tested between 2026.3.2 and 2026.5.7)
No workaround found on 2026.5.x short of downgrading.
Related: #69272, PR #69411. Those addressed the transport condition; this regression suggests something in that chain changed between 2026.3.2 and 2026.5.x.
