openclaw - ✅(Solved) Fix [Bug]: OpenAI-completions prompt_cache_key regression — caching worked in 2026.3.x, broken in 2026.5.x [1 pull requests, 1 comments, 2 participants]

openclaw · 2026-05-13 03:57:42


GitHub stats: openclaw/openclaw#81281 (fetched 2026-05-14 03:33:44)

Author: juaps

Participants: clawsweeper[bot], juaps

Timeline: cross-referenced ×3, commented ×1, labeled ×1

prompt_cache_key is no longer emitted on openai-completions requests in 2026.5.x; caching worked correctly on 2026.3.2 (confirmed on same hardware with same config).

Root Cause

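A refactor between 2026.3.2 and 2026.5.x removed the prompt_cache_key logic from the openai-completions path in src/agents/openai-transport-stream.ts. Both the supportsPromptCacheKey compat flag and the cacheRetention check that gated the key in buildOpenAICompletionsParams were dropped. As a result, requests stopped carrying prompt_cache_key and the backend could never match a cached prefix.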

Fix Action

Fix / Workaround

The proposed fix is PR #81342, described below, which restores prompt_cache_key emission in buildOpenAICompletionsParams.

Last known good version: 2026.3.2

First known bad version: 2026.5.x (releases between 2026.3.2 and 2026.5.7 were not tested, so the exact first bad version is unknown)

No workaround found on 2026.5.x short of downgrading.

Related: #69272 and PR #69411 addressed the transport condition. This regression suggests something in that chain changed between 2026.3.2 and 2026.5.x.

PR fix notes

PR #81342: fix(agents): restore prompt_cache_key emission in buildOpenAICompletionsParams

Repository: openclaw/openclaw
Author: Bartok9
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/81342

Description (problem / solution / changelog)

Problem

Closes #81281

The prompt_cache_key field stopped being sent for openai-completions providers in 2026.5.x. Users relying on prefix caching (oMLX, llama.cpp, etc.) saw cached_tokens: 0 on every request and a roughly 20x latency regression. Downgrading to 2026.3.2 immediately restored caching.

Root cause

A refactor between 2026.3.2 and 2026.5.x dropped two things from buildOpenAICompletionsParams in src/agents/openai-transport-stream.ts:

supportsPromptCacheKey was removed from the getCompat() return type and its corresponding assignment in the return object
The cacheRetention resolution and the prompt_cache_key assignment block were omitted from buildOpenAICompletionsParams

Because the field was silently absent, no runtime error was thrown — requests just never carried the key, so the backend could never match a prefix.

Fix

Restore supportsPromptCacheKey: boolean to getCompat()'s return type and implementation (compat.supportsPromptCacheKey === true)
Restore const cacheRetention = resolveCacheRetention(options?.cacheRetention) at the top of buildOpenAICompletionsParams
Restore the conditional: if (compat.supportsPromptCacheKey && cacheRetention !== "none" && options?.sessionId) { params.prompt_cache_key = options.sessionId; }
Add three regression tests covering: emit when flag set + cacheRetention=long, omit when cacheRetention=none, omit when flag not set
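The restored logic can be sketched as follows. This is a minimal, self-contained TypeScript approximation, not the code from PR #81342. The names getCompat, resolveCacheRetention, and buildOpenAICompletionsParams and the emit condition come from the PR description. The type shapes, the extra "short" retention value, and the "short" fallback for an unset cacheRetention are assumptions made for illustration. The three console.log calls mirror the three regression cases listed above.

```typescript
// Hypothetical, simplified types; the real ones in
// src/agents/openai-transport-stream.ts are not shown in the PR notes.
// "short" is an assumed third retention value.
type CacheRetention = "none" | "short" | "long";

interface ModelCompat {
  supportsPromptCacheKey?: boolean;
}

interface BuildOptions {
  cacheRetention?: CacheRetention;
  sessionId?: string;
}

interface ChatMessage {
  role: string;
  content: string;
}

interface CompletionsParams {
  model: string;
  messages: ChatMessage[];
  stream: boolean;
  prompt_cache_key?: string;
}

// Normalized compat flags; the fix puts supportsPromptCacheKey back here.
function getCompat(compat?: ModelCompat): { supportsPromptCacheKey: boolean } {
  return { supportsPromptCacheKey: compat?.supportsPromptCacheKey === true };
}

// Assumption: an unset cacheRetention falls back to "short".
function resolveCacheRetention(value?: CacheRetention): CacheRetention {
  return value ?? "short";
}

function buildOpenAICompletionsParams(
  model: { id: string; compat?: ModelCompat },
  messages: ChatMessage[],
  options?: BuildOptions,
): CompletionsParams {
  const compat = getCompat(model.compat);
  // Restored: resolve retention once, at the top of the builder.
  const cacheRetention = resolveCacheRetention(options?.cacheRetention);
  const params: CompletionsParams = { model: model.id, messages, stream: true };

  // Restored conditional: emit the key only if the backend opts in, caching is
  // not disabled, and there is a session id to use as the key.
  const sessionId = options?.sessionId;
  if (compat.supportsPromptCacheKey && cacheRetention !== "none" && sessionId) {
    params.prompt_cache_key = sessionId;
  }
  return params;
}

const optedIn = { id: "local_model", compat: { supportsPromptCacheKey: true } };
const msgs: ChatMessage[] = [{ role: "user", content: "hello" }];
console.log(buildOpenAICompletionsParams(optedIn, msgs, { cacheRetention: "long", sessionId: "s1" }).prompt_cache_key); // s1
console.log(buildOpenAICompletionsParams(optedIn, msgs, { cacheRetention: "none", sessionId: "s1" }).prompt_cache_key); // undefined
console.log(buildOpenAICompletionsParams({ id: "local_model" }, msgs, { sessionId: "s1" }).prompt_cache_key); // undefined
```

Normalizing the flag inside getCompat means a provider that never sets supportsPromptCacheKey gets false rather than undefined, so the emit condition stays a plain boolean test.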

Testing

tsc --noEmit passes cleanly. The three new unit tests cover the regression path directly.

Notes

The Responses transport (buildOpenAIResponsesParams) was unaffected — it retained its prompt_cache_key logic throughout.

Changed files

src/agents/openai-transport-stream.test.ts (modified, +62/-0)
src/agents/openai-transport-stream.ts (modified, +6/-0)


Bug type

Regression (worked before, now fails)

Beta release blocker


Steps to reproduce

Deploy OpenClaw with an openai-completions provider configured with compat.supportsPromptCacheKey: true and cacheRetention: "long".
Send repeated identical prompts through OpenClaw → oMLX (or any completions backend with prefix caching).
Observe cached_tokens in backend responses — always 0 on 2026.5.x.
Downgrade to 2026.3.2 with identical config and hardware.
Repeat same prompts — cached_tokens correctly populate from request 2 onward.

Expected behavior

As observed in 2026.3.2: outgoing openai-completions requests include prompt_cache_key, and the backend reports cached_tokens > 0 on repeated requests with identical prefixes.

Actual behavior

On 2026.5.x: prompt_cache_key is absent from outgoing requests. Backend reports cached_tokens: 0 on every request. Downgrading to 2026.3.2 restores caching immediately with no config changes.

OpenClaw version

2026.5.7

Operating system

Linux (Docker container managed with Portainer on a UGREEN NAS)

Install method

docker

Model

omlx/local_model (Qwen3.6-35B-A3B-RotorQuant-MLX-4bit via oMLX)

Provider / routing chain

openclaw → oMLX (openai-completions, http://cerebro-mac:8080/v1)

Additional provider/model setup details

Provider api: openai-completions
compat.supportsPromptCacheKey: true
cacheRetention: "long" (set at the defaults, model, and per-model override levels)
contextInjection: "continuation-skip"
API keys and URLs redacted.

Same config file used on both 2026.3.2 (working) and 2026.5.7 (broken). oMLX backend confirmed working: direct repeated requests to the same endpoint produce cached_tokens > 0 from request 2 onward, bypassing OpenClaw entirely.
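The direct check described above (repeated requests to the backend, bypassing OpenClaw) can be scripted roughly as below. This TypeScript sketch makes two assumptions. First, the backend exposes an OpenAI-compatible /chat/completions route under the configured base URL. Second, it reports cached_tokens under usage.prompt_tokens_details, which is where the OpenAI Chat Completions API puts it; oMLX may report it differently. The fetch function is injected so the probe can be exercised without a live server. probePrefixCache, cachedTokens, and FetchLike are hypothetical names.

```typescript
// Minimal fetch-like signature so a real fetch or a test double can be passed in.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ChatCompletionUsage {
  prompt_tokens?: number;
  // Assumed location of the cache counter (OpenAI Chat Completions layout).
  prompt_tokens_details?: { cached_tokens?: number };
}

// Read cached_tokens from a parsed response body, treating a missing field as 0.
function cachedTokens(body: unknown): number {
  const usage = (body as { usage?: ChatCompletionUsage } | null)?.usage;
  return usage?.prompt_tokens_details?.cached_tokens ?? 0;
}

// Send the identical prompt `runs` times and collect cached_tokens per request.
async function probePrefixCache(
  fetchImpl: FetchLike,
  baseUrl: string,
  model: string,
  prompt: string,
  runs = 3,
): Promise<number[]> {
  const counts: number[] = [];
  for (let i = 0; i < runs; i++) {
    const res = await fetchImpl(`${baseUrl}/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: prompt }],
        max_completion_tokens: 1, // only the prefill matters for this check
        stream: false,
      }),
    });
    if (!res.ok) throw new Error(`request ${i + 1} failed with HTTP ${res.status}`);
    counts.push(cachedTokens(await res.json()));
  }
  return counts;
}
```

Against a live backend on Node 18 or later, one might call probePrefixCache(fetch, "http://cerebro-mac:8080/v1", "<model-id>", longPrompt). A working prefix cache should report 0 for the first request and a positive count afterwards, matching the direct-request numbers in the logs below.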

Logs, screenshots, and evidence

oMLX dashboard on 2026.5.7:
  Total Prefill Tokens: 748,811
  Cached Tokens: 0
  Cache Efficiency: 0.0%

Direct requests to same backend (bypassing OpenClaw):
  Request 1: cached_tokens = 0
  Request 2: cached_tokens = 71680
  Request 3: cached_tokens = 71680

Outgoing request keys observed from OpenClaw on 2026.5.7:
  model, messages, stream, max_completion_tokens, tools, reasoning_effort, metadata
  (prompt_cache_key absent)

2026.3.2 tested on same UGREEN NAS hardware: caching works correctly.
No config changes between versions.
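The request-key observation above can be turned into a quick check. This is a hypothetical TypeScript helper that assumes the outgoing request body has already been captured as a JSON string, for example by a logging proxy placed between OpenClaw and the backend. How the body is captured is left open, and the field values in the example are placeholders; only the key names come from the log.

```typescript
interface RequestKeyReport {
  keys: string[];
  hasPromptCacheKey: boolean;
}

// List the top-level keys of a captured request body and flag whether
// a non-empty prompt_cache_key string is present.
function inspectRequestBody(rawJson: string): RequestKeyReport {
  const body = JSON.parse(rawJson) as Record<string, unknown>;
  const key = body["prompt_cache_key"];
  return {
    keys: Object.keys(body),
    hasPromptCacheKey: typeof key === "string" && key.length > 0,
  };
}

// Key set of the 2026.5.7 request reported in the issue (values are placeholders).
const observed = JSON.stringify({
  model: "local_model",
  messages: [],
  stream: true,
  max_completion_tokens: 1024,
  tools: [],
  reasoning_effort: "medium",
  metadata: {},
});
console.log(inspectRequestBody(observed).hasPromptCacheKey); // false
```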

Impact and severity

Affected: any user of openai-completions with a prefix-caching-capable backend (oMLX, llama.cpp, etc.)

Severity: high; prefix caching is defeated entirely, forcing a full prefill on every request

Frequency: 100% reproducible on 2026.5.7; never occurs on 2026.3.2

Consequence: significantly higher latency and compute cost per request. On local hardware this is the difference between roughly 3 s and 60 s TTFT for long contexts.

Additional information
