openclaw: Runtime sites bypass `prependSystemPromptAdditionAfterCacheBoundary`, destabilising Anthropic + OpenAI prompt caching



Symptom

In a multi-agent A-Chat deployment running 9 agents on openai-codex/gpt-5.5 and 1 on anthropic/claude-opus-4-6, OpenAI Codex automatic prefix caching engages partially: per-turn cached_tokens / input_tokens ratio alternates between ~0% (cold) and 88-94% (peak). Within a "bucket" of identical-byte system prompts, hit rate stabilises at 0.83-0.92. Across a 6-turn smoke test, multiple bucket-equivalent system-prompt hashes appear because runtime sites append dynamic content past the byte-stable bulk.
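The per-turn ratio quoted above can be computed directly from a usage block. The following is a minimal sketch, assuming a usage object shaped like the Responses API one (`input_tokens` plus `input_tokens_details.cached_tokens`); `cacheHitRatio` is a hypothetical name and the sample numbers are illustrative, not taken from the deployment logs.

```javascript
// Assumed usage shape, modelled on a Responses-style usage block:
// { input_tokens, input_tokens_details: { cached_tokens } }.
// cacheHitRatio is a hypothetical helper, not part of openclaw.
function cacheHitRatio(usage) {
  const input = usage?.input_tokens ?? 0;
  const cached = usage?.input_tokens_details?.cached_tokens ?? 0;
  // Guard against empty or missing usage blocks.
  return input > 0 ? cached / input : 0;
}

// Illustrative numbers only.
const coldTurn = { input_tokens: 34000, input_tokens_details: { cached_tokens: 0 } };
const warmTurn = { input_tokens: 34000, input_tokens_details: { cached_tokens: 30600 } };

console.log("cold:", cacheHitRatio(coldTurn));
console.log("warm:", cacheHitRatio(warmTurn));
```

Logging this ratio next to a hash of the system prompt makes the cold/warm alternation visible turn by turn.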

Root cause map (file:line, behaviour)

Re-verified against openclaw 2026.5.20 on 2026-05-22. Chunk filenames below are from the 2026.5.20 published bundle (/usr/lib/node_modules/openclaw/dist/). 6 of the 7 originally-found sites still do naive concat; 1 (userPromptPrefixText) was restructured — see note. The helper and marker still exist and are still unused by these sites.

The runtime already exports SYSTEM_PROMPT_CACHE_BOUNDARY = "\n<!-- OPENCLAW_CACHE_BOUNDARY -->\n" and the helper prependSystemPromptAdditionAfterCacheBoundary(params) from system-prompt-cache-boundary-T51pGsv9.js. These keep dynamic suffixes outside the cacheable region for Anthropic-style requests and (when the suffix is at the very end) preserve byte-identical prefix for OpenAI automatic caching.
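To illustrate why the marker matters, here is a small sketch that splits a system prompt at the boundary and compares the cacheable region of a naive concat with that of a boundary-aware one. The marker string is the one quoted above; `splitAtCacheBoundary` and the sample prompts are hypothetical and do not reproduce the runtime's own implementation.

```javascript
// Marker value as quoted in this issue.
const SYSTEM_PROMPT_CACHE_BOUNDARY = "\n<!-- OPENCLAW_CACHE_BOUNDARY -->\n";

// Hypothetical reader: everything before the first marker is treated as the
// cacheable prefix, everything after it as the dynamic suffix.
function splitAtCacheBoundary(systemPrompt) {
  const i = systemPrompt.indexOf(SYSTEM_PROMPT_CACHE_BOUNDARY);
  if (i === -1) return { cacheable: systemPrompt, dynamic: "" };
  return {
    cacheable: systemPrompt.slice(0, i),
    dynamic: systemPrompt.slice(i + SYSTEM_PROMPT_CACHE_BOUNDARY.length),
  };
}

const base = "You are agent A.\n\n(long, static instructions)";
const directive = "Retry: respond with visible text.";

// Naive concat: the directive becomes part of the region a cache would key on.
const naive = splitAtCacheBoundary(`${base}\n\n${directive}`);

// Boundary-aware concat: the static bulk stays byte-identical across turns.
const bounded = splitAtCacheBoundary(`${base}${SYSTEM_PROMPT_CACHE_BOUNDARY}${directive}`);

console.log(naive.cacheable === base);   // false
console.log(bounded.cacheable === base); // true
```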

The following runtime sites instead do naive concat:

| File:line (2026.5.20) | Site | Trigger | Cache impact | Status |
| --- | --- | --- | --- | --- |
| `bootstrap-budget-WP_UMPQC.js:183` | `appendBootstrapPromptWarning`: ``return prompt ? `${prompt}\n\n${warningBlock}` : warningBlock`` | when AGENTS.md/TOOLS.md exceed `bootstrapMaxChars` | adds warning block at end of system prompt; new byte hash on every turn until truncation stops | still naive |
| `pi-embedded-CJ87lW5R.js:2304` | `promptAdditions` concat: `` `${basePrompt}\n\n${promptAdditions.join("\n\n")}` `` | tool-use-only / planning-only / empty-response / compaction retry guards | appends retry/continuation directives at end | still naive |
| `selection-BmjEdnnA.js` (userPromptPrefix) | `userPromptPrefixText` | webchat / Control-UI-Embed flag | | restructured: now passed as a structured `userPromptPrefixText` param into the prompting object rather than concatenated at this site; cache-safety depends on the downstream consumer, not verified |
| `selection-BmjEdnnA.js:14350` | hook prepend/append: ``effectivePrompt = `${hookResult.prependContext}\n\n${effectivePrompt}` `` / `` `${effectivePrompt}\n\n${hookResult.appendContext}` `` | before-prompt-build hook returns `prependContext`/`appendContext` | per-turn dynamic content from plugins | still naive |
| `prepare.runtime-DRNTE3Jm.js:762` | `hookResult` prepend/append: same `` `${prependContext}\n\n${preparedPrompt}` `` pattern | same hook surface | duplicates the selection-path bug in the prepare path | still naive |
| `subagent-spawn-ALADHNnO.js:941` | `childSystemPrompt` suffix: `` `${childSystemPrompt}\n\n${materializedAttachments.systemPromptSuffix}` `` | subagent dispatch with materialised attachments | subagent path only | still naive |
| `apply-DKRHb9iS.js:125` | generic helper: ``if (!base) return suffix; return `${base}\n\n${suffix}`.trim()`` | varies | callers should use the boundary helper | still naive |

Impact (numbers from a real deployment)

  • 9 agents × ~33-37K input tokens per turn × ~6-12 turns/day organic A-Chat traffic.
  • Without prefix-cache: full input tokens billed per turn.
  • With prefix-cache (Codex): 88-94% of those tokens are served as cache hits, billed at a ~50% discount.
  • Empirically: the alternation between cold and warm turns drags the effective hit rate down to ~50-65%, versus the 88-94% achievable when the prefix is genuinely byte-stable.
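As a rough worked example of the numbers above, the sketch below converts a hit rate into full-rate input-token equivalents. It assumes a flat discount on cached tokens and ignores any cache-write premium; `billedInputEquivalent` is a hypothetical helper, and the 35K prefix and the 0.9 / 0.6 hit rates are illustrative points inside the ranges reported in this issue.

```javascript
// Input tokens billed at the full rate, assuming cached tokens receive a flat
// read discount and no cache-write premium applies (simplifying assumption).
function billedInputEquivalent(inputTokens, hitRate, readDiscount) {
  return Math.round(inputTokens * (1 - hitRate * readDiscount));
}

const prefixTokens = 35000; // inside the 33-37K range reported above
const codexDiscount = 0.5;  // "~50%" per this issue

const stable = billedInputEquivalent(prefixTokens, 0.9, codexDiscount);   // 19250
const unstable = billedInputEquivalent(prefixTokens, 0.6, codexDiscount); // 24500

console.log({ stable, unstable, extraPerTurn: unstable - stable }); // extraPerTurn: 5250
```

Under these assumptions, an unstable prefix costs about 5,250 extra full-rate token equivalents per turn per agent.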

Proposed fix

Switch each enumerated site to prependSystemPromptAdditionAfterCacheBoundary (or its equivalent for end-appended content). The runtime already calls applyAnthropicCacheControlToSystem, which reads the boundary marker. For OpenAI requests the marker is harmless metadata, but its by-product (dynamic content kept in the dynamic-suffix region) preserves a byte-identical prefix, which is exactly what OpenAI's automatic prefix cache needs.

Two concrete patterns:

  1. Replace `` prompt = `${prompt}\n\n${suffix}` `` with the existing helper, ensuring a `SYSTEM_PROMPT_CACHE_BOUNDARY` is inserted before the suffix (not before the suffix-of-suffix).
  2. Document that callers of apply (the generic helper) need to explicitly opt into boundary-respecting concat.
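The two patterns above could look roughly like the following. This is a sketch under stated assumptions: `concatPromptSuffix` and its `respectCacheBoundary` option are hypothetical stand-ins, since the signature of the runtime's own helper is not shown in this issue. The point it illustrates is that the marker is inserted once, before the first dynamic suffix, and later additions join the existing dynamic region.

```javascript
// Marker value as quoted in this issue.
const SYSTEM_PROMPT_CACHE_BOUNDARY = "\n<!-- OPENCLAW_CACHE_BOUNDARY -->\n";

// Hypothetical opt-in concat; NOT the runtime's helper or its signature.
function concatPromptSuffix(base, suffix, { respectCacheBoundary = false } = {}) {
  if (!suffix) return base;
  if (!base) return suffix;
  // Default path mirrors the naive concat used by the listed sites.
  if (!respectCacheBoundary) return `${base}\n\n${suffix}`;
  // Insert the marker only once; further suffixes stay in the dynamic region.
  return base.includes(SYSTEM_PROMPT_CACHE_BOUNDARY)
    ? `${base}\n\n${suffix}`
    : `${base}${SYSTEM_PROMPT_CACHE_BOUNDARY}${suffix}`;
}

const staticPrompt = "Static agent instructions.";
const opts = { respectCacheBoundary: true };

const withWarning = concatPromptSuffix(staticPrompt, "Bootstrap warning.", opts);
const withRetry = concatPromptSuffix(withWarning, "Retry directive.", opts);

// Count markers to confirm none was added before the "suffix-of-suffix".
const markerCount = withRetry.split(SYSTEM_PROMPT_CACHE_BOUNDARY).length - 1;
console.log(markerCount); // 1
```

Making the boundary behaviour opt-in keeps existing callers of the generic helper unchanged while giving each listed site a one-line migration.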

Compatibility

  • The boundary marker is already silently stripped if cache-control is disabled (stripAnthropicSystemPromptBoundary).
  • For non-Anthropic providers, the marker is just an HTML comment — invisible to the model.
  • No behavioural change on prompts that don't carry dynamic suffixes; the marker is only inserted when an addition is actually appended.

Why this matters

  • Anthropic prompt cache discount is 90% on cache reads (vs uncached input rate). Cache writes cost +25%.
  • OpenAI Codex / Responses prompt cache discount is ~50%.
  • For us, A-Chat at scale runs 70-80 turns/day across 10 agents. Difference between 50% and 90% effective hit rate on a 33K prefix is measurable spend AND ~1-3s latency per turn.

What we did locally

We measured. We raised bootstrapMaxChars from 20K → 50K (eliminates the bootstrap-warning condition). We folded the visible-answer-continuation text into one agent's SOUL.md as always-present content (does not stop the runtime injector — confirmed empirically — but keeps the appended bytes byte-identical). We added cached_tokens propagation in our SDK proxy. We did NOT fork the runtime — dist/ is bundled and post-install mutation is fragile across openclaw update.

We'd much rather have this fixed upstream.

Note on bootstrapMaxChars: our local mitigation raised it 20K → 50K to avoid the bootstrap-warning condition entirely. If openclaw's default is still 20K, large AGENTS.md/TOOLS.md workspaces will keep hitting bootstrap-budget-WP_UMPQC.js:183 on every turn.

Capture telemetry available

We have ~24-48 hours of [capture-req] / [capture-resp] log data in the proxy with system_hash, prefix_hash, tool_hash, tool_names_order, full Codex usage blocks. Happy to share aggregated stats if useful.
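For reference, aggregated stats of the kind offered here could be produced along these lines. This is only a sketch: the record shape (`system_hash`, `input_tokens`, `cached_tokens` flattened onto one object) and the function name `hitRateBySystemHash` are assumptions for illustration, not the proxy's actual log format.

```javascript
// Hypothetical capture record: { system_hash, input_tokens, cached_tokens }.
// Groups turns by system-prompt hash and reports a token-weighted hit rate.
function hitRateBySystemHash(records) {
  const buckets = new Map();
  for (const r of records) {
    const b = buckets.get(r.system_hash) ?? { turns: 0, input: 0, cached: 0 };
    b.turns += 1;
    b.input += r.input_tokens;
    b.cached += r.cached_tokens;
    buckets.set(r.system_hash, b);
  }
  return [...buckets].map(([system_hash, b]) => ({
    system_hash,
    turns: b.turns,
    hitRate: b.input > 0 ? b.cached / b.input : 0,
  }));
}

// Illustrative records only.
const sample = [
  { system_hash: "aaa", input_tokens: 1000, cached_tokens: 0 },
  { system_hash: "aaa", input_tokens: 1000, cached_tokens: 900 },
  { system_hash: "bbb", input_tokens: 1000, cached_tokens: 0 },
];

const stats = hitRateBySystemHash(sample);
console.log(stats);
```

A large number of distinct `system_hash` buckets with few turns each would point at the dynamic-suffix sites listed in the root cause map.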
