openclaw - Runtime sites bypass `prependSystemPromptAdditionAfterCacheBoundary`, destabilising Anthropic + OpenAI prompt caching

Symptom

In a multi-agent A-Chat deployment running 9 agents on openai-codex/gpt-5.5 and 1 on anthropic/claude-opus-4-6, OpenAI Codex automatic prefix caching engages only partially: the per-turn cached_tokens / input_tokens ratio alternates between ~0% (cold) and 88-94% (peak). Within a "bucket" of byte-identical system prompts, the hit rate stabilises at 0.83-0.92. Across a 6-turn smoke test, several distinct system-prompt hashes (i.e. several buckets) appear, because runtime sites append dynamic content after the byte-stable bulk of the prompt.
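
The per-bucket hit rates above can be reproduced from proxy captures roughly as follows. This is a minimal sketch, assuming a hypothetical capture record shape `{ systemPrompt, usage }` where `usage` is an OpenAI Responses API usage object (`input_tokens`, `input_tokens_details.cached_tokens`); `bucketHitRates` is an illustrative name, not part of openclaw or of our proxy, and the sample numbers are toy values, not measurements.

```javascript
const crypto = require("crypto");

// Hypothetical record shape: { systemPrompt: string, usage: ResponsesUsage }.
// Groups turns by a SHA-256 of the full system prompt and reports, per bucket,
// the share of input tokens the provider reported as cached.
function bucketHitRates(records) {
  const buckets = new Map(); // Map keeps insertion order: first-seen hash first
  for (const { systemPrompt, usage } of records) {
    const hash = crypto.createHash("sha256").update(systemPrompt).digest("hex").slice(0, 12);
    const details = usage.input_tokens_details;
    const cached = (details && details.cached_tokens) || 0;
    const b = buckets.get(hash) || { turns: 0, input: 0, cached: 0 };
    b.turns += 1;
    b.input += usage.input_tokens;
    b.cached += cached;
    buckets.set(hash, b);
  }
  return [...buckets].map(([hash, b]) => ({
    hash,
    turns: b.turns,
    hitRate: b.input > 0 ? b.cached / b.input : 0,
  }));
}

// Toy data: two turns share a system prompt; a third has a retry directive appended.
const sample = [
  { systemPrompt: "STABLE", usage: { input_tokens: 1000, input_tokens_details: { cached_tokens: 0 } } },
  { systemPrompt: "STABLE", usage: { input_tokens: 1000, input_tokens_details: { cached_tokens: 900 } } },
  { systemPrompt: "STABLE\n\nRetry: answer visibly.", usage: { input_tokens: 1000, input_tokens_details: { cached_tokens: 0 } } },
];
console.log(bucketHitRates(sample));
```

Hashing the whole system prompt, rather than a fixed-length prefix, is what makes every appended suffix show up as its own bucket.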

Root cause map (file:line, behaviour)

Re-verified against openclaw 2026.5.20 on 2026-05-22. Chunk filenames below are from the 2026.5.20 published bundle (/usr/lib/node_modules/openclaw/dist/). 6 of the 7 originally found sites still do naive concatenation; 1 (userPromptPrefixText) was restructured (see its Status entry in the table). The helper and marker still exist and are still unused by these sites.

The runtime already exports the SYSTEM_PROMPT_CACHE_BOUNDARY marker (an HTML comment) and the helper prependSystemPromptAdditionAfterCacheBoundary(params) from system-prompt-cache-boundary-T51pGsv9.js. These keep dynamic suffixes outside the cacheable region for Anthropic-style requests and, when the suffix is at the very end, preserve a byte-identical prefix for OpenAI automatic caching.
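
To make the mechanism concrete, here is a minimal sketch of what a boundary-respecting prepend does. The marker text, the function name `prependAfterCacheBoundary` and its two-argument signature are hypothetical stand-ins; openclaw's real helper takes a `params` object whose shape is not reproduced here.

```javascript
// HYPOTHETICAL marker text: only assumes the real SYSTEM_PROMPT_CACHE_BOUNDARY
// is a distinctive HTML comment; its actual value is not reproduced here.
const CACHE_BOUNDARY = "\n\n<!-- cache-boundary -->\n\n";

// Sketch of the idea behind prependSystemPromptAdditionAfterCacheBoundary
// (illustrative name and signature, not openclaw's source): every dynamic
// addition lands directly after the boundary, so the bytes before it never change.
function prependAfterCacheBoundary(systemPrompt, addition) {
  if (!addition) return systemPrompt; // nothing dynamic: no marker inserted
  const idx = systemPrompt.indexOf(CACHE_BOUNDARY);
  if (idx === -1) {
    // First addition: the whole current prompt is the stable prefix.
    return `${systemPrompt}${CACHE_BOUNDARY}${addition}`;
  }
  const splitAt = idx + CACHE_BOUNDARY.length;
  const stable = systemPrompt.slice(0, splitAt);
  const dynamic = systemPrompt.slice(splitAt);
  // Later additions go to the front of the dynamic region, never into the prefix.
  return dynamic ? `${stable}${addition}\n\n${dynamic}` : `${stable}${addition}`;
}

const base = "You are agent A.\n(AGENTS.md / TOOLS.md content)";
const turn1 = prependAfterCacheBoundary(base, "Retry: answer visibly.");
const turn2 = prependAfterCacheBoundary(
  prependAfterCacheBoundary(base, "Bootstrap warning: TOOLS.md truncated."),
  "Retry: answer visibly.",
);
// Both turns share the same bytes up to and including the marker.
console.log(turn1.startsWith(base + CACHE_BOUNDARY), turn2.startsWith(base + CACHE_BOUNDARY));
```

For OpenAI, what matters is that nothing per-turn is ever placed in front of the stable bytes; for Anthropic, the marker additionally tells the cache-control pass where the cached block should end.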

The following runtime sites instead do naive concat:

| File:line (2026.5.20) | Site | Trigger | Cache impact | Status |
| --- | --- | --- | --- | --- |
| `bootstrap-budget-WP_UMPQC.js:183` | `appendBootstrapPromptWarning`: ``return prompt ? `${prompt}\n\n${warningBlock}` : warningBlock`` | when AGENTS.md/TOOLS.md exceed `bootstrapMaxChars` | adds warning block at end of system prompt; new byte hash on every turn until truncation stops | still naive |
| `pi-embedded-CJ87lW5R.js:2304` | promptAdditions concat: `` `${basePrompt}\n\n${promptAdditions.join("\n\n")}` `` | tool-use-only / planning-only / empty-response / compaction retry guards | appends retry/continuation directives at end | still naive |
| `selection-BmjEdnnA.js` (userPromptPrefix) | `userPromptPrefixText` | webchat / Control-UI-Embed flag | n/a | restructured: now passed as a structured `userPromptPrefixText` param into the prompting object rather than concatenated at this site; cache-safety depends on the downstream consumer, not verified |
| `selection-BmjEdnnA.js:14350` | hook prepend/append: ``effectivePrompt = `${hookResult.prependContext}\n\n${effectivePrompt}` `` / `` `${effectivePrompt}\n\n${hookResult.appendContext}` `` | before-prompt-build hook returns `prependContext`/`appendContext` | per-turn dynamic content from plugins | still naive |
| `prepare.runtime-DRNTE3Jm.js:762` | hookResult prepend/append: same `` `${prependContext}\n\n${preparedPrompt}` `` pattern | same hook surface | duplicates the selection-path bug in the prepare path | still naive |
| `subagent-spawn-ALADHNnO.js:941` | childSystemPrompt suffix: `` `${childSystemPrompt}\n\n${materializedAttachments.systemPromptSuffix}` `` | subagent dispatch with materialised attachments | subagent path only | still naive |
| `apply-DKRHb9iS.js:125` | generic helper: ``if (!base) return suffix; return `${base}\n\n${suffix}`.trim()`` | varies | callers should use the boundary helper | still naive |

Impact (numbers from a real deployment)

9 agents × ~33-37K input tokens per turn × ~6-12 turns/day organic A-Chat traffic.
Without prefix caching: every input token is billed at the full rate on every turn.
With prefix caching (Codex): 88-94% of those tokens become cache hits, billed at a ~50% discount.
Empirically: the alternating system-prompt hashes break cache continuity, pulling the effective hit rate down to ~50-65%, versus the 88-94% achievable when the prefix is genuinely byte-stable.
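
A back-of-the-envelope version of these numbers, as a hedged sketch: it takes the low ends of the quoted ranges, rounds the hit rates to 90% (byte-stable prefix) and 50% (alternating hashes), and ignores output tokens and any finer-grained cached-token accounting. `billedEquivalentTokens` is an illustrative name.

```javascript
// Per-turn input cost in "uncached-token equivalents" under automatic prefix
// caching. Assumptions: cached tokens are billed at (1 - discount) of the normal
// input rate, there is no cache-write premium, and the hit rate is per token.
function billedEquivalentTokens(inputTokens, hitRate, discount) {
  const cached = inputTokens * hitRate;
  const uncached = inputTokens - cached;
  return uncached + cached * (1 - discount);
}

const INPUT_TOKENS = 33000;        // low end of the ~33-37K per-turn figure
const CODEX_DISCOUNT = 0.5;        // ~50% cache-read discount
const AGENTS = 9;
const TURNS_PER_AGENT_PER_DAY = 6; // low end of ~6-12

const stablePrefix = billedEquivalentTokens(INPUT_TOKENS, 0.9, CODEX_DISCOUNT);  // 18150
const churnedPrefix = billedEquivalentTokens(INPUT_TOKENS, 0.5, CODEX_DISCOUNT); // 24750
const dailyExtra = (churnedPrefix - stablePrefix) * AGENTS * TURNS_PER_AGENT_PER_DAY;
console.log({ stablePrefix, churnedPrefix, dailyExtra });
```

Under these assumptions the alternation costs about 6.6K uncached-token equivalents per turn, or roughly 356K per day across the nine Codex agents at the low end of the stated traffic.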

Proposed fix

Switch each enumerated site to prependSystemPromptAdditionAfterCacheBoundary (or its equivalent for end-appended content). The runtime already calls applyAnthropicCacheControlToSystem, which reads the boundary marker. For OpenAI requests the marker is harmless metadata, but its by-product (dynamic content kept in the dynamic-suffix region) preserves a byte-identical prefix, which is exactly what OpenAI's automatic prefix cache needs.
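
How a boundary marker can translate into Anthropic cache breakpoints, as a sketch: this is not openclaw's applyAnthropicCacheControlToSystem, and the marker text and `toAnthropicSystemBlocks` name are hypothetical. It does use the real Messages API shape, where `system` may be an array of text blocks and `cache_control: { type: "ephemeral" }` on a block caches the prompt prefix up to and including that block.

```javascript
const CACHE_BOUNDARY = "\n\n<!-- cache-boundary -->\n\n"; // HYPOTHETICAL marker text

// Illustrative sketch: split a system prompt at the boundary into Anthropic
// Messages API system blocks. The breakpoint sits on the stable block, so the
// cached prefix ends exactly at the boundary and per-turn text is sent uncached.
function toAnthropicSystemBlocks(systemPrompt) {
  const idx = systemPrompt.indexOf(CACHE_BOUNDARY);
  if (idx === -1) {
    // No marker: the whole prompt is treated as one cacheable block.
    return [{ type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } }];
  }
  const stable = systemPrompt.slice(0, idx);
  const dynamic = systemPrompt.slice(idx + CACHE_BOUNDARY.length);
  const blocks = [{ type: "text", text: stable, cache_control: { type: "ephemeral" } }];
  if (dynamic) blocks.push({ type: "text", text: dynamic }); // no breakpoint here
  return blocks;
}

console.log(JSON.stringify(toAnthropicSystemBlocks("STABLE" + CACHE_BOUNDARY + "Retry: answer visibly."), null, 2));
```

With naive concatenation the dynamic text ends up inside the single cached block, so any change to it typically forces a fresh cache write of that block (at the write premium) instead of a discounted read.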

Two concrete patterns:

Replace ``prompt = `${prompt}\n\n${suffix}` `` with the existing helper, ensuring a `SYSTEM_PROMPT_CACHE_BOUNDARY` is inserted before the first dynamic suffix (not before a later suffix appended to it).
Document that callers of apply (the generic helper) need to explicitly opt into boundary-respecting concat.
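
The second pattern could look roughly like this: a hypothetical opt-in variant of the generic helper at apply-DKRHb9iS.js:125. The name `joinPrompt`, the `{ dynamic }` option and the marker text are illustrative stand-ins. Without the option it behaves like the current helper; with it, the boundary goes in front of the first dynamic suffix only, and later suffixes are appended behind it.

```javascript
const CACHE_BOUNDARY = "\n\n<!-- cache-boundary -->\n\n"; // HYPOTHETICAL marker text

// Hypothetical opt-in variant of the generic helper at apply-DKRHb9iS.js:125.
// Default path is the current behaviour; { dynamic: true } is the opt-in.
function joinPrompt(base, suffix, { dynamic = false } = {}) {
  if (!base) return suffix;
  if (!dynamic) return `${base}\n\n${suffix}`.trim(); // unchanged naive concat
  if (!suffix) return base; // no addition, so no marker
  // A boundary already exists: we are appending inside the dynamic region.
  if (base.includes(CACHE_BOUNDARY)) return `${base}\n\n${suffix}`;
  // First dynamic suffix: the boundary goes in front of it.
  return `${base}${CACHE_BOUNDARY}${suffix}`;
}

const withWarning = joinPrompt("STABLE", "Bootstrap warning.", { dynamic: true });
const withRetry = joinPrompt(withWarning, "Retry: answer visibly.", { dynamic: true });
console.log(JSON.stringify(withRetry));
```

Keeping the boundary opt-in leaves every existing caller byte-for-byte unchanged, while the callers that append per-turn content get exactly one marker in front of their dynamic text.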

Compatibility

The boundary marker is already silently stripped if cache-control is disabled (stripAnthropicSystemPromptBoundary).
For non-Anthropic providers, the marker is just an HTML comment: inert text that carries no instructions for the model.
No behavioural change on prompts that don't carry dynamic suffixes; the marker is only inserted when an addition is actually appended.
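
The no-behavioural-change argument can be checked with a small sketch. It assumes, hypothetically, that stripping replaces the marker with the same "\n\n" separator the naive sites use; `stripBoundary` is an illustrative stand-in, not openclaw's stripAnthropicSystemPromptBoundary, and the marker text is a placeholder.

```javascript
const CACHE_BOUNDARY = "\n\n<!-- cache-boundary -->\n\n"; // HYPOTHETICAL marker text

// Illustrative stand-in for stripping the marker when cache-control is off.
// ASSUMPTION: the marker is replaced by the plain "\n\n" separator.
function stripBoundary(systemPrompt) {
  return systemPrompt.split(CACHE_BOUNDARY).join("\n\n");
}

const stable = "You are agent A.";
const suffix = "Retry: answer visibly.";
const naive = `${stable}\n\n${suffix}`;               // what the sites emit today
const marked = `${stable}${CACHE_BOUNDARY}${suffix}`; // what the helper would emit

console.log(stripBoundary(marked) === naive);  // stripped text matches today's prompt
console.log(stripBoundary(stable) === stable); // no suffix, no marker, no change
```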

Why this matters

Anthropic's prompt-cache discount is 90% on cache reads (relative to the uncached input rate); cache writes cost 25% more than the uncached rate.
The OpenAI Codex / Responses prompt-cache discount is ~50%.
For us, A-Chat at scale runs 70-80 turns/day across 10 agents. The difference between a 50% and a 90% effective hit rate on a 33K-token prefix is measurable spend AND ~1-3 s of latency per turn.
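
Rough arithmetic behind that difference, as a sketch with simplifying assumptions: every miss on the cacheable prefix is written to the cache (Anthropic's +25% write premium; OpenAI's automatic caching is treated as having none), reads get the quoted discounts, and output tokens and cache expiry are ignored. `prefixCostFraction` is an illustrative name.

```javascript
// Cost of a cacheable prefix per turn, as a fraction of its uncached input price.
// Assumes every miss becomes a cache write and every hit a discounted read.
function prefixCostFraction(hitRate, readDiscount, writePremium) {
  return (1 - hitRate) * (1 + writePremium) + hitRate * (1 - readDiscount);
}

const providers = {
  anthropic: { readDiscount: 0.9, writePremium: 0.25 },
  openai: { readDiscount: 0.5, writePremium: 0 },
};

for (const [name, p] of Object.entries(providers)) {
  const at50 = prefixCostFraction(0.5, p.readDiscount, p.writePremium);
  const at90 = prefixCostFraction(0.9, p.readDiscount, p.writePremium);
  console.log(`${name}: ${(at50 * 100).toFixed(1)}% vs ${(at90 * 100).toFixed(1)}% of uncached prefix cost`);
}
```

Under these assumptions, going from a 50% to a 90% hit rate cuts the prefix cost roughly 3× on Anthropic (0.675 → 0.215 of uncached) and by about a quarter on OpenAI (0.75 → 0.55).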

What we did locally

We measured. We raised bootstrapMaxChars from 20K → 50K (eliminates the bootstrap-warning condition). We folded the visible-answer-continuation text into one agent's SOUL.md as always-present content; this does not stop the runtime injector (confirmed empirically), but it keeps the appended bytes identical across turns. We added cached_tokens propagation in our SDK proxy. We did NOT fork the runtime: dist/ is bundled, and post-install mutation is fragile across openclaw updates.

We'd much rather have this fixed upstream.

Note on bootstrapMaxChars: our local mitigation raised it 20K → 50K to avoid the bootstrap-warning condition entirely. If openclaw's default is still 20K, large AGENTS.md/TOOLS.md workspaces will keep hitting bootstrap-budget-WP_UMPQC.js:183 on every turn.

Capture telemetry available

We have ~24-48 hours of [capture-req] / [capture-resp] log data in the proxy with system_hash, prefix_hash, tool_hash, tool_names_order, full Codex usage blocks. Happy to share aggregated stats if useful.
