openclaw: [Bug]: Tool result secret redaction mutates session history, breaking KV cache prefix matching for local LLM providers [1 comment, 2 participants]

openclaw/openclaw#80379 · Fetched 2026-05-11 03:15:21

Summary

sanitizeToolArgs() and sanitizeToolResult() in attempt.tool-run-context.ts apply redactStringsDeep() → redactToolPayloadText() to all tool call inputs and outputs. Strings matching DEFAULT_REDACT_PATTERNS are truncated via maskToken() (e.g. DISCORD_BOT_TOKEN → DISC…N). The redacted content is stored in the session and sent to the model on subsequent turns.

For cloud API providers this is a reasonable security measure. For local inference providers (llama-server, exo, LM Studio, Ollama, vLLM, sglang) where tool results never leave the machine, the redaction provides no security benefit while causing a full KV cache re-prefill every time a tool result contains a pattern that matches the secret detector.

Reproduction

  1. Configure a local provider (api: "openai-completions", baseUrl: "http://localhost:...")
  2. Have the agent read a file containing environment variable references (e.g. ${DISCORD_BOT_TOKEN}, ${TELEGRAM_BOT_TOKEN})
  3. On the next turn, observe the tool result content has been mutated in the session store
  4. The KV cache diverges at the mutated token position; the engine re-processes everything from that point forward

Evidence

Proxy diff between consecutive turns on the same session, showing the tool result mutation:

- DISCORD_BOT_TOKEN=\"${DISCORD_BOT_TOKEN:-}\"
+ DISCORD_BOT_TOKEN=\"${DISC…N:-}\"
- TELEGRAM_BOT_TOKEN=\"${TELEGRAM_BOT_TOKEN:-}\"
+ TELEGRAM_BOT_TOKEN=\"${TELE…N:-}\"

The mutation path: executeToolCalls → buildToolRunContext → sanitizeToolArgs/sanitizeToolResult → redactStringsDeep → redactToolPayloadText → redactText(text, DEFAULT_REDACT_PATTERNS) → maskToken(token) → ${token.slice(0, keep_start)}…${token.slice(-keep_end)}
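To make the path above concrete, here is a minimal sketch of the last few steps. It is not the OpenClaw source: the pattern list is a hypothetical stand-in for DEFAULT_REDACT_PATTERNS, and the keep lengths (4 leading characters, 1 trailing) are inferred from the DISC…N output in the diff rather than read from the code.

```typescript
// Assumed values, inferred from "DISCORD_BOT_TOKEN" becoming "DISC…N" in the diff.
const KEEP_START = 4;
const KEEP_END = 1;

// Same shape as the template shown in the mutation path.
function maskToken(token: string): string {
  return `${token.slice(0, KEEP_START)}…${token.slice(-KEEP_END)}`;
}

// Hypothetical stand-in for DEFAULT_REDACT_PATTERNS: a variable name ending in
// TOKEN that appears right after "${".
const DEMO_PATTERNS: RegExp[] = [/\$\{([A-Z0-9_]*TOKEN)/g];

function redactText(text: string, patterns: RegExp[]): string {
  let out = text;
  for (const pattern of patterns) {
    out = out.replace(pattern, (_match: string, name: string) => "${" + maskToken(name));
  }
  return out;
}

// Walks strings, arrays, and plain objects, redacting every string it finds.
function redactStringsDeep(value: unknown): unknown {
  if (typeof value === "string") return redactText(value, DEMO_PATTERNS);
  if (Array.isArray(value)) return value.map((item) => redactStringsDeep(item));
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) out[key] = redactStringsDeep(source[key]);
    return out;
  }
  return value;
}

const original = 'DISCORD_BOT_TOKEN="${DISCORD_BOT_TOKEN:-}"';
const redacted = redactStringsDeep({ content: original }) as { content: string };
```

With these assumptions, redacted.content differs from original starting at the first masked character, which is the point where the next prompt stops matching the cached one.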

Impact

On SWA models (Qwen, Kimi-K2) any mid-sequence token change forces re-processing of everything after the divergence point. A single redacted string in a tool result at message position N forces re-prefill of all tokens from position N to end of context. On a 50k+ token conversation this costs minutes of compute per turn.

Even on full-attention models, prefix caching (llama.cpp --cache-reuse, exo's built-in cache) is invalidated because the cached content no longer matches the modified session.
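The cost described above can be illustrated with a toy model of prefix matching. This sketch assumes the engine reuses only the longest shared prefix between the cached token sequence and the new prompt; the token IDs are made up, and real engines may reuse more or less than this depending on their cache strategy.

```typescript
// Length of the shared prefix between the cached sequence and the new prompt.
function commonPrefixLength(cached: readonly number[], next: readonly number[]): number {
  const limit = Math.min(cached.length, next.length);
  let i = 0;
  while (i < limit && cached[i] === next[i]) i++;
  return i;
}

// Tokens that must be processed again when only the shared prefix is reusable.
function tokensToReprefill(cached: readonly number[], next: readonly number[]): number {
  return next.length - commonPrefixLength(cached, next);
}

// Turn 1 prompt, then two versions of turn 2: history untouched vs. one token
// rewritten at index 2 (standing in for a redacted tool result).
const turn1 = [10, 11, 12, 13, 14, 15];
const turn2Stable = [10, 11, 12, 13, 14, 15, 16, 17];
const turn2Mutated = [10, 11, 99, 13, 14, 15, 16, 17];

const stableCost = tokensToReprefill(turn1, turn2Stable);   // only the appended tokens
const mutatedCost = tokensToReprefill(turn1, turn2Mutated); // everything from index 2 onward
```

In this toy case the untouched history costs 2 tokens of prefill and the mutated history costs 6, even though only one token changed.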

Suggested fix

Gate redaction behind a config toggle:

{
  "tools": {
    "redactSecrets": true
  }
}

Default to true for cloud providers and to false (or auto-detect) for local providers, i.e. where isStrictOpenAiCompatible === true or the provider family is not in the cloud provider list.
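One way the proposed default could be resolved is sketched here. Everything except the redactSecrets key and the isStrictOpenAiCompatible flag is hypothetical: the ProviderInfo shape, the resolveRedactSecrets helper, and the placeholder cloud family list are illustrative names, not OpenClaw APIs.

```typescript
interface ToolsConfig {
  redactSecrets?: boolean; // the proposed toggle; undefined means "use the default"
}

// Hypothetical provider descriptor for this sketch.
interface ProviderInfo {
  family: string;
  isStrictOpenAiCompatible: boolean;
}

// Placeholder list; the real cloud provider list is not shown in the issue.
const CLOUD_PROVIDER_FAMILIES: string[] = ["example-cloud"];

function resolveRedactSecrets(tools: ToolsConfig, provider: ProviderInfo): boolean {
  // An explicit setting always wins.
  if (typeof tools.redactSecrets === "boolean") return tools.redactSecrets;
  // Otherwise redact only for known cloud families that are not strict
  // OpenAI-compatible endpoints.
  const isCloud = CLOUD_PROVIDER_FAMILIES.indexOf(provider.family) !== -1;
  return isCloud && !provider.isStrictOpenAiCompatible;
}

const localDefault = resolveRedactSecrets({}, { family: "exo", isStrictOpenAiCompatible: true });
const cloudDefault = resolveRedactSecrets({}, { family: "example-cloud", isStrictOpenAiCompatible: false });
```

Keeping the explicit setting as an override lets a user who proxies a local endpoint to a remote machine turn redaction back on.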

Alternatively, apply redaction only to the transcript copy (via the existing transcript/runtime-context split) rather than to the session store copy that feeds the next prompt build.
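The transcript-only alternative might look roughly like the following. The StoredToolResult shape and storeToolResult helper are assumptions made for illustration; the actual transcript/runtime-context split in OpenClaw is not shown in this issue and may be structured differently.

```typescript
type Redactor = (text: string) => string;

// Hypothetical record holding both copies of a tool result.
interface StoredToolResult {
  promptText: string;     // fed to the next prompt build, left byte-for-byte unchanged
  transcriptText: string; // written to the transcript, redacted
}

function storeToolResult(raw: string, redact: Redactor): StoredToolResult {
  return { promptText: raw, transcriptText: redact(raw) };
}

// Toy redactor used only for this demonstration.
const demoRedact: Redactor = (text) => text.replace(/SECRET_[A-Z]+/g, "SECR…");
const stored = storeToolResult("value=SECRET_ABC", demoRedact);
```

Because promptText never changes after it is stored, each later prompt extends the previous one and the cached prefix stays valid, while the transcript still hides the matched strings.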

Workaround

Replace both functions with no-ops in the dist bundle. Note that this disables redaction for every provider, cloud included:

function sanitizeToolArgs(args) { return args; }
function sanitizeToolResult(result) { return result; }

Environment

  • OpenClaw v2026.5.7 (eeef486)
  • macOS, Apple Silicon
  • Local inference via exo (MLX, Kimi-K2.5/K2.6)
  • File: attempt.tool-run-context-*.js

Related issues

  • #19892 — 2026.2.15 breaks prompt cache for local model providers
  • #20430 — Per-message metadata in system prompt invalidates llama.cpp KV cache
  • #20894 — Inbound metadata injection into system prompt breaks Anthropic prompt caching
  • #70829 — propagate timeoutMs to guarded dispatchers (local LLM 60s timeout)

Same mutation class as the removed enforceToolResultContextBudgetInPlace (in-flight compaction) and the removed replaceMessages (session store writeback) — prompt-build-time transforms that benefit stateless cloud APIs but destroy prefix-caching for stateful local engines.
