claude-code - 💡(How to fix) Fix Feature: Auto-redact secrets in tool outputs before they enter context / transcripts [1 comments, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#46201Fetched 2026-04-11 06:26:29
View on GitHub
Comments
1
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×3commented ×1

When Claude Code tool results (Bash, Read, Grep, etc.) contain API keys, tokens, or credentials, they are stored verbatim in the conversation context. If the user later agrees to share a session transcript with Anthropic (via the in-session feedback prompt), those secrets are included in the submission. There should be a harness-level auto-redaction layer that catches common secret patterns before they enter the context window, not relying on model instructions alone.

Error Message

  1. Warn before transcript submission: When the feedback/transcript sharing prompt appears, show a notice like:

Root Cause

When Claude Code tool results (Bash, Read, Grep, etc.) contain API keys, tokens, or credentials, they are stored verbatim in the conversation context. If the user later agrees to share a session transcript with Anthropic (via the in-session feedback prompt), those secrets are included in the submission. There should be a harness-level auto-redaction layer that catches common secret patterns before they enter the context window, not relying on model instructions alone.

RAW_BUFFERClick to expand / collapse

Summary

When Claude Code tool results (Bash, Read, Grep, etc.) contain API keys, tokens, or credentials, they are stored verbatim in the conversation context. If the user later agrees to share a session transcript with Anthropic (via the in-session feedback prompt), those secrets are included in the submission. There should be a harness-level auto-redaction layer that catches common secret patterns before they enter the context window, not relying on model instructions alone.

The real-world scenario

During a long debugging session, I was diagnosing model routing issues in an OpenClaw plugin. Claude read my openclaw.json config file via Python scripts in Bash tool calls. That config contains API keys for multiple providers (Perplexity, xAI/Grok, Anthropic, etc.). Claude did attempt to redact them in its own output, but the raw tool results, which contain the full keys, were already in the context.

Midway through the session, Claude Code popped up an in-CLI prompt asking how Claude was performing and whether I'd like to share the session transcript with Anthropic. I agreed, it seemed like useful feedback. But here's the problem:

By that point, Claude had been working on a complex multi-step task for 40+ minutes. I was simultaneously working on other topics in other terminal windows and other systems. When the feedback prompt appeared, I couldn't realistically recall everything that had happened in the session, which files were read, what tool outputs contained secrets, etc. I only realized afterward that API keys from my config had been in the tool outputs the entire time.

What was exposed

Tool results from Bash calls (Python scripts parsing JSON config) contained:

  • Perplexity API key (pplx-...)
  • xAI/Grok API key (xai-...)
  • Anthropic API key (sk-ant-oat01-...)
  • Various OAuth tokens

Claude's own responses redacted these (e.g., safe['apiKey'] = safe['apiKey'][:8] + '...'), but the underlying Bash tool results stored the full unredacted output in context.

Proposed solution

Harness-level secret redaction, before tool output enters the conversation context:

  1. Pattern-based detection: Scan tool results for common secret patterns:

    • sk-ant-, sk-, pplx-, xai-, ghp_, gho_, Bearer , token=
    • Generic high-entropy strings next to keys like apiKey, secret, token, authorization
  2. Redact in the stored context, not just in the model's visible output. Replace matches with [REDACTED-8chars...] so the model can still reference them if needed.

  3. Warn before transcript submission: When the feedback/transcript sharing prompt appears, show a notice like:

    "This session's context contains N tool outputs. Secrets were [auto-redacted / not checked]. Review before sharing."

  4. Respect .env-style files: If a Read tool opens a file matching patterns like .env, credentials.*, config.json containing key-value pairs with secret-looking values, apply the same redaction.

Why model-level instructions aren't enough

  • The model can only redact in its own output. Tool results are stored by the harness before the model sees them.
  • Instructions like "don't log secrets" depend on the model recognizing what's a secret in arbitrary JSON/YAML/config output, which is unreliable.
  • Even when the model correctly redacts in its response, the raw tool output remains in context and gets included in any transcript export.

Context

  • Claude Code CLI (terminal)
  • Sessions can run 40+ minutes on complex multi-step tasks
  • Users work across multiple windows/systems simultaneously and cannot track every tool output in real time
  • The transcript sharing prompt appears mid-session with no summary of what's in the context

extent analysis

TL;DR

Implement a harness-level auto-redaction layer to detect and redact common secret patterns in tool results before they enter the conversation context.

Guidance

  • Develop a pattern-based detection system to identify common secret patterns in tool results, such as API keys and OAuth tokens, and redact them before storing in context.
  • Modify the feedback prompt to warn users when secrets have been auto-redacted or not checked, allowing them to review before sharing the transcript.
  • Consider respecting .env-style files and applying redaction to key-value pairs with secret-looking values.
  • Ensure the redaction occurs at the harness level, rather than relying on model-level instructions, to prevent secrets from being stored in the context.

Example

A possible implementation could involve using regular expressions to detect common secret patterns, such as sk-ant-, pplx-, or xai-, and replacing matches with a redacted string, e.g., [REDACTED-8chars...].

Notes

The proposed solution requires careful consideration of the types of secrets that need to be redacted and the potential false positives that may occur. Additionally, the warning prompt should be designed to be clear and concise, allowing users to make informed decisions about sharing their transcripts.

Recommendation

Apply the proposed harness-level secret redaction solution to prevent secrets from being stored in the conversation context and potentially exposed when sharing transcripts. This approach provides a more reliable and robust solution than relying on model-level instructions alone.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING