gemini-cli - ✅(Solved) Fix [Security] Add pre-flight secret and credential scanning before context is sent to the API [1 pull requests, 1 comments, 1 participants]

gemini-cli · 2026-04-23 00:27:24


GitHub stats

google-gemini/gemini-cli#25837 · Fetched 2026-04-23 07:44:35


Author: prashantkul

Participants: prashantkul

Timeline (top): commented ×1, cross-referenced ×1, labeled ×1


PR fix notes

PR #25865: feat(security): layered shell deobfuscation, secret scanning, content sanitization

Repository: google-gemini/gemini-cli
Author: jasonmatthewsuhari
State: open | merged: False
Link: https://github.com/google-gemini/gemini-cli/pull/25865

Description (problem / solution / changelog)

Fixes #25836, #25837, and #25838.

Summary

Adds three complementary, deterministic defense-in-depth layers for prompt injection and credential leakage:

Shell deobfuscation (#25836): decodes base64 subshells, hex escapes, and variable indirection; auto-denies whitespace-padding and invisible-Unicode commands. Decoded payload is shown alongside the raw command in the confirmation UI so the user sees what actually executes.
Secret scanning (#25837): regex patterns plus a generic env_credential fallback redact AWS keys, GitHub/Google/Slack tokens, PEM private keys, connection strings, JWTs, and PASSWORD=/SECRET=/TOKEN=/... assignments in read_file, read_many_files, grep_search, and run_shell_command output before that output enters the model context. Warns before reading .env, *.pem, id_rsa, etc.
Content sanitization (#25838): strips HTML comments, invisible Unicode, structural injection phrases (instruction hijacking, role assignment, exfiltration directives, system-prompt extraction, output suppression), and excessive whitespace padding from web_fetch, file-read tools, untrusted MCP results, and GEMINI.md project memory on load.
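As a rough illustration of the content-sanitization layer, the sketch below strips HTML comments, a few invisible Unicode code points, and long whitespace runs from untrusted text. The function name `sanitizeUntrustedText`, the specific code-point list, and the padding threshold are assumptions for illustration, not the PR's `content-sanitizer.ts`; detection of structural injection phrases is omitted.

```typescript
// Hypothetical content-sanitization pass; not the code in PR #25865.

// HTML comments, including multi-line ones; non-greedy so each comment is removed on its own.
const HTML_COMMENT = /<!--[\s\S]*?-->/g;

// Assumed set of invisible characters: zero-width space, non-joiner and joiner
// (U+200B..U+200D), word joiner (U+2060), BOM (U+FEFF), and the Unicode tag
// block U+E0000..U+E007F, written here as its UTF-16 surrogate pairs.
const INVISIBLE = /[\u200B-\u200D\u2060\uFEFF]|\uDB40[\uDC00-\uDC7F]/g;

// Assumed padding threshold: 20 or more spaces/tabs collapse to a single space.
const WHITESPACE_PADDING = /[ \t]{20,}/g;

export function sanitizeUntrustedText(input: string): string {
  return input
    .replace(HTML_COMMENT, "")
    .replace(INVISIBLE, "")
    .replace(WHITESPACE_PADDING, " ");
}
```

For example, `"Hello<!-- ignore previous instructions -->\u200B world"` followed by 30 spaces and `"end"` becomes `"Hello world end"`.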

Secret scanning and content sanitization are opt-in via security.experimental.{secretScanning,contentSanitization}.enabled in settings.json. Shell deobfuscation is always on (deterministic, near-zero false-positive cost on legitimate commands, per the issue's recommended design).
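The opt-in behaviour described above (off unless explicitly enabled) can be sketched as a lookup over a nested, partially specified settings object. The `Settings` shape and the `isLayerEnabled` helper are hypothetical and only mirror the key paths named in the PR description; the real schema lives in `settingsSchema.ts` and may differ.

```typescript
// Assumed shape: only the key paths named in the PR description are modelled.
interface ExperimentalSecurity {
  secretScanning?: { enabled?: boolean };
  contentSanitization?: { enabled?: boolean };
}

interface Settings {
  security?: { experimental?: ExperimentalSecurity };
}

type OptInLayer = keyof ExperimentalSecurity;

// True only when the flag is explicitly set to true; an unset
// security.experimental.* path (or any missing parent object) means "off".
export function isLayerEnabled(settings: Settings, layer: OptInLayer): boolean {
  return settings.security?.experimental?.[layer]?.enabled === true;
}
```

Shell deobfuscation needs no such check, since the PR keeps it always on.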

Test plan

38 new unit tests pass (packages/core/src/safety/{shell-deobfuscator,secret-scanner,content-sanitizer}.test.ts) covering detection, redaction, false-positive avoidance, and edge cases.
Type-check clean on all modified files.
Manually verify a shell command with a base64 subshell surfaces the decoded payload in the confirmation UI.
Manually verify reading an .env file emits the sensitive-filename warning and redacts key=value pairs.
Manually verify that a GEMINI.md containing an HTML comment with an injection phrase has the comment and phrase stripped at session load.
Confirm features are off by default when security.experimental.* is unset.
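The sensitive-filename warning exercised in the test plan could look roughly like the check below. The pattern list is an assumption built from the examples named on this page (`.env`, `*.pem`, `id_rsa`), and `isSensitiveFilename` is a hypothetical name; the PR's actual list and matching rules may differ.

```typescript
// Hypothetical sensitive-filename check; the patterns are illustrative only.
// None of them use the "g" flag, so .test() carries no lastIndex state.
const SENSITIVE_NAME_PATTERNS: RegExp[] = [
  /^\.env(\..+)?$/, // .env, .env.local, .env.production, ...
  /\.pem$/, // PEM-encoded certificates and keys
  /^id_(rsa|dsa|ecdsa|ed25519)$/, // default OpenSSH private key file names
];

// Last path segment, accepting both "/" and "\" as separators.
function baseName(filePath: string): string {
  const parts = filePath.split(/[\\/]/);
  return parts[parts.length - 1] ?? filePath;
}

export function isSensitiveFilename(filePath: string): boolean {
  const name = baseName(filePath);
  return SENSITIVE_NAME_PATTERNS.some((pattern) => pattern.test(name));
}
```

Note that `.env.example` also matches, so template files would trigger the warning too; whether that is acceptable is a policy choice.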

Implementation notes

All three layers are heuristic pre-filters, not complete defenses against indirect prompt injection (IPI); they are designed to complement Conseca (semantic intent) and Causal Armor (#25829, causal attribution). The three checkers answer different questions: what does this command actually do (deobfuscator), does this content carry credentials (scanner), and does this content carry injection phrases (sanitizer).
Secret redaction preserves structure: DATABASE_URL=[REDACTED:connection_string] keeps the model's ability to reason about the code without exposing the value.
Redaction notices surface in returnDisplay (user-visible) but the redacted content is what the model sees.
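The split between what the user sees and what the model sees can be illustrated with a result object carrying two fields. `returnDisplay` is the name the PR description uses for the user-visible side; `llmContent`, the `RedactedResult` type, the `finalizeReadResult` helper, and the single AWS-key pattern are assumptions made for this sketch.

```typescript
// Hypothetical result shape: the model receives llmContent, the user sees returnDisplay.
interface RedactedResult {
  llmContent: string;
  returnDisplay: string;
}

// One illustrative pattern only (AWS access key IDs); a real scanner has many.
const AWS_ACCESS_KEY_ID = /AKIA[0-9A-Z]{16}/g;

export function finalizeReadResult(raw: string): RedactedResult {
  let count = 0;
  const redacted = raw.replace(AWS_ACCESS_KEY_ID, () => {
    count += 1;
    return "[REDACTED:aws_credential]";
  });
  // The redaction notice is appended only to the user-facing display;
  // neither field carries the raw secret.
  const notice = count > 0 ? `\n[secret-scanner] redacted ${count} value(s)` : "";
  return { llmContent: redacted, returnDisplay: redacted + notice };
}
```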

Changed files

packages/cli/src/config/config.ts (modified, +6/-0)
packages/cli/src/config/settingsSchema.ts (modified, +89/-0)
packages/cli/src/ui/components/messages/ToolConfirmationMessage.tsx (modified, +36/-1)
packages/core/package.json (modified, +1/-0)
packages/core/src/config/config.ts (modified, +9/-0)
packages/core/src/core/coreToolHookTriggers.ts (modified, +168/-0)
packages/core/src/safety/content-sanitizer.test.ts (added, +160/-0)
packages/core/src/safety/content-sanitizer.ts (added, +122/-0)
packages/core/src/safety/ner-pii-scanner.test.ts (added, +115/-0)
packages/core/src/safety/ner-pii-scanner.ts (added, +171/-0)
packages/core/src/safety/secret-scanner.test.ts (added, +132/-0)
packages/core/src/safety/secret-scanner.ts (added, +103/-0)
packages/core/src/safety/shell-deobfuscator.test.ts (added, +132/-0)
packages/core/src/safety/shell-deobfuscator.ts (added, +254/-0)
packages/core/src/tools/shell.ts (modified, +23/-0)
packages/core/src/tools/tools.ts (modified, +4/-0)
packages/core/src/utils/memoryDiscovery.ts (modified, +18/-1)


What would you like to be added?

A pre-flight secret scanner that detects and redacts credentials, API keys, connection strings, and PII from the context window before it is transmitted to the Gemini API. This would prevent accidental credential leakage during normal agent operations.

The Gap

Gemini CLI has no secret detection mechanism. When the agent reads a file, its full contents — including any embedded credentials — are sent to the Gemini API in the context window:

User: "Fix the database connection"
Agent reads .env:
  DATABASE_URL=postgres://admin:s3cretP@ss!@prod.db.internal:5432/app
  STRIPE_SECRET_KEY=sk_live_51HG7...
  AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCY...

→ All three credentials are sent to the API, unfiltered.

.gitignore patterns only prevent file discovery (glob, search). If the agent explicitly reads a file path — which it routinely does when asked to debug database connections, fix API integrations, or troubleshoot deployments — ignore patterns don't block it.

The strict sandbox profile allows reads from ~/.gemini, ~/.config, ~/.npm, ~/.cache, all of which may contain tokens or credentials.

Proposed Solution: Pre-Flight Redaction Pipeline

A multi-stage scanner that intercepts context before API transmission:

Stage 1 — Regex Pattern Scanner (deterministic, fast):

Secret type	Detection pattern
AWS Access Key ID	`AKIA[0-9A-Z]{16}`
AWS Secret Access Key	40-character base64 string following `aws_secret_access_key`
Generic API Key	`[A-Za-z0-9]{32,}` following `api_key`, `apikey`, `api-key`, `token`, `secret`
Connection strings	`postgres://`, `mysql://`, `mongodb://`, `redis://` with embedded credentials
Private keys	`-----BEGIN (RSA
GitHub tokens	`ghp_[A-Za-z0-9]{36}`, `gho_`, `ghs_`, `ghr_`
Google API keys	`AIza[0-9A-Za-z\-_]{35}`
Slack tokens	`xoxb-`, `xoxp-`, `xoxs-`
JWT tokens	`eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+`
.env key=value pairs	`[A-Z_]+=.` in files matching `.env` patterns
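A minimal Stage 1 scanner over a subset of the table can be written as a list of typed patterns. The regexes for AWS key IDs, GitHub `ghp_` tokens, Google API keys, and JWTs are copied from the table; the Slack and connection-string regexes extend the table's prefix-only rows into full patterns and are assumptions, as are the type tags and the `scanForSecrets` name. The truncated private-key row and the `.env` rule are left out.

```typescript
// Hypothetical Stage 1 scanner over a subset of the pattern table.
interface SecretFinding {
  type: string; // placeholder tag, e.g. "aws_credential"
  match: string; // the matched text (must never be forwarded to the model)
  index: number; // offset of the match in the scanned text
}

const PATTERNS: ReadonlyArray<{ type: string; regex: RegExp }> = [
  // Copied from the table.
  { type: "aws_credential", regex: /AKIA[0-9A-Z]{16}/ },
  { type: "github_token", regex: /ghp_[A-Za-z0-9]{36}/ },
  { type: "google_api_key", regex: /AIza[0-9A-Za-z\-_]{35}/ },
  { type: "jwt", regex: /eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/ },
  // Assumed completions of the table's prefix-only rows.
  { type: "slack_token", regex: /xox[bps]-[A-Za-z0-9-]+/ },
  {
    type: "connection_string",
    // scheme://user:password@host..., where the password may itself contain "@".
    regex: /\b(?:postgres|mysql|mongodb|redis):\/\/[^\s:@\/]+:\S+@\S+/,
  },
];

export function scanForSecrets(text: string): SecretFinding[] {
  const findings: SecretFinding[] = [];
  for (const { type, regex } of PATTERNS) {
    // A fresh global RegExp per call avoids sharing lastIndex state between scans.
    const re = new RegExp(regex.source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      findings.push({ type, match: m[0], index: m.index });
    }
  }
  return findings.sort((a, b) => a.index - b.index);
}
```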

Stage 2 — Entropy-Based Detection (optional, catches novel formats):

High-entropy strings (Shannon entropy above a threshold) in value positions are flagged as potential secrets even if they don't match known patterns. This is similar to the approach taken by trufflehog and detect-secrets.
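Stage 2's entropy check can be sketched with the standard Shannon entropy formula over a string's characters, H = -Σ p(c) log2 p(c), in bits per character. The 4.5 default mirrors the `entropy_threshold` value in the Option A configuration; the 20-character minimum length and the `looksLikeSecret` name are assumptions added here to avoid flagging short values.

```typescript
// Shannon entropy in bits per character: H = -sum over symbols of p * log2(p).
export function shannonEntropy(value: string): number {
  if (value.length === 0) return 0;
  const counts = new Map<string, number>();
  for (let i = 0; i < value.length; i++) {
    const ch = value[i];
    counts.set(ch, (counts.get(ch) ?? 0) + 1);
  }
  let entropy = 0;
  counts.forEach((count) => {
    const p = count / value.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Flags long, high-entropy values; both defaults are illustrative assumptions.
export function looksLikeSecret(value: string, threshold = 4.5, minLength = 20): boolean {
  return value.length >= minLength && shannonEntropy(value) > threshold;
}
```

One consequence of a single threshold: a hex alphabet has only 16 symbols, so hex strings top out at 4 bits per character and can never exceed 4.5. Tools such as detect-secrets apply separate limits to hex and base64 strings for this reason.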

Stage 3 — Redaction:

Detected secrets are replaced with type-tagged placeholders:

DATABASE_URL=[REDACTED:connection_string]
STRIPE_SECRET_KEY=[REDACTED:api_key]
AWS_SECRET_ACCESS_KEY=[REDACTED:aws_credential]

The agent can still see the structure (there's a DATABASE_URL, it's a postgres connection) without seeing the actual credentials. This preserves the agent's ability to reason about the code while protecting the secrets.
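A sketch of Stage 3 for `KEY=value` lines: the key stays visible and only the value is replaced with a type-tagged placeholder. The crude `classify` rules and the `redactEnvLine` name are assumptions chosen to reproduce the three example placeholders above; a real implementation would take its tags from whichever Stage 1 pattern fired.

```typescript
// Hypothetical classifier mapping a KEY=value pair to a placeholder tag.
function classify(key: string, value: string): string | null {
  if (/^[a-z][a-z0-9+.-]*:\/\/[^\s@]+@/i.test(value)) return "connection_string";
  if (/^AWS_/.test(key) || /^AKIA[0-9A-Z]{16}$/.test(value)) return "aws_credential";
  if (/(KEY|TOKEN|SECRET|PASSWORD)/.test(key)) return "api_key";
  return null; // not treated as a secret
}

// Redacts the value of an uppercase KEY=value line while keeping the key.
export function redactEnvLine(line: string): string {
  const m = /^([A-Z_][A-Z0-9_]*)=(.*)$/.exec(line);
  if (!m) return line;
  const [, key, value] = m;
  const tag = classify(key, value);
  return tag ? `${key}=[REDACTED:${tag}]` : line;
}
```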

Integration Options

Option A — External safety checker (recommended):

Register as an external checker targeting read_file and tool results. The checker scans file content before it reaches the model:

[[safety_checker]]
toolName = ["read_file"]
priority = 70

[safety_checker.checker]
type = "external"
name = "secret-scanner"
required_context = ["environment"]

[safety_checker.checker.config]
action = "redact"           # "redact" | "warn" | "block"
entropy_threshold = 4.5     # Shannon entropy threshold
custom_patterns = []        # Additional regex patterns

The checker would return ask_user with the detected secrets listed, letting the user decide whether to proceed with redacted content.

Option B — Context pre-processing hook:

Use the hooks system to scan tool results with an AfterTool hook (a BeforeTool hook runs before the tool has produced any output) and redact secrets before they enter the conversation history.

Option C — Built-in redaction in the content pipeline:

Add a redaction pass to the content generator pipeline that scans outbound context before API calls. This is the most thorough but requires deeper integration.
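Option C amounts to a final pass over every text part of the outbound request. The sketch below assumes a minimal `{ role, parts: [{ text }] }` message shape, which resembles but is not imported from the Gemini SDK types, and takes the redaction function as a parameter so any Stage 1 to 3 implementation can be plugged in; `redactOutboundContents` is a hypothetical name.

```typescript
// Minimal stand-ins for request content; real SDK types carry more fields.
interface Part {
  text?: string;
  [other: string]: unknown;
}
interface Content {
  role: string;
  parts: Part[];
}

type Redactor = (text: string) => string;

// Returns a redacted copy of the conversation; the input array is not mutated.
// Non-text parts (inline data, function calls, ...) pass through unchanged.
export function redactOutboundContents(contents: Content[], redact: Redactor): Content[] {
  return contents.map((content) => ({
    ...content,
    parts: content.parts.map((part) =>
      typeof part.text === "string" ? { ...part, text: redact(part.text) } : part,
    ),
  }));
}
```

Doing the pass here rather than per tool also covers text that reaches the context by other routes, at the cost of rescanning the history on each request unless already-scanned turns are cached.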

Configuration

// ~/.gemini/settings.json
{
  "security": {
    "secretScanning": {
      "enabled": true,
      "action": "redact",        // "redact" | "warn" | "block"
      "patterns": "default",     // "default" | "strict" | "custom"
      "entropyScanning": false,  // Enable entropy-based detection
      "allowedPaths": [],        // Paths exempt from scanning (e.g., test fixtures)
      "customPatterns": []       // Additional regex patterns
    }
  }
}
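The proposed configuration could be resolved into a runtime policy along these lines: defaults fill unset fields, `customPatterns` are compiled once, and `allowedPaths` exempt files by path prefix. The defaults (off unless enabled, matching the PR's behaviour), the prefix-matching rule, and the `resolveSecretScanning` name are assumptions; the issue does not say how paths are matched or what the `"strict"` preset adds, so `patterns` is not modelled.

```typescript
type Action = "redact" | "warn" | "block";

// Mirrors the proposed settings block; every field is optional in user config.
interface SecretScanningSettings {
  enabled?: boolean;
  action?: Action;
  patterns?: "default" | "strict" | "custom"; // preset selection not modelled here
  entropyScanning?: boolean;
  allowedPaths?: string[];
  customPatterns?: string[];
}

interface ResolvedPolicy {
  enabled: boolean;
  action: Action;
  entropyScanning: boolean;
  customPatterns: RegExp[];
  isExempt: (filePath: string) => boolean;
}

export function resolveSecretScanning(s: SecretScanningSettings = {}): ResolvedPolicy {
  const allowed = s.allowedPaths ?? [];
  return {
    enabled: s.enabled ?? false, // assumed default: off unless explicitly enabled
    action: s.action ?? "redact",
    entropyScanning: s.entropyScanning ?? false,
    // An invalid user regex throws here, at load time, rather than mid-session.
    customPatterns: (s.customPatterns ?? []).map((src) => new RegExp(src, "g")),
    // Assumed rule: a file is exempt if its path starts with an allowed prefix.
    isExempt: (filePath) => allowed.some((prefix) => filePath.startsWith(prefix)),
  };
}
```

Plain prefix matching is deliberately simple: an entry such as `test` would also exempt `tests-private/`, so a glob or path-segment comparison would be the safer choice in practice.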

Why is this needed?

This is the most common real-world data leakage scenario. Developers routinely ask agents to "fix the database connection" or "debug the API integration" — tasks that naturally lead the agent to read credential files. Every such interaction sends unfiltered credentials to the API.
.gitignore is not a security boundary. It prevents file discovery but not explicit reads. The agent can and does read_file .env when directed to by the task context.
The sandbox doesn't help. Even the strict sandbox profile allows reads from ~/.config, ~/.cache, and other paths where tokens may reside, and the sandbox is opt-in.
Every other major CLI tool has this. GitHub CLI redacts tokens from debug output. AWS CLI masks credentials in logs. Docker CLI warns about secrets in build context. Gemini CLI is an outlier in sending credentials to a remote API with zero scanning.
The fix is deterministic and fast. Regex-based scanning adds negligible latency. No LLM calls needed. False positive rate for well-known patterns (AWS keys, GitHub tokens, PEM headers) is near zero.
Users cannot reasonably audit every file read. In a typical session, the agent may read dozens of files. The user cannot check each one for embedded credentials. Automated scanning is the only scalable solution.

Additional context

Related: Issue #25829 (Causal Armor), Issue #25836 (shell deobfuscation)
The safety checker framework (PR #12504) supports external checkers that could implement this
Reference implementations exist in detect-secrets (Yelp), trufflehog (TruffleHog), and gitleaks
OWASP Top 10 for LLM Applications (2025) lists "Sensitive Information Disclosure" as a top risk
A reference implementation with regex scanner, env masker, and redaction engine exists at gemini-cli-provenance-armor

extent analysis

TL;DR

Implement a pre-flight secret scanner to detect and redact credentials, API keys, and PII from the context window before transmission to the Gemini API.

Guidance

Integrate a secret scanning mechanism, such as a regex pattern scanner, to identify potential secrets in files read by the agent.
Implement a redaction pipeline to replace detected secrets with type-tagged placeholders, preserving the agent's ability to reason about the code while protecting sensitive information.
Consider using an external safety checker or a context pre-processing hook to scan tool results and redact secrets before they enter the conversation history.
Configure the secret scanning settings, such as enabling entropy-based detection and customizing allowed paths and patterns, to balance security and usability.


Notes

The proposed solution involves integrating a secret scanning mechanism, which may require additional development and testing to ensure its effectiveness and accuracy. The choice of implementation option (external safety checker, context pre-processing hook, or built-in redaction) depends on the specific requirements and constraints of the Gemini CLI.

Recommendation

Implement a pre-flight secret scanner, such as the proposed regex pattern scanner, to detect and redact credentials and API keys before they enter the context window. PR #25865 implements this as an opt-in layer (security.experimental.secretScanning.enabled in settings.json), but it was still open and unmerged when this page was fetched.
