gemini-cli - ✅ (Solved) Add integration tests for tool sandboxing with plans and tasks [2 pull requests, 1 participant]

GitHub stats
google-gemini/gemini-cli#24932 (fetched 2026-04-09 08:17:12)
Comments: 0
Participants: 1
Timeline: 6
Reactions: 0
Timeline (top): labeled ×4, assigned ×1, issue_type_added ×1

PR fix notes

PR #25298: refactor(core): abstract OS sandbox managers and fix virtual command permissions

Description (problem / solution / changelog)

Summary

This PR introduces a significant architectural refactor to the Sandbox Management system across all supported platforms (macOS, Linux, Windows), vastly improving security, maintainability, and testing infrastructure. Most importantly, it closes a critical security vulnerability regarding the escalation of virtual command permissions.

Details

Architectural Changes:

  • Abstract Base Class: Introduced AbstractOsSandboxManager to standardize the command preparation pipeline (sanitization, overrides, path resolution) using the Template Method pattern. OS-specific managers (MacOsSandboxManager, LinuxSandboxManager, WindowsSandboxManager) now extend this base class, reducing duplicated orchestration logic.
  • Path Utilities: Core path utilities (sanitizePaths, getPathIdentity) were relocated from the bloated sandboxManager.ts to a dedicated utils/paths.ts and renamed to deduplicateAbsolutePaths and toPathKey. This enforces consistent path case-insensitivity rules across the agent lifecycle.
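
The Template Method arrangement described above can be sketched roughly as follows. This is an illustration only: the class names, the `SandboxRequest` and `PreparedCommand` shapes, and the individual steps are hypothetical and are not taken from the gemini-cli source. The point is that the base class fixes the order of the shared steps, so an OS-specific subclass supplies only the final wrapping step and cannot skip sanitization or path handling.

```typescript
// Hypothetical shapes for illustration; not the real gemini-cli types.
interface SandboxRequest {
  command: string;
  args: string[];
  allowedPaths: string[];
}

interface PreparedCommand {
  program: string;
  args: string[];
}

abstract class OsSandboxManagerSketch {
  // Template method: the pipeline order lives here, once, for every OS.
  prepareCommand(request: SandboxRequest): PreparedCommand {
    const sanitized = this.sanitize(request);
    const resolved = this.resolvePaths(sanitized);
    return this.buildSandboxedCommand(resolved);
  }

  // Shared step: drop empty arguments.
  protected sanitize(request: SandboxRequest): SandboxRequest {
    return { ...request, args: request.args.filter((a) => a.length > 0) };
  }

  // Shared step: remove duplicate allowed paths (exact-match only in this sketch).
  protected resolvePaths(request: SandboxRequest): SandboxRequest {
    return { ...request, allowedPaths: [...new Set(request.allowedPaths)] };
  }

  // OS-specific step: each platform decides how to wrap the command.
  protected abstract buildSandboxedCommand(
    request: SandboxRequest,
  ): PreparedCommand;
}

class LinuxSketch extends OsSandboxManagerSketch {
  protected buildSandboxedCommand(request: SandboxRequest): PreparedCommand {
    // bwrap takes "--bind-try SRC DEST" pairs followed by the command.
    const binds = request.allowedPaths.flatMap((p) => ["--bind-try", p, p]);
    return {
      program: "bwrap",
      args: [...binds, request.command, ...request.args],
    };
  }
}

const prepared = new LinuxSketch().prepareCommand({
  command: "ls",
  args: ["-la", ""],
  allowedPaths: ["/workspace", "/workspace"],
});
console.log(prepared.program, prepared.args.join(" "));
```

A macOS or Windows variant would override only `buildSandboxedCommand`, which is where the reduction in duplicated orchestration logic would come from.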

Behavioral & Security Fixes:

  • Virtual Command Privilege Escalation (CRITICAL FIX): Previously, virtual __write commands were spoofed as cat on POSIX systems during early pipeline adjustments to bypass isToolApproved checks. This incorrectly granted the sandboxed tee process full workspaceWrite permissions. The AbstractOsSandboxManager now injects the targeted write path into the dynamic fileSystem permissions array before path resolution, keeping workspaceWrite = false and ensuring the sandbox restricts access solely to the target file.
  • Governance File Verification: touch() operations for governance files (.git, .gitignore, .geminiignore) are now universally enforced by the abstract base class prior to sandbox execution. Previously, macOS skipped this step, leaving workspaces vulnerable to malicious .git initialization if absent.
  • Windows Case-Insensitivity: isKnownSafeCommand now explicitly enforces case-insensitivity matching against the approvedTools list.
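
The virtual `__write` fix can be illustrated with a small sketch. The field names `workspaceWrite` and `fileSystem` come from the PR text, but the surrounding `SandboxPermissions` and `VirtualCommand` shapes and the `applyVirtualWrite` function are assumptions made for this example. The idea is that the target path is added to the per-command write list while the workspace-wide flag is left untouched.

```typescript
// Hypothetical permission shape; only the field names echo the PR description.
interface SandboxPermissions {
  workspaceWrite: boolean;
  fileSystem: { read: string[]; write: string[] };
}

interface VirtualCommand {
  name: string;
  targetPath?: string;
}

// Returns a new permission object; the input is never mutated.
function applyVirtualWrite(
  base: SandboxPermissions,
  cmd: VirtualCommand,
): SandboxPermissions {
  if (cmd.name !== "__write" || cmd.targetPath === undefined) {
    return base;
  }
  return {
    // The workspace-wide flag is copied as-is, never widened.
    workspaceWrite: base.workspaceWrite,
    fileSystem: {
      read: [...base.fileSystem.read],
      // Only the single target file becomes writable.
      write: [...new Set([...base.fileSystem.write, cmd.targetPath])],
    },
  };
}

const planMode: SandboxPermissions = {
  workspaceWrite: false,
  fileSystem: { read: ["/workspace"], write: [] },
};

const granted = applyVirtualWrite(planMode, {
  name: "__write",
  targetPath: "/workspace/plans/plan.md",
});
console.log(granted.workspaceWrite, granted.fileSystem.write);
```

Under these assumptions, a read-only workspace stays read-only after the call, and the write list contains exactly the one requested file.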

Testing Enhancements:

  • Deleted the monolithic sandboxManager.test.ts.
  • Introduced abstractOsSandboxManager.test.ts and structurally standardized LinuxSandboxManager.test.ts, MacOsSandboxManager.test.ts, and WindowsSandboxManager.test.ts with consistent describe groupings for delegated methods.
  • Added more test cases to sandboxManager.integration.test.ts, including coverage for Plan Mode.

Related Issues

Fixes https://github.com/google-gemini/gemini-cli/issues/24932.

How to Validate

  1. Run the workspace test suite: npm run test -w @google/gemini-cli-core
  2. Execute the global preflight: npm run preflight
  3. Verify the behavior of virtual __write commands inside a restricted sandbox environment (e.g. Plan Mode) to ensure only the specified target file is writable and the rest of the workspace remains read-only.

Pre-Merge Checklist

  • Updated relevant documentation and README (if needed)
  • Added/updated tests (if needed)
  • Noted breaking changes (if any)
  • Validated on required platforms/methods:
    • MacOS
      • npm run
      • npx
      • Docker
      • Podman
      • Seatbelt
    • Windows
      • npm run
      • npx
      • Docker
    • Linux
      • npm run
      • npx
      • Docker

Changed files

  • packages/core/src/policy/sandboxPolicyManager.ts (modified, +4/-4)
  • packages/core/src/sandbox/abstractOsSandboxManager.test.ts (added, +315/-0)
  • packages/core/src/sandbox/abstractOsSandboxManager.ts (added, +326/-0)
  • packages/core/src/sandbox/linux/LinuxSandboxManager.test.ts (modified, +59/-17)
  • packages/core/src/sandbox/linux/LinuxSandboxManager.ts (modified, +43/-139)
  • packages/core/src/sandbox/linux/bwrapArgsBuilder.test.ts (modified, +28/-2)
  • packages/core/src/sandbox/linux/bwrapArgsBuilder.ts (modified, +12/-2)
  • packages/core/src/sandbox/macos/MacOsSandboxManager.test.ts (modified, +62/-166)
  • packages/core/src/sandbox/macos/MacOsSandboxManager.ts (modified, +35/-118)
  • packages/core/src/sandbox/macos/seatbeltArgsBuilder.test.ts (modified, +1/-1)
  • packages/core/src/sandbox/macos/seatbeltArgsBuilder.ts (modified, +1/-1)
  • packages/core/src/sandbox/utils/sandboxReadWriteUtils.ts (modified, +18/-25)
  • packages/core/src/sandbox/windows/WindowsSandboxManager.test.ts (modified, +463/-413)
  • packages/core/src/sandbox/windows/WindowsSandboxManager.ts (modified, +103/-155)
  • packages/core/src/services/sandboxManager.integration.test.ts (modified, +608/-372)
  • packages/core/src/services/sandboxManager.test.ts (removed, +0/-465)
  • packages/core/src/services/sandboxManager.ts (modified, +0/-236)
  • packages/core/src/tools/shell.ts (modified, +4/-9)
  • packages/core/src/utils/paths.test.ts (modified, +59/-0)
  • packages/core/src/utils/paths.ts (modified, +39/-0)

PR #25307: test(core): improve sandbox integration test coverage and fix OS-specific failures

Description (problem / solution / changelog)

Summary

This PR significantly improves the integration test coverage for the sandbox environments across all major operating systems (Linux, macOS, Windows). In the process of expanding this test suite to rigorously verify sandbox boundaries and policies, several OS-specific failures were identified and resolved, including initialization inefficiencies and a Bubblewrap permission bug on Linux.

Details

Test Coverage Enhancements (Primary)

  • Expanded sandboxManager.integration.test.ts to comprehensively exercise sandbox boundaries across Linux, macOS, and Windows.
  • Added explicit verification for edge cases including recursive directory protection, symlink traversal restrictions, and governance file modification prevention.
  • Added explicit verification for writing to authorized paths when the workspace is otherwise read-only (Plan Mode).
  • Refactored Workspace Setup: Migrated the integration test suite from beforeAll to beforeEach for workspace initialization. Each test case now receives a fresh, isolated temporary directory to prevent state leakage (critical for Windows ACL checks). Added an afterEach cleanup block that specifically tracks and removes isolated test directories, drastically reducing the disk footprint during testing.
  • Improved unit testing for bwrapArgsBuilder (Linux) and MacOsSandboxManager (macOS).
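
The per-test workspace isolation described above might look roughly like the following. The helper names `createIsolatedWorkspace` and `cleanupWorkspaces` are hypothetical and are not taken from the PR; the sketch uses only Node's standard `fs`, `os`, and `path` modules and shows, in a comment, how the helpers would be wired into `beforeEach` and `afterEach` hooks in a runner such as Vitest or Jest.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Every directory handed out is tracked so cleanup removes exactly those.
const createdDirs: string[] = [];

// Creates a fresh, uniquely named temporary directory for one test case.
function createIsolatedWorkspace(prefix = "sandbox-it-"): string {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), prefix));
  createdDirs.push(dir);
  return dir;
}

// Removes all tracked directories and empties the tracking list.
function cleanupWorkspaces(): void {
  for (const dir of createdDirs.splice(0)) {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// In a test file the hooks would be wired up roughly like this:
//
//   let workspace: string;
//   beforeEach(() => { workspace = createIsolatedWorkspace(); });
//   afterEach(() => { cleanupWorkspaces(); });

const first = createIsolatedWorkspace();
const second = createIsolatedWorkspace();
fs.writeFileSync(path.join(first, "marker.txt"), "state from test one");
console.log(fs.existsSync(path.join(second, "marker.txt"))); // false
cleanupWorkspaces();
```

Because each test gets its own directory, a file or permission change left behind by one case cannot influence the next, which is the state-leakage concern the PR raises for Windows ACL checks.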

Sandbox Logic Fixes (Secondary)

  • Linux (bwrap) Security:
    • Fixed a bug where write access to policy-authorized paths was incorrectly denied if the path did not yet exist and the command was not explicitly recognized as a write command.
    • The builder now grants read-write access (--bind-try) to any path in policyAllowed and its parent directory, unless the command is an explicit read-only virtual command (__read), in which case the parent directory is safely bound as read-only.
  • Linux (bwrap) & Windows Optimization:
    • Refactored sandbox initialization (ensureGovernanceFilesExist) to ensure that governance files (e.g., .git, .gitignore) are secured and verified exactly once per session workspace, rather than redundantly on every single command execution. This reduces disk I/O and improves performance for long-running sessions.
  • Windows:
    • Updated tests to pass pre-existing parent directories to policy.allowedPaths instead of non-existent target files, satisfying the Windows Sandbox requirement that granular access can only be granted to existing filesystem objects.
    • Refactored the native C# helper compilation (ensureHelperCompiled) to be a globally static initialization, ensuring it only compiles once per Node process.
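
The Linux bind rule above can be sketched as a small argument builder. `--bind-try` and `--ro-bind-try` are real Bubblewrap options, but the function name `buildPolicyBinds` and its signature are hypothetical. The PR text specifies only that the parent directory becomes read-only for `__read`; keeping the allowed path itself on `--bind-try` in that case is an assumption of this sketch, not a statement about the actual builder.

```typescript
import * as path from "node:path";

// Builds bwrap bind arguments for policy-allowed paths.
// Assumption: for "__read" only the parent directory bind changes.
function buildPolicyBinds(
  policyAllowed: string[],
  commandName: string,
): string[] {
  const parentFlag = commandName === "__read" ? "--ro-bind-try" : "--bind-try";
  const args: string[] = [];
  for (const target of policyAllowed) {
    const parent = path.posix.dirname(target);
    // Parent first: later binds are layered on top of earlier ones.
    args.push(parentFlag, parent, parent);
    // The "-try" variants skip sources that do not exist yet, so a target
    // that has not been created is still reachable through its parent.
    args.push("--bind-try", target, target);
  }
  return args;
}

console.log(buildPolicyBinds(["/workspace/plans/plan.md"], "ls").join(" "));
console.log(buildPolicyBinds(["/workspace/plans/plan.md"], "__read").join(" "));
```

Binding the parent directory is what lets a command create a policy-authorized file that does not exist yet, which is the failure case the PR describes.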

Related Issues

Fixes https://github.com/google-gemini/gemini-cli/issues/24932.

How to Validate

Windows, Linux, macOS

Run integration tests:

npm run test -w @google/gemini-cli-core -- src/services/sandboxManager.integration.test.ts

Linux

Verify unit tests:

npm run test -w @google/gemini-cli-core -- src/sandbox/linux/bwrapArgsBuilder.test.ts

Changed files

  • packages/core/src/sandbox/linux/LinuxSandboxManager.ts (modified, +22/-8)
  • packages/core/src/sandbox/linux/bwrapArgsBuilder.test.ts (modified, +23/-3)
  • packages/core/src/sandbox/linux/bwrapArgsBuilder.ts (modified, +8/-5)
  • packages/core/src/sandbox/macos/MacOsSandboxManager.test.ts (modified, +1/-23)
  • packages/core/src/sandbox/windows/WindowsSandboxManager.ts (modified, +46/-35)
  • packages/core/src/services/sandboxManager.integration.test.ts (modified, +641/-377)

Issue description

We made some fixes so that the experimental tool sandboxing works well with plans (and tasks).

We need to add integration tests to ensure these fixes do not regress, ideally before promoting tool sandboxing from experimental to stable.

Extent analysis

TL;DR

Add integration tests to ensure the stability of the experimental tool sandboxing feature before promoting it to stable.

Guidance

  • Review the changes made in the provided pull requests (24047, 24638, 24762) to understand the fixes and updates to the experimental tool sandboxing feature.
  • Identify key scenarios and edge cases that should be covered by integration tests to prevent regressions.
  • Develop and run integration tests to validate the functionality and stability of the tool sandboxing feature in conjunction with plans and tasks.
  • Consider prioritizing tests based on the most critical fixes and updates made in the referenced pull requests.

Example

No specific code example can be provided without more context, but tests should ideally cover various interactions between tool sandboxing, plans, and tasks.

Notes

The exact nature and scope of the integration tests will depend on the specific requirements and functionality of the experimental tool sandboxing feature, as well as the changes made in the referenced pull requests.

Recommendation

Apply workaround: Develop and implement comprehensive integration tests before promoting the experimental tool sandboxing feature to stable, to ensure its reliability and prevent potential regressions.
