gemini-cli - 💡(How to fix) Fix [WSL2][CRITICAL] Comprehensive reliability failure report: 7 incidents, fork table exhaustion, model comparison vs Claude Sonnet 4.6 [1 comments, 1 participants]

google-gemini/gemini-cli#26117 (fetched 2026-04-29 06:35:42)


[CRITICAL] WSL2 Reliability Report: Weeks of Cascading Failures with Gemini CLI — Comprehensive Timeline & Model Comparison

This is a comprehensive report documenting weeks of production use of Gemini CLI in a WSL2 environment for autonomous agentic (--yolo) workflows. The issues documented here are not theoretical: each represents a real incident that consumed hours of debugging time and interrupted production work.


Environment

  • OS: Windows 11 (host) + WSL2 Ubuntu 24.04
  • Node.js: v22.17.0
  • Gemini CLI: v0.39.1 (npm global install)
  • Auth mode: oauth-personal (Google One AI Ultra subscription)
  • Shell: bash (WSL2)
  • Use case: Autonomous agentic workflows via --yolo mode, long-running sessions (4-8h), multiple MCP servers

Why This Report Matters

We switched from Claude Code CLI to Gemini CLI specifically to use Google's models within an autonomous pipeline. After weeks of use, we are documenting the results factually. Every incident below required manual intervention that Claude Code CLI did not require in the same environment.


Incident Timeline (Chronological)

Incident 1 — OAuth Session Silent Invalidation

  • Symptom: The CLI reports "Failed to sign in: Cloud Code Private API not enabled" and drops to the interactive auth menu, blocking all automation.
  • Root cause: ~/.gemini/google_accounts.json shows "active": null after a silent token drop. In addition, ~/.gemini/projects.json accumulated a corrupt "/mnt/c": "c" entry from a session launched from a Windows-mounted path, binding the workspace to the wrong GCP project.
  • Recovery: Manual gemini auth reset, deletion of projects.json, and re-authentication.
  • Time lost: ~45 minutes.
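A pipeline can guard against this failure mode by checking the account file before launching an unattended session. The following is a minimal Python sketch, assuming google_accounts.json is a JSON object with a top-level "active" key as described above. The function names has_active_account and preflight are hypothetical, and nothing else about the file's layout is known from this report.

```python
import json
from pathlib import Path

# Path taken from the report; the layout beyond the "active" key is an assumption.
ACCOUNTS_FILE = Path.home() / ".gemini" / "google_accounts.json"


def has_active_account(raw_json: str) -> bool:
    """Return True only if the parsed document has a non-empty "active" value."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return False  # unreadable file: treat as signed out
    return isinstance(data, dict) and bool(data.get("active"))


def preflight(path: Path = ACCOUNTS_FILE) -> bool:
    """Check the on-disk account file; a missing file counts as signed out."""
    if not path.is_file():
        return False
    return has_active_account(path.read_text(encoding="utf-8"))
```

A wrapper script could call preflight() and exit with a non-zero status instead of starting a session that would stall at the interactive auth menu.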

Incident 2 — Hook Schema Silent Breaking Change (v0.39.1)

  • Symptom: "Error in: hooks.AfterAgent[0] Expected object, received string" on every startup.
  • Root cause: Confirmed by reading the bundle source chunk-WFCK2Z32.js: the Zod schema changed with no migration, no docs update, and no helpful error message.

Old format, previously valid and now silently discarded:

"hooks": { "SessionStart": ["echo 'hello'"] }

New required format (undocumented):

"hooks": { "SessionStart": [{ "hooks": [{ "type": "command", "command": "echo 'hello'" }] }] }

Time lost: ~2 hours reverse-engineering the bundle.
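The conversion between the two formats shown above is mechanical, so existing configs can be rewritten with a small script. This is a sketch in Python, assuming every legacy entry is a bare command string and that the new format accepts exactly the nesting shown above. The function name migrate_hooks is hypothetical and is not part of Gemini CLI.

```python
from typing import Any


def migrate_hooks(hooks: dict[str, list[Any]]) -> dict[str, list[Any]]:
    """Wrap bare command strings in the nested object format; leave objects untouched."""
    migrated: dict[str, list[Any]] = {}
    for event, entries in hooks.items():
        new_entries = []
        for entry in entries:
            if isinstance(entry, str):
                # Legacy form: a bare shell command string.
                new_entries.append(
                    {"hooks": [{"type": "command", "command": entry}]}
                )
            else:
                # Already an object (assumed to be in the new format).
                new_entries.append(entry)
        migrated[event] = new_entries
    return migrated
```

Because entries that are already objects pass through unchanged, running the function on an already migrated config is a no-op.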

Incident 3 — EPIPE Crash on AfterAgent Hook

  • Symptom: "Error: write EPIPE at writeToStdout". The CLI crashes after every agent turn.
  • Root cause: The hook subprocess inherits the parent's stdout during teardown. Even nohup .../dev/null 2>&1 & does not prevent it.
  • Time lost: ~3 hours.
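For reference, this is what a fully detached launch looks like when done from a wrapper rather than from shell redirection. It is a Python sketch, assuming the hook command is changed to invoke this wrapper with the real hook as its arguments. The name spawn_detached is hypothetical, and the approach is untested against Gemini CLI; since nohup with redirection did not help, it may not be sufficient on its own.

```python
import subprocess
import sys


def spawn_detached(argv: list[str]) -> subprocess.Popen:
    """Start argv in a new session with all standard streams pointed at /dev/null."""
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # the child calls setsid() before exec (POSIX only)
        close_fds=True,          # do not leak any other inherited descriptors
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Launch the real hook and return immediately so the CLI is not kept waiting.
    spawn_detached(sys.argv[1:])
```

The child holds no descriptor that points at the CLI's stdout and belongs to a different session, so it should be unaffected by the parent's teardown.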

Incident 4 — --yolo Mode Blocked by Untrusted Workspace Prompt

  • Symptom: An interactive trust prompt appears even with the --yolo flag. Setting security.folderTrust.enabled: false does not suppress it.
  • Impact: The entire automation pipeline is blocked; --yolo mode is unusable without a human present.
  • Time lost: Ongoing regression.

Incident 5 — /mnt/c Path Corrupts projects.json

  • Symptom: Launching the CLI from a Windows-mounted path creates a stale "/mnt/c": "c" entry in projects.json, causing Cloud Code API errors in all future sessions.
  • Time lost: ~1 hour diagnosing why auth worked interactively but failed in automation.
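Until the CLI sanitizes this file itself, stale entries can be pruned before each run. The sketch below is Python and assumes the entries form a flat mapping from workspace path to project name, as the "/mnt/c": "c" entry suggests; the real file may nest this mapping under another key. The function name prune_windows_mounts is hypothetical.

```python
def prune_windows_mounts(projects: dict[str, str]) -> dict[str, str]:
    """Drop entries whose key is /mnt/<drive letter> or a path beneath it."""

    def is_windows_mount(path: str) -> bool:
        # "/mnt/c/work" splits into ["", "mnt", "c", "work"].
        parts = path.split("/")
        return (
            len(parts) >= 3
            and parts[0] == ""
            and parts[1] == "mnt"
            and len(parts[2]) == 1      # WSL drive mounts are a single letter
            and parts[2].isalpha()
        )

    return {
        path: name
        for path, name in projects.items()
        if not is_windows_mount(path)
    }
```

Matching on a single-letter component keeps ordinary Linux mounts such as /mnt/data. Anyone who intentionally works under /mnt/c would need a narrower rule.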

Incident 6 — Process Deadlock / Terminal Bouncing (Ongoing)

  • Symptom: After heavy tool-call agent turns, the CLI deadlocks, neither completing nor erroring. The terminal appears frozen, and the only recovery is kill -9.
  • Frequency: Approximately 1 in 4 sessions.
  • Side effect: Dead sessions leave zombie node processes in WSL and ghost wslhost.exe bridge processes on Windows that are never reaped.

Incident 7 — WSL Fork Table Exhaustion (Live incident: 2026-04-28)

This happened today, directly caused by accumulated ghost processes from Incident 6.

After running gemini --yolo in a standard session, the next launch attempt produced:

-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
/home/user/.npm-global/bin/gemini: fork: retry: Resource temporarily unavailable

Root cause discovered via Windows Task Manager:

  • 5× wsl.exe ghost processes (orphaned from previous Gemini CLI deadlocks)
  • 14× wslhost.exe ghost bridge processes
  • systemd-journald in uninterruptible D state inside WSL, blocking both wsl --shutdown and wsl --terminate Ubuntu

Recovery steps required:

  1. Stop-Process -Name "wsl" -Force (Windows PowerShell, elevated)
  2. Stop-Process -Name "wslhost" -Force
  3. wsl --shutdown (still hung because vmcompute was stuck)
  4. Restart-Service vmcompute -Force (required elevated PowerShell)
  5. Wait for WSL to fully restart

Total downtime: ~25 minutes of unrecoverable WSL requiring Windows-level Hyper-V service restart.
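The buildup can be spotted from inside WSL before fork starts failing, by scanning /proc for processes in the Z (zombie) or D (uninterruptible sleep) state. This Python sketch relies only on the standard /proc/<pid>/stat layout on Linux. The function names are hypothetical, and it cannot see the Windows-side wsl.exe and wslhost.exe processes.

```python
from pathlib import Path


def parse_stat(stat_line: str) -> tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so locate the first "(" and the last ")".
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def stuck_processes(stat_lines: list[str]) -> list[tuple[int, str, str]]:
    """Return the entries whose state is Z (zombie) or D (uninterruptible sleep)."""
    return [p for p in map(parse_stat, stat_lines) if p[2] in ("Z", "D")]


def read_stat_lines(proc_root: str = "/proc") -> list[str]:
    """Collect the stat line of every numeric directory under proc_root."""
    lines = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            lines.append((entry / "stat").read_text(errors="replace"))
        except OSError:
            continue  # the process exited while we were scanning
    return lines
```

Calling stuck_processes(read_stat_lines()) before each launch gives a list that a wrapper can log or use to refuse to start another session.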


Direct Model Comparison: Gemini 2.5 Pro (via Gemini CLI) vs. Claude Sonnet 4.6 (via Claude Code CLI)

Both CLIs were run in the identical WSL2 environment on the same machine with identical shell configuration, hooks, and MCP servers.

| Metric | Gemini CLI + Gemini 2.5 Pro | Claude Code CLI + Claude Sonnet 4.6 |
|---|---|---|
| Session crashes / deadlocks | ~25% of sessions | 0 observed |
| EPIPE crashes after agent turn | Yes (Incident 3) | Never observed |
| OAuth session drops | Yes, on WSL suspend/resume | Never observed |
| --yolo mode blocked by interactive prompts | Yes (Incident 4) | Never observed |
| MCP config corruption on server failure | Yes, requires manual file recovery | Graceful degradation |
| Recovery from WSL network interruption | Hangs indefinitely | Exits cleanly |
| Ghost processes on Windows after session | Yes: 5 wsl.exe + 14 wslhost.exe | 0 orphaned processes |
| Fork table exhaustion requiring Hyper-V restart | Yes (Incident 7, today) | Never occurred |
| Task completion rate in --yolo mode | Significantly degraded | Consistently high |

Observation: When we switched from the Gemini 2.5 Pro model to Claude Sonnet 4.6 within Antigravity IDE (same interface, same WSL environment, same hooks), task delivery quality and completion rates improved measurably and immediately. The environment did not change. Only the model changed. This suggests the reliability gap is not purely a CLI infrastructure issue — the model's ability to handle WSL-specific tool errors, recover from ambiguous states, and complete tasks without requiring human intervention also differs significantly.


What Google Needs to Fix

  1. Process reaping: Gemini CLI MUST install SIGTERM/SIGINT/SIGCHLD handlers that explicitly kill all spawned child process groups before exit. This is a Node.js process management responsibility.
  2. --yolo mode must be interactive-prompt-free: Any interactive prompt in --yolo mode is a regression. Trust prompts, auth prompts, and confirmation dialogs must all be suppressible via config.
  3. EPIPE crash: AfterAgent hooks must run in a fully detached subprocess (setsid / new process group) to prevent inheriting the CLI's stdout during teardown.
  4. OAuth proactive refresh: Token validity should be checked at session start, not on first API call. Silent failures block automation.
  5. projects.json sanitization: Entries with /mnt/c prefix should be rejected or auto-pruned — they are never valid GCP project references.
  6. Breaking changes need migration: The hook schema change in v0.39.1 broke existing configs silently. Zod schema changes to user-facing config formats require a migration path and a clear, actionable error message.
  7. WSL2 must be a first-class supported platform: Given that WSL2 is the primary Linux development environment for Windows users, it should receive the same QA attention as native Linux. The current state — where basic operations like running a session and exiting cleanly are unreliable — is not acceptable for a tool marketed for autonomous agentic use.

Conclusion

After weeks of real-world production use, the conclusion is factual: Gemini CLI in WSL2 is not production-ready for autonomous agentic workflows. The cascade of bugs is not a single issue — it is a systemic pattern of insufficient process management, missing WSL-aware safeguards, and inadequate testing on the Windows/WSL2 platform.

We continue to use Claude Code CLI as the baseline for all critical work until these issues are resolved.

Related issues: #26111, #26114

extent analysis

TL;DR

To improve the reliability of Gemini CLI in WSL2 for autonomous agentic workflows, Google needs to address several critical issues, including process reaping, interactive prompts in --yolo mode, EPIPE crashes, OAuth proactive refresh, projects.json sanitization, and WSL2 support.

Guidance

  • Implement SIGTERM/SIGINT/SIGCHLD handlers in Gemini CLI to explicitly kill spawned child process groups before exit.
  • Modify --yolo mode to be interactive-prompt-free by making trust prompts, auth prompts, and confirmation dialogs suppressible via config.
  • Run AfterAgent hooks in a fully detached subprocess to prevent inheriting the CLI's stdout during teardown.
  • Check token validity at session start, not on first API call, to prevent silent OAuth failures.
  • Sanitize projects.json by rejecting or auto-pruning entries with /mnt/c prefix.

Example

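To run an AfterAgent hook detached from the CLI, wrap the hook command itself (not the CLI) in setsid, using the hook format shown in Incident 2:

"hooks": { "AfterAgent": [{ "hooks": [{ "type": "command", "command": "setsid /path/to/hook.sh < /dev/null > /dev/null 2>&1 &" }] }] }

Note: This is an untested sketch. The report states that nohup with redirection to /dev/null did not prevent the crash, so this workaround may not be sufficient without a fix in Gemini CLI itself.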

Notes

The provided guidance is based on the issues reported and may not be an exhaustive list of all necessary fixes. Additional testing and debugging may be required to ensure the reliability of Gemini CLI in WSL2.

Recommendation

Apply the suggested workarounds and fixes to improve the reliability of Gemini CLI in WSL2, as the current state is not production-ready for autonomous agentic workflows.
