gemini-cli - ✅(Solved) Fix CRITICAL: Agent silent hang for 1+ hour on 429 Rate Limit (Ultra Subscriber Impact) [1 pull requests, 4 comments, 4 participants]

GitHub stats
google-gemini/gemini-cli#25736 (fetched 2026-04-22 08:03:52)
Comments: 4 · Participants: 4 · Timeline events: 8 · Reactions: 0
Timeline (top): commented ×4, cross-referenced ×1, labeled ×1, mentioned ×1

The Gemini CLI (v0.38.2) agent enters a silent, unresponsive 'Thinking' state for approximately 1 hour when hitting a 429 (Too Many Requests) rate limit from the cloudcode-pa.googleapis.com endpoint. This is a severe failure in error handling, backoff logic, and user communication.

Error Message

The user's local JSON log confirms the following status:

```json
{
  "status": 429,
  "statusText": "Too Many Requests",
  "request": {
    "responseURL": "https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse"
  }
}
```

The agent was active for only 8 minutes of API time over a wall time of 55 minutes, most of which was spent in this unobservable 'thinking' state.

Root Cause

According to the fix PR (#25684), six internal utility configs (loop detection, edit fixer, next speaker, and others) were hardcoded to Flash. Once the gemini-3-flash-preview quota was exhausted, the CLI became unusable even after the user switched models, and the agent often hung in a silent 'Thinking' loop for over an hour with no user-facing indication of the 429 rate limit.

Fix Action

Fixed

PR fix notes

PR #25684: fix(config): use flash-lite for utility model configs to preserve quota

Description (problem / solution / changelog)

Fixes #23397, #18059, #25736, #23986, #24039, #22141. Related to #24937 (capacity/429 tracking issue). Mitigates #23738 (infinite rate-limit loops).

Problem

  1. Utility Quota Leak: When gemini-3-flash-preview quota is exhausted (100%), the CLI becomes completely unusable even if the user explicitly switches to gemini-3.1-flash-lite-preview or Pro. This is because six internal utility configs (loop detection, edit fixer, next speaker, etc.) were hardcoded to Flash. This often results in the agent "hanging" in a thinking loop for over an hour (#22141, #25736).
  2. Auto-mode Fallback Failure: In both Auto (Gemini 3) and Auto (Gemini 2.5) modes, the policy chain was not self-contained. When a primary model hit a quota limit, the CLI would stop and prompt the user with a "Select Model" dialog instead of silently falling back to available models.

Fixes

1. Internal Utility Migration

Switched six utility configs to use a new gemini-3-flash-lite-base targeting gemini-3.1-flash-lite-preview. This preserves Flash quota and prevents "silent hangs" by ensuring background tasks run on a model with independent capacity.

| Config key | Role | Before | After |
|---|---|---|---|
| loop-detection | UTILITY_LOOP_DETECTOR | Flash | Flash Lite ✓ |
| llm-edit-fixer | UTILITY_EDIT_CORRECTOR | Flash | Flash Lite ✓ |
| next-speaker-checker | UTILITY_NEXT_SPEAKER | Flash | Flash Lite ✓ |
| web-fetch-fallback | fallback path (no tools) | Flash | Flash Lite ✓ |
| web-search | Grounding with Google Search | Flash | Flash Lite ✓ |
| web-fetch | URL context tool | Flash | Flash Lite ✓ |
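
The migration in the table can be pictured as each utility config inheriting from one shared base alias. The sketch below is only an illustration of that idea: the `ModelConfig` shape, the `extends` field, and `resolveModel` are hypothetical and are not taken from the gemini-cli source. Only the config keys, the base name `gemini-3-flash-lite-base`, and the target model come from the PR description.

```typescript
// Hypothetical config shape; NOT the real gemini-cli types.
type ModelConfig = { extends?: string; model?: string };

const configs: Record<string, ModelConfig> = {
  // Base alias named in the PR description; its fields are assumed here.
  "gemini-3-flash-lite-base": { model: "gemini-3.1-flash-lite-preview" },
  // The six utility configs listed in the table above.
  "loop-detection": { extends: "gemini-3-flash-lite-base" },
  "llm-edit-fixer": { extends: "gemini-3-flash-lite-base" },
  "next-speaker-checker": { extends: "gemini-3-flash-lite-base" },
  "web-fetch-fallback": { extends: "gemini-3-flash-lite-base" },
  "web-search": { extends: "gemini-3-flash-lite-base" },
  "web-fetch": { extends: "gemini-3-flash-lite-base" },
};

// Hypothetical helper: follow `extends` links until a concrete model is found.
function resolveModel(key: string, table: Record<string, ModelConfig>): string {
  const seen = new Set<string>();
  let current = key;
  while (true) {
    if (seen.has(current)) {
      throw new Error(`Cycle in config inheritance at "${current}"`);
    }
    seen.add(current);
    const cfg = table[current];
    if (!cfg) throw new Error(`Unknown config key "${current}"`);
    if (cfg.model) return cfg.model;
    if (!cfg.extends) {
      throw new Error(`Config "${current}" has no model and no base`);
    }
    current = cfg.extends;
  }
}
```

With a single base alias, retargeting all utility calls means changing one entry rather than six, which is presumably why the PR introduces a base config instead of editing each utility separately.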

2. Auto-mode Resilience

  • Improved resolvePolicyChain to ensure Auto modes (3 and 2.5) always use a circular fallback loop (wrapsAround).
  • Added Flash Lite as the new last-resort for Auto chains.
  • This ensures that if Flash is exhausted, Auto mode silently continues on Flash Lite or Pro without user intervention.
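
A circular fallback of the kind described above can be sketched as follows. This is a minimal illustration and not the actual `resolvePolicyChain` implementation: the function name `nextAvailableModel`, its parameters, and the short labels used for the models are assumptions made for the example. Only the `wrapsAround` flag name appears in the PR notes.

```typescript
type ModelId = string;

// Hypothetical helper: pick the next non-exhausted model after `current`.
// With wrapsAround = true the search continues past the end of the chain
// and back to the start, visiting every other model exactly once.
function nextAvailableModel(
  chain: ModelId[],
  current: ModelId,
  exhausted: ReadonlySet<ModelId>,
  wrapsAround: boolean,
): ModelId | undefined {
  const start = chain.indexOf(current);
  if (start === -1) return undefined;
  // Without wrap-around, only models after `current` are candidates.
  const steps = wrapsAround ? chain.length - 1 : chain.length - 1 - start;
  for (let i = 1; i <= steps; i++) {
    const candidate = chain[(start + i) % chain.length];
    if (candidate !== undefined && !exhausted.has(candidate)) {
      return candidate;
    }
  }
  // Every other model is exhausted (or unreachable): the caller must surface this.
  return undefined;
}

// Placeholder labels for an Auto chain with Flash Lite as the last resort.
const autoChain: ModelId[] = ["pro", "flash", "flash-lite"];
```

For example, with `flash` and `flash-lite` both exhausted and `flash-lite` as the current model, the wrap-around variant returns `pro`, while the non-wrapping variant returns `undefined` and would need to prompt the user.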

Impact

  • Fixes "silent hangs" (#22141, #25736) and 429-related workflow interruptions for both manual and auto users.
  • Reduces Flash consumption for all utility calls.
  • Mitigates infinite rate-limit loops (#23738) by providing alternative model paths.
  • Verified locally with updated tests in policyHelpers.test.ts.

Reproduction (from #23397)

# Set Flash Lite as main model explicitly:
gemini -m 'gemini-3.1-flash-lite-preview'

# Still gets:
Usage limit reached for gemini-3-flash-preview.
/model to switch models.

Changed files

  • packages/core/src/availability/policyHelpers.ts (modified, +68/-25)


User Impact

The user is an Ultra Subscriber with minimal usage, yet is being subjected to rate limits that cripple productivity. Instead of failing fast or notifying the user of the throttle, the agent simply 'hangs' in a thinking loop while internally retrying. This results in a complete loss of session control for the operator.

Technical Critique

The current behavior represents a critical architectural failure in error-state communication. It is unacceptable for a premium service to leave a user in an unobservable 'black box' state for an hour without any indication of a rate-limit throttle or an estimated time for recovery. The retry-backoff logic for 429 errors is fundamentally broken and must be updated to provide immediate user-facing feedback or a graceful fail-fast mechanism.


extent analysis

TL;DR

The Gemini CLI agent's retry-backoff logic for 429 errors needs to be updated to provide immediate user-facing feedback or a graceful fail-fast mechanism.

Guidance

  • Review the current retry-backoff logic implementation to identify why it's causing the agent to hang for an hour without notifying the user.
  • Consider implementing a fail-fast mechanism that notifies the user of the rate limit throttle and estimated time for recovery.
  • Update the error handling to provide immediate user-facing feedback when a 429 error occurs, such as displaying an error message or notification.
  • Investigate why the user is being rate-limited despite having minimal usage as an Ultra Subscriber.

Example

The issue does not include implementation details, so no snippet from the actual gemini-cli codebase can be given here.

Notes

The issue seems to be related to the specific implementation of the retry-backoff logic and error handling in the Gemini CLI agent. Without more information about the implementation, it's difficult to provide a more detailed solution.

Recommendation

As a workaround, implement custom retry-backoff logic that gives immediate user-facing feedback or fails fast; the current behavior severely impacts user productivity.
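
One way such a workaround could look is sketched below, under stated assumptions. Nothing here is taken from the gemini-cli source: `RateLimitError`, `BackoffPolicy`, `RetryHooks`, `planDelay`, and `withVisibleBackoff` are hypothetical names, and the caller is assumed to convert a 429 response into a `RateLimitError`. The sketch combines capped exponential backoff, a total wait budget that forces a fail-fast exit, and a notification on every retry so the user is never left without feedback.

```typescript
// Hypothetical error type; the caller is assumed to throw it on a 429 response.
class RateLimitError extends Error {
  readonly retryAfterMs: number | undefined;
  constructor(retryAfterMs?: number) {
    super("429 Too Many Requests");
    this.name = "RateLimitError";
    this.retryAfterMs = retryAfterMs;
    // Keeps `instanceof` working when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface BackoffPolicy {
  baseDelayMs: number;    // first retry delay
  maxDelayMs: number;     // cap for a single delay
  maxTotalWaitMs: number; // total wait budget before giving up
}

interface RetryHooks {
  notify: (message: string) => void;    // user-facing feedback
  sleep: (ms: number) => Promise<void>; // injected so it can be faked in tests
}

// Returns the delay before the next attempt, or null when the budget is spent.
function planDelay(
  attempt: number,
  waitedMs: number,
  retryAfterMs: number | undefined,
  policy: BackoffPolicy,
): number | null {
  const backoff = Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** attempt);
  // A server-provided hint, if the caller extracted one, takes precedence.
  const delay = retryAfterMs ?? backoff;
  return waitedMs + delay > policy.maxTotalWaitMs ? null : delay;
}

async function withVisibleBackoff<T>(
  call: () => Promise<T>,
  policy: BackoffPolicy,
  hooks: RetryHooks,
): Promise<T> {
  let waitedMs = 0;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!(err instanceof RateLimitError)) throw err;
      const delay = planDelay(attempt, waitedMs, err.retryAfterMs, policy);
      if (delay === null) {
        hooks.notify(`Rate limited (429). Giving up after waiting ${waitedMs} ms.`);
        throw err; // fail fast instead of retrying silently
      }
      hooks.notify(`Rate limited (429). Retrying in ${delay} ms (attempt ${attempt + 1}).`);
      await hooks.sleep(delay);
      waitedMs += delay;
    }
  }
}
```

Injecting `sleep` and `notify` keeps the retry loop independent of any particular UI or timer, and the pure `planDelay` function makes the backoff schedule easy to check in isolation.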
