gemini-cli - ✅(Solved) Fix CRITICAL: Agent silent hang for 1+ hour on 429 Rate Limit (Ultra Subscriber Impact) [1 pull request, 4 comments, 4 participants]

BrianV1981 · 2026-04-21T06:38:06Z



google-gemini/gemini-cli#25736 • Fetched 2026-04-22 08:03:52


The Gemini CLI (v0.38.2) agent enters a silent, unresponsive 'Thinking' state for approximately 1 hour when hitting a 429 (Too Many Requests) rate limit from the cloudcode-pa.googleapis.com endpoint. This is a severe failure in error handling, backoff logic, and user communication.

Error Message

The user's local JSON log confirms the following status:

```json
{
  "status": 429,
  "statusText": "Too Many Requests",
  "request": {
    "responseURL": "https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse"
  }
}
```

The agent was active for only 8 minutes of API time over a wall time of 55 minutes, most of which was spent in this unobservable 'thinking' state.

Fixed

Fixed by PR: fix(config): use flash-lite for utility model configs to preserve quota (https://github.com/google-gemini/gemini-cli/pull/25684)

PR fix notes

PR #25684: fix(config): use flash-lite for utility model configs to preserve quota

Repository: google-gemini/gemini-cli
Author: kazukinakai
State: open | merged: False
Link: https://github.com/google-gemini/gemini-cli/pull/25684

Description (problem / solution / changelog)

Fixes #23397
Fixes #18059
Fixes #25736
Fixes #23986
Fixes #24039
Fixes #22141
Related to #24937 (capacity/429 tracking issue)
Mitigates #23738 (infinite rate-limit loops)

Problem

1. Utility Quota Leak: When gemini-3-flash-preview quota is exhausted (100%), the CLI becomes completely unusable even if the user explicitly switches to gemini-3.1-flash-lite-preview or Pro. This is because six internal utility configs (loop detection, edit fixer, next speaker, etc.) were hardcoded to Flash. This often results in the agent "hanging" in a thinking loop for over an hour (#22141, #25736).
2. Auto-mode Fallback Failure: In both Auto (Gemini 3) and Auto (Gemini 2.5) modes, the policy chain was not self-contained. When a primary model hit a quota limit, the CLI would stop and prompt the user with a "Select Model" dialog instead of silently falling back to available models.

Fixes

1. Internal Utility Migration

Switched six utility configs to use a new gemini-3-flash-lite-base targeting gemini-3.1-flash-lite-preview. This preserves Flash quota and prevents "silent hangs" by ensuring background tasks run on a model with independent capacity.

| Config key | Role | Before | After |
|---|---|---|---|
| `loop-detection` | `UTILITY_LOOP_DETECTOR` | Flash | Flash Lite ✓ |
| `llm-edit-fixer` | `UTILITY_EDIT_CORRECTOR` | Flash | Flash Lite ✓ |
| `next-speaker-checker` | `UTILITY_NEXT_SPEAKER` | Flash | Flash Lite ✓ |
| `web-fetch-fallback` | fallback path (no tools) | Flash | Flash Lite ✓ |
| `web-search` | Grounding with Google Search | Flash | Flash Lite ✓ |
| `web-fetch` | URL context tool | Flash | Flash Lite ✓ |
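
The migration in the table can be pictured as a simple remapping of config keys to model names. The sketch below is illustrative only: the config keys and model names come from the PR description, but the `Record` shape and the `migrateUtilityModels` helper are assumptions and do not reflect how gemini-cli actually stores these configs.

```typescript
// Hypothetical shape: the real gemini-cli config structure is not shown in the PR text.
type UtilityConfigKey =
  | 'loop-detection'
  | 'llm-edit-fixer'
  | 'next-speaker-checker'
  | 'web-fetch-fallback'
  | 'web-search'
  | 'web-fetch';

// Model names as quoted in the PR description.
const FLASH = 'gemini-3-flash-preview';
const FLASH_LITE = 'gemini-3.1-flash-lite-preview';

// "Before" column: every utility call draws on Flash quota.
const before: Record<UtilityConfigKey, string> = {
  'loop-detection': FLASH,
  'llm-edit-fixer': FLASH,
  'next-speaker-checker': FLASH,
  'web-fetch-fallback': FLASH,
  'web-search': FLASH,
  'web-fetch': FLASH,
};

// Returns a copy in which every entry pinned to `from` points at `to` instead.
// The input object is left untouched.
function migrateUtilityModels(
  configs: Record<UtilityConfigKey, string>,
  from: string,
  to: string,
): Record<UtilityConfigKey, string> {
  const result = { ...configs };
  for (const key of Object.keys(result) as UtilityConfigKey[]) {
    if (result[key] === from) {
      result[key] = to;
    }
  }
  return result;
}

// "After" column: the same six keys, now on Flash Lite.
const after = migrateUtilityModels(before, FLASH, FLASH_LITE);
```

With this shape, exhausting Flash quota no longer affects any of the six utility paths, which is the effect the PR describes.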

2. Auto-mode Resilience

- Improved resolvePolicyChain to ensure Auto modes (3 and 2.5) always use a circular fallback loop (wrapsAround).
- Added Flash Lite as the new last-resort for Auto chains.
- This ensures that if Flash is exhausted, Auto mode silently continues on Flash Lite or Pro without user intervention.
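
A circular fallback of this kind can be sketched as follows. This is not the code from `policyHelpers.ts`: the `PolicyChain` interface, the `nextAvailableModel` function, and the chain order are assumptions made for illustration; only the `wrapsAround` name and the model names appear in the source.

```typescript
// Hypothetical types: the real policy chain in gemini-cli may look different.
interface PolicyChain {
  models: string[];     // ordered from most to least preferred
  wrapsAround: boolean; // if true, the search continues from the head of the chain
}

// Finds the next model after `current` that is not marked as exhausted.
// Returns undefined when no candidate is left.
function nextAvailableModel(
  chain: PolicyChain,
  current: string,
  exhausted: ReadonlySet<string>,
): string | undefined {
  const start = chain.models.indexOf(current);
  if (start === -1) return undefined;
  const n = chain.models.length;
  // Without wrap-around only the models after `current` are candidates;
  // with it, every other model in the chain is visited once.
  const steps = chain.wrapsAround ? n - 1 : n - 1 - start;
  for (let i = 1; i <= steps; i++) {
    const candidate = chain.models[(start + i) % n];
    if (candidate !== undefined && !exhausted.has(candidate)) {
      return candidate;
    }
  }
  return undefined;
}

// Assumed chain order, with Flash Lite as the last resort as the PR describes.
const autoChain: PolicyChain = {
  models: [
    'gemini-3.1-pro-preview',
    'gemini-3-flash-preview',
    'gemini-3.1-flash-lite-preview',
  ],
  wrapsAround: true,
};

// Flash is exhausted, so the chain moves on to Flash Lite without prompting.
const fallback = nextAvailableModel(
  autoChain,
  'gemini-3-flash-preview',
  new Set(['gemini-3-flash-preview']),
);
```

The wrap-around matters when the session is already on the last entry: with Flash and Flash Lite both exhausted, a non-circular chain has nowhere to go, while a circular one can return to Pro.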

Impact

- Fixes "silent hangs" (#22141, #25736) and 429-related workflow interruptions for both manual and auto users.
- Reduces Flash consumption for all utility calls.
- Mitigates infinite rate-limit loops (#23738) by providing alternative model paths.
- Verified locally with updated tests in policyHelpers.test.ts.

Reproduction (from #23397)

# Set Flash Lite as main model explicitly:
gemini -m 'gemini-3.1-flash-lite-preview'

# Still gets:
Usage limit reached for gemini-3-flash-preview.
/model to switch models.

Changed files

packages/core/src/availability/policyHelpers.ts (modified, +68/-25)

User Impact

The user is an Ultra Subscriber with minimal usage, yet is being subjected to rate limits that cripple productivity. Instead of failing fast or notifying the user of the throttle, the agent simply 'hangs' in a thinking loop while internally retrying. This results in a complete loss of session control for the operator.

Technical Critique

The current behavior represents a critical architectural failure in error-state communication. It is unacceptable for a premium service to leave a user in an unobservable 'black box' state for an hour without any indication of a rate-limit throttle or an estimated time for recovery. The retry-backoff logic for 429 errors is fundamentally broken and must be updated to provide immediate user-facing feedback or a graceful fail-fast mechanism.

Technical Details

- CLI Version: 0.38.2
- Model: gemini-3.1-pro-preview
- Endpoint: https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse
- Error Code: 429 (Too Many Requests)
- Session ID: c1e9d1a5-8c04-46b5-8967-b2ee62d0db85
- Failure Mode: Silent thinking loop for ~50-60 minutes.
- Context Usage at time of failure: Only 3% (approx 60k tokens of 2M window).

extent analysis

TL;DR

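Gemini CLI v0.38.2 can sit in a silent 'Thinking' state for roughly an hour after a 429 (Too Many Requests) from cloudcode-pa.googleapis.com. PR #25684 (open and unmerged when this page was fetched) attributes the hang to internal utility calls hardcoded to Flash and moves them to Flash Lite. Separately, the issue asks that the retry-backoff logic for 429 errors give immediate user-facing feedback or fail fast.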

Guidance

- Review the current retry-backoff logic implementation to identify why it's causing the agent to hang for an hour without notifying the user.
- Consider implementing a fail-fast mechanism that notifies the user of the rate limit throttle and estimated time for recovery.
- Update the error handling to provide immediate user-facing feedback when a 429 error occurs, such as displaying an error message or notification.
- Investigate why the user is being rate-limited despite having minimal usage as an Ultra Subscriber.

Example

The issue does not include the CLI's retry implementation, so any snippet here can only illustrate the requested behavior, not the actual code path.
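
Although the CLI's real retry code is not shown in the issue, the behavior it asks for (visible feedback on every 429 and a bounded total wait) can be sketched generically. Everything below is hypothetical: `BackoffPolicy`, `planRetry`, `isRateLimit`, and `callWithVisibleBackoff` are illustrative names, not gemini-cli APIs, and the error shape is assumed to carry a numeric `status` field like the logged JSON above.

```typescript
// Hypothetical sketch: none of these names come from gemini-cli.
interface BackoffPolicy {
  maxAttempts: number;    // total tries, including the first
  baseDelayMs: number;    // delay before the second try; doubles afterwards
  maxTotalWaitMs: number; // hard cap on cumulative waiting
}

type RetryStep =
  | { action: 'retry'; delayMs: number; message: string }
  | { action: 'give-up'; message: string };

// Pure decision function: after attempt number `attempt` failed with a 429,
// decide whether to wait or to fail fast, and what to tell the user.
function planRetry(attempt: number, waitedMs: number, policy: BackoffPolicy): RetryStep {
  const delayMs = policy.baseDelayMs * Math.pow(2, attempt - 1);
  if (attempt >= policy.maxAttempts || waitedMs + delayMs > policy.maxTotalWaitMs) {
    return {
      action: 'give-up',
      message: `Rate limited (429). Giving up after ${attempt} attempt(s) and ${waitedMs} ms of waiting.`,
    };
  }
  return {
    action: 'retry',
    delayMs,
    message: `Rate limited (429). Retrying in ${delayMs} ms (attempt ${attempt + 1} of ${policy.maxAttempts}).`,
  };
}

// Assumes the thrown error exposes a numeric `status`, as in the logged JSON.
function isRateLimit(err: unknown): boolean {
  return typeof err === 'object' && err !== null && (err as { status?: unknown }).status === 429;
}

// Wraps a request so that every 429 produces a user-visible message
// and the total time spent waiting is bounded by the policy.
async function callWithVisibleBackoff<T>(
  fn: () => Promise<T>,
  policy: BackoffPolicy,
  notify: (message: string) => void,
): Promise<T> {
  let waitedMs = 0;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isRateLimit(err)) throw err; // only 429s are retried
      const step = planRetry(attempt, waitedMs, policy);
      notify(step.message);
      if (step.action === 'give-up') throw new Error(step.message);
      await new Promise<void>((resolve) => setTimeout(resolve, step.delayMs));
      waitedMs += step.delayMs;
    }
  }
}
```

Keeping the decision in a pure function makes the policy easy to test without timers, and the `notify` callback is where a CLI could print the throttle status instead of showing an indefinite 'Thinking' indicator.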

Notes

The issue seems to be related to the specific implementation of the retry-backoff logic and error handling in the Gemini CLI agent. Without more information about the implementation, it's difficult to provide a more detailed solution.

Recommendation

As a workaround, implement custom retry-backoff logic that provides immediate user-facing feedback or fails fast, since the current behavior severely impacts user productivity.
