gemini-cli - 💡(How to fix) gemini-2.5-pro hangs in interactive mode and fails with MODEL_CAPACITY_EXHAUSTED, while gemini-2.5-flash works [1 comment, 2 participants]

GitHub stats
google-gemini/gemini-cli#25386 · Fetched 2026-04-15 06:45:10
Comments: 1 · Participants: 2 · Timeline: 4 · Reactions: 0
Timeline (top): labeled ×3, commented ×1

Error Message

HTTP 429 (Too Many Requests), status RESOURCE_EXHAUSTED, reason MODEL_CAPACITY_EXHAUSTED: "No capacity available for model gemini-2.5-pro on the server"

What happened?

On Windows, Gemini CLI works normally with gemini-2.5-flash, but gemini-2.5-pro does not.

When I switch to gemini-2.5-pro in the interactive CLI, it appears to stay on "Thinking..." indefinitely unless I cancel it. In debug mode, the same request fails with a 429 error indicating model capacity exhaustion. The request count increases, but no tokens are sent. It appears to be looping on the 429 error.

gemini-2.5-flash succeeds in the same session with the same authentication setup.

<img width="769" height="337" alt="Image" src="https://github.com/user-attachments/assets/8af83692-6777-47d6-a822-56c657320fb4" />

What did you expect to happen?

gemini-2.5-pro should either:

  • complete normally, or
  • fail quickly with a clear surfaced error with actionable instructions

It should not appear to think indefinitely in the interactive UI.

Client information

About Gemini CLI

  CLI Version   0.37.2
  Git Commit    545e956c3
  Model         gemini-2.5-pro
  Sandbox       no sandbox
  OS            win32
  Auth Method   Signed in with Google ([email protected])
  Tier          Gemini Code Assist Standard
  GCP Project   gen-lang-client-XXXXXX

Login information

Google Account

Anything else we need to know?

If I run:

    gemini -d -m gemini-2.5-pro -p "Reply with exactly: OK"

I get the following:

    {
      config: {
        url: 'https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse',
        method: 'POST',
        params: { alt: 'sse' },
        headers: {
          'Content-Type': 'application/json',
          'User-Agent': 'GeminiCLI/0.37.2/gemini-2.5-pro (win32; x64; terminal) google-api-nodejs-client/9.15.1',
          Authorization: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
          'x-goog-api-client': 'gl-node/22.17.0'
        },
        responseType: 'stream',
        body: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
        signal: AbortSignal { aborted: false },
        retry: false,
        paramsSerializer: [Function: paramsSerializer],
        validateStatus: [Function: validateStatus],
        errorRedactor: [Function: defaultErrorRedactor]
      },
      response: {
        config: {
          url: 'https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse',
          method: 'POST',
          params: [Object],
          headers: [Object],
          responseType: 'stream',
          body: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
          signal: [AbortSignal],
          retry: false,
          paramsSerializer: [Function: paramsSerializer],
          validateStatus: [Function: validateStatus],
          errorRedactor: [Function: defaultErrorRedactor]
        },
        data: '[{\n' +
          ' "error": {\n' +
          ' "code": 429,\n' +
          ' "message": "No capacity available for model gemini-2.5-pro on the server",\n' +
          ' "errors": [\n' +
          ' {\n' +
          ' "message": "No capacity available for model gemini-2.5-pro on the server",\n' +
          ' "domain": "global",\n' +
          ' "reason": "rateLimitExceeded"\n' +
          ' }\n' +
          ' ],\n' +
          ' "status": "RESOURCE_EXHAUSTED",\n' +
          ' "details": [\n' +
          ' {\n' +
          ' "@type": "type.googleapis.com/google.rpc.ErrorInfo",\n' +
          ' "reason": "MODEL_CAPACITY_EXHAUSTED",\n' +
          ' "domain": "cloudcode-pa.googleapis.com",\n' +
          ' "metadata": {\n' +
          ' "model": "gemini-2.5-pro"\n' +
          ' }\n' +
          ' }\n' +
          ' ]\n' +
          ' }\n' +
          '}\n' +
          ']',
        headers: {
          'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000',
          'content-length': '606',
          'content-type': 'application/json; charset=UTF-8',
          date: 'Tue, 14 Apr 2026 14:29:39 GMT',
          server: 'ESF',
          'server-timing': 'gfet4t7; dur=126',
          vary: 'Origin, X-Origin, Referer',
          'x-cloudaicompanion-trace-id': '2fa5a110a7ef80ec',
          'x-content-type-options': 'nosniff',
          'x-frame-options': 'SAMEORIGIN',
          'x-xss-protection': '0'
        },
        status: 429,
        statusText: 'Too Many Requests',
        request: {
          responseURL: 'https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse'
        }
      },
      error: undefined,
      status: 429,
      [Symbol(gaxios-gaxios-error)]: '6.7.1'
    }
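The useful part of that dump is the JSON error body. As a minimal sketch (not the CLI's own code), the following Python shows how a client could reduce it to a one-line message. The body is abridged from the debug output (the `errors` array is left out), and `summarize_error` is a hypothetical helper name introduced here for illustration.

```python
import json

# Error body abridged from the debug output in the report:
# a JSON array holding a single error object.
RAW_BODY = """[{
  "error": {
    "code": 429,
    "message": "No capacity available for model gemini-2.5-pro on the server",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "MODEL_CAPACITY_EXHAUSTED",
        "domain": "cloudcode-pa.googleapis.com",
        "metadata": {"model": "gemini-2.5-pro"}
      }
    ]
  }
}]"""

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def summarize_error(raw_body: str) -> dict:
    """Extract the fields a user needs from the error body (hypothetical helper)."""
    payload = json.loads(raw_body)
    # The body in the report is a one-element list; accept a bare object too.
    first = payload[0] if isinstance(payload, list) else payload
    error = first.get("error", {})
    # Pick the ErrorInfo detail, which carries the machine-readable reason.
    info = next(
        (d for d in error.get("details", []) if d.get("@type") == ERROR_INFO_TYPE),
        {},
    )
    return {
        "code": error.get("code"),
        "status": error.get("status"),
        "reason": info.get("reason"),
        "model": info.get("metadata", {}).get("model"),
        "message": error.get("message"),
    }


summary = summarize_error(RAW_BODY)
print(f"{summary['code']} {summary['status']} ({summary['reason']}) for {summary['model']}")
# prints: 429 RESOURCE_EXHAUSTED (MODEL_CAPACITY_EXHAUSTED) for gemini-2.5-pro
```

Reading `reason` from the `ErrorInfo` detail, rather than relying on the 429 status alone, is what distinguishes a capacity problem from other rate-limit responses in this payload.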

extent analysis

TL;DR

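The server returns HTTP 429 with reason MODEL_CAPACITY_EXHAUSTED for gemini-2.5-pro, and the interactive CLI appears to keep retrying without surfacing the error, which shows up as an indefinite "Thinking..." state. Likely workarounds are to retry later with backoff or to switch to gemini-2.5-flash, which works in the same session.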

Guidance

  • The error message "No capacity available for model gemini-2.5-pro on the server", together with the reason MODEL_CAPACITY_EXHAUSTED, points to server-side capacity for that model. A client should handle this 429 with a bounded number of retries and exponential backoff, then surface the error instead of waiting indefinitely.
  • gemini-2.5-flash succeeds in the same session with the same authentication setup, so authentication and project configuration are unlikely causes. The problem appears specific to gemini-2.5-pro, and model-specific limits or quotas for this tier are worth checking.
  • Reducing the request rate may lower how often the 429 occurs, but the one-line test prompt in the report also fails, so request size is unlikely to be the trigger.
  • The rising request count with no tokens sent is consistent with the CLI retrying the rejected request. Monitoring both values helps confirm whether a retry loop, rather than an inefficient request pattern, is responsible.
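The bounded retry described in the guidance can be sketched as follows. This is an illustration under stated assumptions, not Gemini CLI's implementation: `CapacityError` is a hypothetical stand-in for a 429 / MODEL_CAPACITY_EXHAUSTED response, `send` is any caller-supplied function that performs one request, and the attempt count and delays are arbitrary defaults.

```python
import random
import time


class CapacityError(Exception):
    """Hypothetical stand-in for an HTTP 429 / MODEL_CAPACITY_EXHAUSTED response."""


def call_with_backoff(send, max_attempts=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    Raises RuntimeError with an actionable message once retries are
    exhausted, so the caller never waits indefinitely.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except CapacityError as exc:
            if attempt == max_attempts:
                raise RuntimeError(
                    f"Model capacity exhausted after {max_attempts} attempts: {exc}. "
                    "Try again later or switch to another model."
                ) from exc
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter exists only so the wait can be replaced in tests. The important property is the hard cap on attempts followed by an explicit error, which matches the "fail quickly with a clear surfaced error" expectation in the report.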

Example

The issue itself does not include a code fix: the failure originates from server-side model capacity rather than from a specific code path identified in the report.

Notes

The report establishes that the server rejects gemini-2.5-pro requests for lack of capacity, but not why the interactive UI fails to surface that error. Whether the CLI retries without a limit, and under which conditions capacity is exhausted for this tier, would need further debugging.

Recommendation

The issue does not mention a fixed version, so apply a workaround: retry with backoff, or switch to gemini-2.5-flash until capacity for gemini-2.5-pro is available again.
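Because the report shows gemini-2.5-flash succeeding in the same session, one possible workaround is to fall back to it when gemini-2.5-pro has no capacity. The sketch below illustrates the idea only. `ModelUnavailable`, `generate_with_fallback`, and `fake_generate` are hypothetical names, and `generate` stands for whatever function actually sends the prompt.

```python
class ModelUnavailable(Exception):
    """Hypothetical error raised when a model reports no capacity."""


def generate_with_fallback(generate, prompt,
                           models=("gemini-2.5-pro", "gemini-2.5-flash")):
    """Try each model in order; return (model, reply) from the first that answers."""
    failures = []
    for model in models:
        try:
            return model, generate(model, prompt)
        except ModelUnavailable as exc:
            failures.append(f"{model}: {exc}")
    # Every model failed: surface one clear error listing each failure.
    raise RuntimeError("All models unavailable: " + "; ".join(failures))


def fake_generate(model, prompt):
    # Simulates the report: pro has no capacity, flash answers.
    if model == "gemini-2.5-pro":
        raise ModelUnavailable("No capacity available")
    return "OK"


print(generate_with_fallback(fake_generate, "Reply with exactly: OK"))
# prints: ('gemini-2.5-flash', 'OK')
```

Returning the model name alongside the reply keeps the fallback visible to the user, who may care that the answer did not come from the model they selected.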
