gemini-cli - 💡(How to fix) Fix Gemini-2.5-pro hangs in interactive mode and fails with MODEL_CAPACITY_EXHAUSTED, while gemini-2.5-flash works [1 comments, 2 participants]

Error Message

When I switch to gemini-2.5-pro in the interactive CLI, it appears to stay on "Thinking..." indefinitely unless I cancel it. In debug mode, the same request fails with a 429 error indicating model capacity exhaustion. The request count increases, but no tokens are sent. It appears to be looping on the 429 error.

429 RESOURCE_EXHAUSTED (reason: MODEL_CAPACITY_EXHAUSTED): "No capacity available for model gemini-2.5-pro on the server"

What happened?

On Windows, Gemini CLI works normally with gemini-2.5-flash, but gemini-2.5-pro does not.

gemini-2.5-flash succeeds in the same session with the same authentication setup.

What did you expect to happen?

gemini-2.5-pro should either:

complete normally, or
fail quickly with a clear surfaced error with actionable instructions

It should not appear to think indefinitely in the interactive UI.
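
To make the second expectation concrete, here is a minimal sketch of what a "clear surfaced error with actionable instructions" could look like. The function name `formatCapacityError` and the wording of the message are hypothetical and are not taken from gemini-cli; only the `-m` flag and the model names come from this report.

```typescript
// Hypothetical formatter: illustrates the kind of message the reporter asks for.
// It is NOT the CLI's actual message or API.
function formatCapacityError(model: string, alternatives: string[]): string {
  const lines = [
    `No server capacity is available for ${model} right now (HTTP 429, MODEL_CAPACITY_EXHAUSTED).`,
    "What you can do:",
    "  - Wait a few minutes and try again.",
  ];
  // Suggest each alternative model using the -m flag shown in the report.
  for (const alt of alternatives) {
    lines.push(`  - Switch models, for example: gemini -m ${alt}`);
  }
  return lines.join("\n");
}

console.log(formatCapacityError("gemini-2.5-pro", ["gemini-2.5-flash"]));
```

Printing a message like this once and returning control to the prompt would satisfy the "fail quickly" expectation, in contrast to the indefinite "Thinking..." state described above.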

Client information

About Gemini CLI

CLI Version: 0.37.2
Git Commit: 545e956c3
Model: gemini-2.5-pro
Sandbox: no sandbox
OS: win32
Auth Method: Signed in with Google ([email protected])
Tier: Gemini Code Assist Standard
GCP Project: gen-lang-client-XXXXXX

Login information

Google Account

Anything else we need to know?

If I run:

    gemini -d -m gemini-2.5-pro -p "Reply with exactly: OK"

I get the following:

    {
      config: {
        url: 'https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse',
        method: 'POST',
        params: { alt: 'sse' },
        headers: {
          'Content-Type': 'application/json',
          'User-Agent': 'GeminiCLI/0.37.2/gemini-2.5-pro (win32; x64; terminal) google-api-nodejs-client/9.15.1',
          Authorization: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
          'x-goog-api-client': 'gl-node/22.17.0'
        },
        responseType: 'stream',
        body: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
        signal: AbortSignal { aborted: false },
        retry: false,
        paramsSerializer: [Function: paramsSerializer],
        validateStatus: [Function: validateStatus],
        errorRedactor: [Function: defaultErrorRedactor]
      },
      response: {
        config: {
          url: 'https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse',
          method: 'POST',
          params: [Object],
          headers: [Object],
          responseType: 'stream',
          body: '<<REDACTED> - See `errorRedactor` option in `gaxios` for configuration>.',
          signal: [AbortSignal],
          retry: false,
          paramsSerializer: [Function: paramsSerializer],
          validateStatus: [Function: validateStatus],
          errorRedactor: [Function: defaultErrorRedactor]
        },
        data: '[{\n' +
          ' "error": {\n' +
          ' "code": 429,\n' +
          ' "message": "No capacity available for model gemini-2.5-pro on the server",\n' +
          ' "errors": [\n' +
          ' {\n' +
          ' "message": "No capacity available for model gemini-2.5-pro on the server",\n' +
          ' "domain": "global",\n' +
          ' "reason": "rateLimitExceeded"\n' +
          ' }\n' +
          ' ],\n' +
          ' "status": "RESOURCE_EXHAUSTED",\n' +
          ' "details": [\n' +
          ' {\n' +
          ' "@type": "type.googleapis.com/google.rpc.ErrorInfo",\n' +
          ' "reason": "MODEL_CAPACITY_EXHAUSTED",\n' +
          ' "domain": "cloudcode-pa.googleapis.com",\n' +
          ' "metadata": {\n' +
          ' "model": "gemini-2.5-pro"\n' +
          ' }\n' +
          ' }\n' +
          ' ]\n' +
          ' }\n' +
          '}\n' +
          ']',
        headers: {
          'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000',
          'content-length': '606',
          'content-type': 'application/json; charset=UTF-8',
          date: 'Tue, 14 Apr 2026 14:29:39 GMT',
          server: 'ESF',
          'server-timing': 'gfet4t7; dur=126',
          vary: 'Origin, X-Origin, Referer',
          'x-cloudaicompanion-trace-id': '2fa5a110a7ef80ec',
          'x-content-type-options': 'nosniff',
          'x-frame-options': 'SAMEORIGIN',
          'x-xss-protection': '0'
        },
        status: 429,
        statusText: 'Too Many Requests',
        request: {
          responseURL: 'https://cloudcode-pa.googleapis.com/v1internal:streamGenerateContent?alt=sse'
        }
      },
      error: undefined,
      status: 429,
      [Symbol(gaxios-gaxios-error)]: '6.7.1'
    }
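
The useful part of the debug output above is the JSON string in `response.data`. As a rough sketch, the fields that distinguish this failure (HTTP code, status, the `ErrorInfo` reason, and the model) could be pulled out as follows. The helper name `summarizeErrorBody` and the interfaces are hypothetical and only cover the fields visible in this report; `sampleBody` is a trimmed copy of the body shown above.

```typescript
// Only the fields that appear in the error body from this report.
interface ErrorDetail {
  "@type"?: string;
  reason?: string;
  metadata?: Record<string, string>;
}

interface ApiError {
  code: number;
  message: string;
  status?: string;
  details?: ErrorDetail[];
}

interface ErrorSummary {
  code: number;
  status: string | undefined;
  reason: string | undefined;
  model: string | undefined;
}

// Hypothetical helper: not part of gemini-cli or gaxios.
function summarizeErrorBody(body: string): ErrorSummary | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return undefined; // body was not valid JSON
  }
  // In the report the body is an array holding one { error: ... } object.
  const first = Array.isArray(parsed) ? parsed[0] : parsed;
  const error = (first as { error?: ApiError } | undefined)?.error;
  if (!error) return undefined;
  // The machine-readable reason lives in the google.rpc.ErrorInfo detail.
  const info = error.details?.find(
    (d) => d["@type"] === "type.googleapis.com/google.rpc.ErrorInfo",
  );
  return {
    code: error.code,
    status: error.status,
    reason: info?.reason,
    model: info?.metadata?.model,
  };
}

// Trimmed copy of the body from the debug output.
const sampleBody = JSON.stringify([
  {
    error: {
      code: 429,
      message: "No capacity available for model gemini-2.5-pro on the server",
      status: "RESOURCE_EXHAUSTED",
      details: [
        {
          "@type": "type.googleapis.com/google.rpc.ErrorInfo",
          reason: "MODEL_CAPACITY_EXHAUSTED",
          domain: "cloudcode-pa.googleapis.com",
          metadata: { model: "gemini-2.5-pro" },
        },
      ],
    },
  },
]);

console.log(summarizeErrorBody(sampleBody));
```

Reading the `ErrorInfo` reason rather than only the HTTP status matters here because the same 429 response also carries the generic `rateLimitExceeded` reason in its `errors` list.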

extent analysis

TL;DR

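The server returns 429 RESOURCE_EXHAUSTED with reason MODEL_CAPACITY_EXHAUSTED for gemini-2.5-pro, which indicates that no server-side capacity was available for that model when the request was made. The interactive CLI appears to keep retrying on this error without surfacing it, which shows up as an indefinite "Thinking..." state. Likely workarounds are to use gemini-2.5-flash, which works in the same session, or to retry gemini-2.5-pro later.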

Guidance

The error message "No capacity available for model gemini-2.5-pro on the server", together with the reason MODEL_CAPACITY_EXHAUSTED, points to server-side capacity for that model rather than a problem with the request itself. A client handling this 429 should use exponential backoff with a bounded number of attempts and then surface the error instead of retrying indefinitely.
The fact that gemini-2.5-flash succeeds in the same session with the same authentication setup makes an authentication or project configuration problem unlikely; the failure is specific to gemini-2.5-pro.
The failing prompt was only "Reply with exactly: OK", so request size is unlikely to be a factor, and splitting requests is not expected to help. The practical mitigations are to retry later or to switch models with -m gemini-2.5-flash.
The report notes that the request count increases while no tokens are sent. This is consistent with repeated failed attempts against the same 429 response rather than with an inefficient request pattern on the user's side.
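
The bounded exponential backoff mentioned above can be sketched as follows. This is an illustration under stated assumptions, not gemini-cli's retry code: `RetryPolicy`, `backoffDelayMs`, `withBoundedRetry`, and `RetriesExhaustedError` are hypothetical names, the defaults are arbitrary, and jitter is left out to keep the example short.

```typescript
// Hypothetical retry policy; names and values are illustrative only.
interface RetryPolicy {
  maxAttempts: number; // total tries, including the first one
  baseDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound for any single delay
}

// Delay before retry number `retry` (1-based): doubles each time, up to the cap.
function backoffDelayMs(retry: number, policy: RetryPolicy): number {
  const delay = policy.baseDelayMs * 2 ** (retry - 1);
  return Math.min(delay, policy.maxDelayMs);
}

// Thrown once the attempt budget is spent, so the caller can surface the failure.
class RetriesExhaustedError extends Error {
  readonly attempts: number;
  readonly lastError: unknown;

  constructor(attempts: number, lastError: unknown) {
    super(`Gave up after ${attempts} attempts`);
    this.name = "RetriesExhaustedError";
    this.attempts = attempts;
    this.lastError = lastError;
  }
}

async function withBoundedRetry<T>(
  call: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      // Non-retryable errors are rethrown immediately.
      if (!isRetryable(err)) throw err;
      // Wait only if another attempt will follow.
      if (attempt < policy.maxAttempts) {
        await sleep(backoffDelayMs(attempt, policy));
      }
    }
  }
  // The loop always terminates: this is what prevents an endless "Thinking..." state.
  throw new RetriesExhaustedError(policy.maxAttempts, lastError);
}

const examplePolicy: RetryPolicy = { maxAttempts: 4, baseDelayMs: 500, maxDelayMs: 4000 };
console.log([1, 2, 3].map((retry) => backoffDelayMs(retry, examplePolicy)));
```

The key design choice is the fixed attempt budget: once it is spent, the caller receives `RetriesExhaustedError` with the last 429 attached and can show it to the user.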

Example

The issue concerns server-side model capacity and how the client handles the resulting 429 response, rather than a bug in user code. The detail that matters is the MODEL_CAPACITY_EXHAUSTED reason in the error body.

Notes

The response identifies model capacity as the immediate cause, but the report does not show why capacity was unavailable or how long the condition lasted. The hang in interactive mode is a separate concern: the error is only visible in debug mode, so the interactive UI appears to be retrying without reporting it. Confirming that would require further debugging of the CLI's retry behaviour.

Recommendation

No fixed version is mentioned in the issue, so apply a workaround: use gemini-2.5-flash while gemini-2.5-pro is returning MODEL_CAPACITY_EXHAUSTED, or retry gemini-2.5-pro later with a bounded number of attempts and backoff between them.
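
Since the report states that gemini-2.5-flash works in the same session, one possible workaround is to fall back to another model when the preferred one reports exhausted capacity. The sketch below assumes an error shaped like the gaxios error in the debug output (a numeric `status` and a string `response.data`); `isCapacityExhausted`, `firstAvailableModel`, and the `callModel` callback are hypothetical names, not gemini-cli APIs.

```typescript
// Minimal view of the error object seen in the debug output (assumed shape).
interface HttpErrorLike {
  status?: number;
  response?: { data?: unknown };
}

// True only for a 429 whose body mentions MODEL_CAPACITY_EXHAUSTED.
function isCapacityExhausted(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as HttpErrorLike;
  const data = e.response?.data;
  return (
    e.status === 429 &&
    typeof data === "string" &&
    data.includes("MODEL_CAPACITY_EXHAUSTED")
  );
}

// Tries each model in order and returns the first successful result.
// `callModel` is a placeholder for whatever actually sends the request.
async function firstAvailableModel<T>(
  models: string[],
  callModel: (model: string) => Promise<T>,
): Promise<{ model: string; value: T }> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return { model, value: await callModel(model) };
    } catch (err) {
      // Only capacity errors trigger a fallback; everything else is rethrown.
      if (!isCapacityExhausted(err)) throw err;
      lastError = err;
    }
  }
  throw lastError ?? new Error("No models were provided");
}

// Example wiring with a fake callModel that mimics the behaviour in the report.
const fakeCall = async (model: string): Promise<string> => {
  if (model === "gemini-2.5-pro") {
    throw { status: 429, response: { data: '{"reason": "MODEL_CAPACITY_EXHAUSTED"}' } };
  }
  return "OK";
};

firstAvailableModel(["gemini-2.5-pro", "gemini-2.5-flash"], fakeCall).then((result) =>
  console.log(`answered by ${result.model}: ${result.value}`),
);
```

Falling back silently changes which model answers, so a real implementation would probably want to tell the user which model was used.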
