gemini-cli - 💡(How to fix) Fix Bug: Local Gemma model (gemma3-1b-gpu-custom) hangs and fails on NVIDIA GTX 1050 (Linux)

gemini-cli2026-05-08 00:16:10

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Error Message

Checking gemma.log shows the GPU is successfully detected and initialized (Selected adapter: NVIDIA GeForce GTX 1050), but the server throws a Request cancelled by client: context canceled error almost immediately.

Code Example

CLI Version                                         0.41.2                                                                                           │
│ Git Commit                                          b0c7a1722                                                                                        │
│ Model                                               Auto (Gemini 3)                                                                                  │
│ Sandbox                                             no sandbox                                                                                       │
│ OS                                                  linux                                                                                            │
│ Auth Method                                         Signed in with Google (enricferrera@gmail.com)                                                   │
│ Tier                                                Gemini Code Assist in Google One AI Pro

RAW_BUFFERClick to expand / collapse

What happened?

When attempting to use the local Gemma model (gemma3-1b-gpu-custom) on an NVIDIA GTX 1050 running Ubuntu 24.04 LTS, the gemini ask command hangs for a few seconds and then silently falls back to the cloud model.

I verified that the GPU can run the model by bypassing the CLI's internal timeout. When I send a direct payload using: echo '{"contents": [{"role": "user", "parts": [{"text": "Write a short poem about a laptop."}]}]}' > request.json curl -d @request.json -H "Content-Type: application/json" http://localhost:9379/v1beta/models/gemma3-1b-gpu-custom:generateContent ...the server successfully generates the response, but it takes an exceptionally long time. The CLI appears to be timing out during the model's prefill phase due to the slower hardware.

What did you expect to happen?

I expected the CLI to wait for the local GPU to finish generating the response instead of immediately timing out and falling back to the cloud. Ideally, there should be a configurable timeout setting in settings.json for the local model router to accommodate older/slower hardware

Client information

<details> <summary>Client Information</summary>

Run gemini to enter the interactive CLI, then run the /about command.

CLI Version                                         0.41.2                                                                                           │
│ Git Commit                                          b0c7a1722                                                                                        │
│ Model                                               Auto (Gemini 3)                                                                                  │
│ Sandbox                                             no sandbox                                                                                       │
│ OS                                                  linux                                                                                            │
│ Auth Method                                         Signed in with Google ([email protected])                                                   │
│ Tier                                                Gemini Code Assist in Google One AI Pro

</details>

Login information

Not relevant

Anything else we need to know?

Logs showing the crash from the CLI's hardcoded timeout:

1 I0000 00:00:1778196558.764659 4732 environment.cc:554] Selected adapter: NVIDIA GeForce GTX 1050, arch=pascal, vendor=nvidia, backend=Vulkan, adapterType=Discrete GPU 2 2026/05/08 01:29:21 Engine initialized 3 2026/05/08 01:29:22 Request cancelled by client: context canceled 4 E0000 00:00:1778196756.215127 4711 process_state.cc:771] RAW: Raising signal 15 with default behavior It seems the VRAM limits or Pascal architecture compute speed is just too slow for the CLI's default "impatience" threshold.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #installation #tensor shape #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

gemini-cli - 💡(How to fix) Fix Bug: Local Gemma model (gemma3-1b-gpu-custom) hangs and fails on NVIDIA GTX 1050 (Linux)

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

What happened?

What did you expect to happen?

Client information

Login information

Anything else we need to know?

Still need to ship something?

TRENDING

gemini-cli - 💡(How to fix) Fix Bug: Local Gemma model (gemma3-1b-gpu-custom) hangs and fails on NVIDIA GTX 1050 (Linux)

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

What happened?

What did you expect to happen?

Client information

Login information

Anything else we need to know?

Still need to ship something?

RELATED_DISCOVERY

TRENDING