ollama - 💡(How to fix) Fix gemma4:e4b only offloads 2.8 GiB to ROCm GPU despite 7.5 GiB available [3 comments, 2 participants]

ollama · 2026-04-22 16:11:49


ollama/ollama#15749 • Fetched 2026-04-23 07:23:24


Author

Itstommy10

Participants

Itstommy10

NBAB42Bq

Timeline (top)

commented ×3, labeled ×1, mentioned ×1, subscribed ×1


What is the issue?

GPU: AMD Radeon RX 6600 XT (gfx1032, 8GB VRAM)
OS: Linux
Ollama: 0.21.0
Backend: ROCm

Issue: Ollama reports 7.5 GiB available on ROCm0 but only offloads 2.8 GiB of model weights to GPU, putting 6.6 GiB on CPU.

ollama ps shows: 68% CPU / 32% GPU

Logs:

gpu memory id=0 library=ROCm available="7.5 GiB" free="8.0 GiB" minimum="457.0 MiB" overhead="0 B"
model weights device=ROCm0 size="2.8 GiB"
model weights device=CPU   size="6.6 GiB"
offloaded 42/43 layers to GPU

Tried:

OLLAMA_NUM_GPU=999
OLLAMA_GPU_OVERHEAD=0
OLLAMA_FLASH_ATTENTION=0
HSA_OVERRIDE_GFX_VERSION=10.3.0

None of these changed the weight distribution.

Note: Other models (qwen3.5:9b, qwen3-vl) work fine on this GPU.
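One way to compare this model with the ones that load correctly is to read the memory split the server reports for each loaded model. The sketch below assumes a local Ollama server at the default address and uses the `size` and `size_vram` fields returned by `GET /api/ps`. The helper names are hypothetical, and the percentage it prints only approximates the figure shown by `ollama ps`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local endpoint


def gpu_share(entry: dict) -> float:
    """Fraction of a loaded model's memory that is resident in VRAM."""
    size = entry.get("size", 0)
    return entry.get("size_vram", 0) / size if size else 0.0


def loaded_models(base_url: str = OLLAMA_URL) -> list:
    """Return the entries reported by GET /api/ps (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return json.load(resp).get("models", [])


def report(entries: list) -> list:
    """Format one 'name: NN% GPU' line per loaded model."""
    return [f"{e['name']}: {gpu_share(e):.0%} GPU" for e in entries]
```

With gemma4:e4b and one of the working models loaded, `print("\n".join(report(loaded_models())))` lists the GPU share of each side by side.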

Relevant log output

gpu memory id=0 library=ROCm available="7.5 GiB" free="8.0 GiB" minimum="457.0 MiB" overhead="0 B"
model weights device=ROCm0 size="2.8 GiB"
model weights device=CPU   size="6.6 GiB"
offloaded 42/43 layers to GPU
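The log shows a mismatch worth quantifying: 42 of 43 layers are offloaded, yet most of the weight bytes stay on the CPU. The sketch below parses the four log lines quoted above and computes the GPU share of the weights. It assumes the log format is exactly as pasted. That the one remaining layer, or tensors outside the counted layers, holds most of the 6.6 GiB is only a hypothesis the numbers suggest, not something the issue confirms.

```python
import re

LOG = '''gpu memory id=0 library=ROCm available="7.5 GiB" free="8.0 GiB" minimum="457.0 MiB" overhead="0 B"
model weights device=ROCm0 size="2.8 GiB"
model weights device=CPU   size="6.6 GiB"
offloaded 42/43 layers to GPU'''

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def to_bytes(text: str) -> float:
    """Convert a size string such as '2.8 GiB' to bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def weights_by_device(log: str) -> dict:
    """Map each device name to the weight bytes the log assigns to it."""
    pattern = re.compile(r'model weights device=(\S+)\s+size="([^"]+)"')
    return {dev: to_bytes(size) for dev, size in pattern.findall(log)}


weights = weights_by_device(LOG)
gpu_fraction = weights["ROCm0"] / sum(weights.values())
layers_on_gpu, layers_total = map(
    int, re.search(r"offloaded (\d+)/(\d+) layers", LOG).groups()
)
# Prints: GPU holds 29.8% of weights but 42/43 layers
print(f"GPU holds {gpu_fraction:.1%} of weights but {layers_on_gpu}/{layers_total} layers")
```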

OS

Linux

GPU

AMD

CPU

AMD

Ollama version

0.21.0

extent analysis

TL;DR

No fix is confirmed yet. Ollama offloads 42 of 43 layers of gemma4:e4b but places only 2.8 GiB of weights on the GPU and 6.6 GiB on the CPU, even though 7.5 GiB of VRAM is reported as available. The environment variables tried so far did not change this split.

Guidance

Check the Ollama documentation on GPU memory estimation and layer offloading for settings that affect how weights are split between GPU and CPU.
Compare gemma4:e4b with the models that load correctly on this GPU (qwen3.5:9b, qwen3-vl): total size, layer count, and how much of each model the log places on each device.
Try other GPU-related settings in addition to those already tested. The four variables listed in the issue did not change the split, so setting num_gpu as a model option rather than as an environment variable may be worth testing.
Check the installed ROCm version and configuration for compatibility with the AMD Radeon RX 6600 XT. Note that HSA_OVERRIDE_GFX_VERSION=10.3.0 does not indicate the ROCm version; it makes the runtime treat the card (gfx1032) as gfx1030.
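As one concrete experiment, the requested layer count can be passed per request through the `options` field of `POST /api/generate`, where `num_gpu` is a documented model option. The sketch below assumes a local server at the default address. The helper names are hypothetical, and nothing in the issue confirms that this changes the weight split for gemma4:e4b.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local endpoint


def build_request(model: str, num_gpu: int, prompt: str = "hello") -> dict:
    """Build a /api/generate payload that asks for `num_gpu` offloaded layers."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return a single JSON object instead of a stream
        "options": {"num_gpu": num_gpu},
    }


def generate(payload: dict, base_url: str = OLLAMA_URL) -> dict:
    """Send the payload to a running Ollama server and return its JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `generate(build_request("gemma4:e4b", 43))` and then re-reading the server log shows whether the `model weights` lines change.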

Example

The issue does not include the commands used to load and run the model, so any example depends on assumptions about the setup. Reviewing those commands or scripts, especially any parameters that control GPU allocation, may show where the split is decided.

Notes

The issue appears specific to gemma4:e4b on Ollama 0.21.0 with this hardware. Because other models offload correctly on the same GPU, the likely cause is how this particular model interacts with Ollama's memory management.

Recommendation

No workaround is confirmed. Because the problem is model-specific and the environment variables tried had no effect, the next step is to investigate model-specific settings that affect GPU placement. Nothing in the issue indicates that a newer Ollama version resolves it.
