ollama - 💡(How to fix) Fix Vulkan runtime, allow more than 64GB VRAM with split memory heaps. [1 participant]

ollama/ollama#15302 · Fetched 2026-04-08 02:44:26

On Linux with the AMDGPU driver, when more than 64 GB of GTT memory is assigned, Vulkan splits the memory into two heaps. Ollama counts the whole assigned GTT pool as available, but gets stuck when a model needs more memory than the larger of the two heaps provides.

My current setup is a GFX900-level AMD APU with 96GB of main RAM, of which 82GB is assigned to GTT (amdgpu.gttsize=82000).

These are the memory heaps reported by vulkaninfo:

=================================
memoryHeaps: count = 2
        memoryHeaps[0]:
                size   = 29376905216 (0x6d7000000) (27.36 GiB)
                budget = 29367087104 (0x6d66a3000) (27.35 GiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags:
                        None
        memoryHeaps[1]:
                size   = 58753810432 (0xdae000000) (54.72 GiB)
                budget = 58734174208 (0xdacd46000) (54.70 GiB)
                usage  = 0 (0x00000000) (0.00 B)
                flags: count = 1
                        MEMORY_HEAP_DEVICE_LOCAL_BIT

This is what the Ollama Vulkan runtime detects:

time=2026-04-03T19:23:01.730Z level=INFO source=types.go:42 msg="inference compute" id=00000000-0900-0000-0000-000000000000 filter_id="" library=Vulkan compute=0.0 name=Vulkan1 description="AMD Radeon Vega 8 Graphics (RADV RAVEN)" libdirs=ollama,vulkan driver=0.0 pci_id=0000:09:00.0 type=iGPU total="82.1 GiB" available="82.1 GiB"
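The 82.1 GiB total in this log line looks consistent with Ollama adding the two heaps together. The following is a quick arithmetic check using only the byte sizes from the vulkaninfo output above; whether Ollama really derives its total this way is an assumption, not something the logs confirm.

```python
GIB = 1 << 30  # bytes per GiB

# Heap sizes in bytes, copied from the vulkaninfo output in this issue.
heap_sizes = [29376905216, 58753810432]

per_heap_gib = [size / GIB for size in heap_sizes]
total_gib = sum(heap_sizes) / GIB

for index, gib in enumerate(per_heap_gib):
    print(f"memoryHeaps[{index}]: {gib:.2f} GiB")
print(f"sum of heaps:   {total_gib:.2f} GiB")
print(f"largest heap:   {max(per_heap_gib):.2f} GiB")
```

The sum rounds to the 82.1 GiB that Ollama reports, while the largest single heap is noticeably smaller.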

When a model overfills the larger heap (above 54.7 GiB in this example, with Qwen3.5 27B bf16), Ollama does not crash; it just gets stuck indefinitely while loading the model.

time=2026-04-03T19:05:34.815Z level=INFO source=device.go:240 msg="model weights" device=Vulkan1 size="49.3 GiB"
time=2026-04-03T19:05:34.815Z level=INFO source=device.go:245 msg="model weights" device=CPU size="2.5 GiB"
time=2026-04-03T19:05:34.815Z level=INFO source=device.go:251 msg="kv cache" device=Vulkan1 size="4.7 GiB"
time=2026-04-03T19:05:34.832Z level=INFO source=device.go:262 msg="compute graph" device=Vulkan1 size="789.3 MiB"
time=2026-04-03T19:05:34.832Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="39.6 MiB"
time=2026-04-03T19:05:34.833Z level=INFO source=device.go:272 msg="total memory" size="57.2 GiB"
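Adding up the Vulkan1 entries from the log above shows why this load lands right at the edge of the larger heap. This is a rough sketch only: the log values are rounded to one decimal place, so the result is indicative rather than exact.

```python
GIB = 1 << 30
MIB = 1 << 20

# Vulkan1 allocations as printed in the Ollama log (rounded values).
vulkan1_bytes = {
    "model weights": 49.3 * GIB,
    "kv cache": 4.7 * GIB,
    "compute graph": 789.3 * MIB,
}

# memoryHeaps[1] from vulkaninfo (the device-local heap).
heap1_size = 58753810432
heap1_budget = 58734174208

gpu_total = sum(vulkan1_bytes.values())

print(f"Vulkan1 total: {gpu_total / GIB:.2f} GiB")
print(f"heap[1] size:  {heap1_size / GIB:.2f} GiB")
print(f"over size by:  {(gpu_total - heap1_size) / MIB:.0f} MiB")
```

With these rounded figures the GPU-side total slightly exceeds the 54.72 GiB heap while staying well under the 82.1 GiB that Ollama believes is available.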

extent analysis

TL;DR

  • The issue can be mitigated by ensuring Ollama correctly handles the split memory heaps reported by Vulkan, potentially by sizing allocations against each heap separately instead of against the combined 82.1 GiB total.

Guidance

  • Review the Vulkan documentation to understand how to properly handle split memory heaps, especially when the device-local heap is involved.
  • Verify that Ollama's memory allocation logic is compatible with the Vulkan memory model, particularly for devices with multiple memory heaps.
  • Consider implementing a check to detect when the memory requested on the GPU exceeds the larger, device-local heap (54.72 GiB in this case), and adjust the allocation strategy so that the remainder is placed in the other heap (27.36 GiB, not flagged device-local).
  • Investigate if there are any Vulkan extensions or features that can help manage or merge the memory heaps, potentially simplifying the allocation process.
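The guidance above can be illustrated with a small heap-aware placement sketch. Everything here is hypothetical: `Heap`, `plan_placement`, and the buffer names are made up for illustration and are not Ollama or ggml APIs. Real Vulkan allocations also go through memory types rather than heaps directly, which this sketch ignores.

```python
from dataclasses import dataclass

GIB = 1 << 30
MIB = 1 << 20


@dataclass
class Heap:
    index: int
    budget: int         # usable bytes, e.g. the "budget" field from vulkaninfo
    device_local: bool


def plan_placement(buffers: dict[str, int], heaps: list[Heap]) -> dict[str, int]:
    """Greedy first-fit: place each buffer in one heap, largest buffers first.

    Device-local heaps are tried first. Raises MemoryError when a buffer fits
    in no single heap, even if the combined total would be large enough.
    """
    remaining = {h.index: h.budget for h in heaps}
    order = sorted(heaps, key=lambda h: (not h.device_local, -h.budget))
    placement = {}
    for name, size in sorted(buffers.items(), key=lambda kv: -kv[1]):
        for heap in order:
            if size <= remaining[heap.index]:
                remaining[heap.index] -= size
                placement[name] = heap.index
                break
        else:
            raise MemoryError(f"{name} ({size} bytes) fits in no single heap")
    return placement


# Heaps from the vulkaninfo output; buffer sizes from the (rounded) Ollama log.
heaps = [
    Heap(index=0, budget=29367087104, device_local=False),
    Heap(index=1, budget=58734174208, device_local=True),
]
buffers = {
    "weights": int(49.3 * GIB),
    "kv_cache": int(4.7 * GIB),
    "compute_graph": int(789.3 * MIB),
}

placement = plan_placement(buffers, heaps)
print(placement)
```

In this run the weights and KV cache land in the device-local heap and the compute graph spills into the other heap, instead of the whole request being checked only against the combined total.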

Example

  • No Ollama-specific code example can be provided without more context on Ollama's implementation, but ensuring that memory allocations are made with awareness of the MEMORY_HEAP_DEVICE_LOCAL_BIT flag may be crucial.

Notes

  • The exact solution may depend on the specifics of Ollama's Vulkan integration and how it manages memory allocations across different heaps.
  • Understanding the implications of the MEMORY_HEAP_DEVICE_LOCAL_BIT flag on memory allocation and access patterns is essential for a proper fix.
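For reference, the heap flags shown by vulkaninfo are a bitmask. Below is a minimal decoding sketch; the two constants are the values defined for VkMemoryHeapFlagBits in the Vulkan specification, while `decode_heap_flags` and the raw flag values assigned to each heap are assumptions inferred from the printed output.

```python
# Bit values of VkMemoryHeapFlagBits from the Vulkan specification.
VK_MEMORY_HEAP_DEVICE_LOCAL_BIT = 0x00000001
VK_MEMORY_HEAP_MULTI_INSTANCE_BIT = 0x00000002

HEAP_FLAG_NAMES = {
    VK_MEMORY_HEAP_DEVICE_LOCAL_BIT: "MEMORY_HEAP_DEVICE_LOCAL_BIT",
    VK_MEMORY_HEAP_MULTI_INSTANCE_BIT: "MEMORY_HEAP_MULTI_INSTANCE_BIT",
}


def decode_heap_flags(flags: int) -> list[str]:
    """Return the names of all heap flag bits set in `flags` (hypothetical helper)."""
    return [name for bit, name in HEAP_FLAG_NAMES.items() if flags & bit]


# Raw flag values assumed from the vulkaninfo output: heap 0 has none, heap 1 has one.
reported_flags = {0: 0x0, 1: 0x1}

for index, flags in reported_flags.items():
    names = decode_heap_flags(flags)
    print(f"memoryHeaps[{index}]: {', '.join(names) if names else 'None'}")
```

On this APU only the 54.72 GiB heap carries the device-local bit, so logic that considers device-local heaps alone would never use the remaining 27.36 GiB.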

Recommendation

  • Apply workaround: Adjust Ollama's memory allocation logic to correctly handle the split memory heap scenario, ensuring it can utilize the device-local memory heap efficiently. This approach is recommended because it directly addresses the observed issue without requiring external updates or upgrades.
