ollama - 💡 How to fix: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed [3 comments, 2 participants]

GitHub stats
ollama/ollama#15174 · fetched 2026-04-08 01:58:20
Comments: 3 · Participants: 2 · Timeline events: 3 · Reactions: 0
Timeline (top): commented ×3

Error Message

C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:4666: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
The POST /api/embed request then returns HTTP 400 and the runner exits: time=2026-03-31T16:06:37.493+02:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 1"

Root Cause

The issue does not establish a root cause. The reporter links https://github.com/nearai/ironclaw/issues/1827 and asks whether the error is caused by Ollama or by ironclaw.


What is the issue?

https://github.com/nearai/ironclaw/issues/1827 is this error caused by Ollama or by ironclaw?

Relevant log output

ggml_vulkan: Found 2 Vulkan devices:
ggml_vulkan: 0 = NVIDIA GeForce RTX 3050 Ti Laptop GPU (NVIDIA) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 32 | shared memory: 49152 | int dot: 1 | matrix cores: NV_coopmat2
ggml_vulkan: 1 = Intel(R) Iris(R) Xe Graphics (Intel Corporation) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 32 | shared memory: 32768 | int dot: 1 | matrix cores: none
load_backend: loaded Vulkan backend from C:\Users\user\AppData\Local\Programs\Ollama\lib\ollama\vulkan\ggml-vulkan.dll
time=2026-03-31T16:06:28.272+02:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
ggml_backend_vk_get_device_memory called: uuid a4f6355b-902f-14e3-2b28-2189eb9ad638
ggml_backend_vk_get_device_memory called: luid 0x0000000000013583
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x0000000000013583, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013232, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x0000000000013510, Dedicated: 0.00 GB, Shared: 7.85 GB
Discrete GPU (NVIDIA GeForce RTX 3050 Ti Laptop GPU) with LUID 0x0000000000013583 detected. Dedicated Total: 4157603840.00 bytes (3.87 GB), Dedicated Usage: 161857536.00 bytes (0.15 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 3995746304 total: 4157603840
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000013232
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x0000000000013583, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013232, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x0000000000013510, Dedicated: 0.00 GB, Shared: 7.85 GB
Integrated GPU (Intel(R) Iris(R) Xe Graphics) with LUID 0x0000000000013232 detected. Shared Total: 8433055744.00 bytes (7.85 GB), Shared Usage: 3236061184.00 bytes (3.01 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 5331212288 total: 8567273472
time=2026-03-31T16:06:29.943+02:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:512 KvCacheType: NumThreads:6 GPULayers:13[ID:GPU-a4f6355b-902f-14e3-2b28-2189eb9ad638 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_vk_get_device_memory called: uuid a4f6355b-902f-14e3-2b28-2189eb9ad638
ggml_backend_vk_get_device_memory called: luid 0x0000000000013583
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x0000000000013583, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013232, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x0000000000013510, Dedicated: 0.00 GB, Shared: 7.85 GB
Discrete GPU (NVIDIA GeForce RTX 3050 Ti Laptop GPU) with LUID 0x0000000000013583 detected. Dedicated Total: 4157603840.00 bytes (3.87 GB), Dedicated Usage: 176545792.00 bytes (0.16 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 3981058048 total: 4157603840
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000013232
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x0000000000013583, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013232, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x0000000000013510, Dedicated: 0.00 GB, Shared: 7.85 GB
Integrated GPU (Intel(R) Iris(R) Xe Graphics) with LUID 0x0000000000013232 detected. Shared Total: 8433055744.00 bytes (7.85 GB), Shared Usage: 3225890816.00 bytes (3.00 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 5341382656 total: 8567273472
time=2026-03-31T16:06:30.824+02:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Disabled KvSize:512 KvCacheType: NumThreads:6 GPULayers:13[ID:GPU-a4f6355b-902f-14e3-2b28-2189eb9ad638 Layers:13(0..12)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-31T16:06:30.824+02:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="906.8 MiB"
time=2026-03-31T16:06:30.825+02:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="366.3 MiB"
time=2026-03-31T16:06:30.825+02:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="28.0 MiB"
time=2026-03-31T16:06:30.825+02:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="1.5 MiB"
time=2026-03-31T16:06:30.825+02:00 level=INFO source=device.go:272 msg="total memory" size="1.3 GiB"
time=2026-03-31T16:06:30.825+02:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-03-31T16:06:30.824+02:00 level=INFO source=ggml.go:482 msg="offloading 12 repeating layers to GPU"
time=2026-03-31T16:06:30.840+02:00 level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-03-31T16:06:30.840+02:00 level=INFO source=ggml.go:494 msg="offloaded 13/13 layers to GPU"
time=2026-03-31T16:06:30.839+02:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-03-31T16:06:30.853+02:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-31T16:06:36.400+02:00 level=INFO source=server.go:1390 msg="llama runner started in 18.25 seconds"
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cpu/ops.cpp:4666: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed
[GIN] 2026/03/31 - 16:06:37 | 400 |   32.0813551s |       127.0.0.1 | POST     "/api/embed"
time=2026-03-31T16:06:37.493+02:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 1"

OS

Windows

GPU

Intel, Nvidia

CPU

Intel

Ollama version

0.18.3

extent analysis

TL;DR

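The crash is most likely an out-of-range index rather than a memory shortage. The model loads with about 1.3 GiB in total, and the failed check, GGML_ASSERT(i01 >= 0 && i01 < ne01), is a bounds check in ggml's CPU backend that fires while Ollama serves a POST /api/embed request. The log alone does not show whether the bad index comes from a bug in Ollama or from the input that ironclaw sends.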
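In upstream ggml, this assertion text guards the row index in the CPU get_rows kernels. There, i01 is an index read from an integer tensor (for example token or position IDs), and ne01 is the number of rows in the tensor being looked up (for example an embedding table). The sketch below is a hypothetical Python model of that check, not ggml's code. The names `get_rows` and `table` are illustrative.

```python
# Hypothetical model of the bounds check behind
# GGML_ASSERT(i01 >= 0 && i01 < ne01); this is not ggml's implementation.
from typing import List, Sequence


def get_rows(src0: Sequence[Sequence[float]], indices: Sequence[int]) -> List[List[float]]:
    """Gather rows of src0 by index, enforcing the same range check as the assert."""
    ne01 = len(src0)  # number of rows in the tensor being indexed
    out: List[List[float]] = []
    for i01 in indices:
        # ggml treats a failed GGML_ASSERT as fatal; here we raise instead.
        if not (0 <= i01 < ne01):
            raise IndexError(
                f"GGML_ASSERT(i01 >= 0 && i01 < ne01) failed: i01={i01}, ne01={ne01}"
            )
        out.append(list(src0[i01]))
    return out


# Toy "embedding table" with 4 rows, e.g. a 4-entry vocabulary.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]

print(get_rows(table, [2, 0]))  # valid indices
try:
    get_rows(table, [4])  # an index equal to ne01 is already out of range
except IndexError as err:
    print(err)
```

Unlike plain Python list indexing, the check rejects negative indices instead of wrapping around. An index that reaches the row count, such as a token ID beyond the vocabulary or a position beyond the table, trips it.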

Guidance

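  • Confirm that the GPU and CPU configuration is supported by Ollama 0.18.3. Retest on a newer Ollama release if one is available, because 0.18.3 may have known issues with this setup.
  • Read GGML_ASSERT(i01 >= 0 && i01 < ne01) failed as an out-of-bounds access rather than an allocation failure: a row index i01 is negative or not smaller than the row count ne01 of the tensor it indexes.
  • Reproduce the request without ironclaw by calling /api/embed directly with the same model and input. If the runner still crashes, the problem lies with Ollama or the model rather than with ironclaw. The load request in the log shows KvSize:512, so also test whether the crash depends on input length.
  • Verify that the NVIDIA and Intel GPU drivers are up to date, since outdated drivers can cause compatibility issues with Ollama.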

Example

The issue contains no code beyond the log, and it does not show the request ironclaw sends to Ollama, so a fix cannot be written from it. The failure can still be probed from outside ironclaw by calling Ollama's /api/embed endpoint directly with the same model and input.
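A direct call separates the two suspects in the reporter's question. The sketch below is hypothetical and makes several assumptions. Ollama is assumed to listen on its default address, http://localhost:11434. `MODEL` and `TEXT` are placeholders for the model name and input that ironclaw actually uses; neither appears in the issue. The request body uses the documented `model` and `input` fields of /api/embed. The optional loop sends synthetic inputs of increasing length, and word count only roughly tracks token count.

```python
"""Hypothetical probe: reproduce the /api/embed call without ironclaw."""
import json
import urllib.error
import urllib.request
from typing import Tuple

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumed default address
MODEL = "your-embedding-model"  # placeholder, not taken from the issue
TEXT = "paste the text ironclaw tried to embed here"  # placeholder


def build_payload(model: str, text: str) -> bytes:
    """JSON body for POST /api/embed with a single input string."""
    return json.dumps({"model": model, "input": text}).encode("utf-8")


def post_embed(payload: bytes, timeout: float = 120.0) -> Tuple[int, str]:
    """Send the request; return (HTTP status, response body)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, such as the 400 in the issue's [GIN] log line
        return err.code, err.read().decode("utf-8", errors="replace")


def make_input(n_words: int) -> str:
    """Synthetic input of n_words repeated words."""
    return " ".join(["word"] * n_words)


if __name__ == "__main__":
    try:
        status, body = post_embed(build_payload(MODEL, TEXT))
        print("direct request:", status, body[:200])
        # Optional: check whether the failure depends on input length.
        for n in (64, 256, 512, 1024):
            status, _ = post_embed(build_payload(MODEL, make_input(n)))
            print(f"{n:>5} words -> HTTP {status}")
    except OSError as err:  # urllib.error.URLError is a subclass of OSError
        print("could not reach Ollama:", err)
```

If the direct request with ironclaw's exact input also returns 400 and the server log shows the same assertion, ironclaw is not required to trigger the crash.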

Notes

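Ollama detects both the NVIDIA GeForce RTX 3050 Ti Laptop GPU and the Intel Iris Xe Graphics through Vulkan. However, the load request places all 13 layers (12 repeating layers plus the output layer) on the NVIDIA GPU. CUDA0 holds 906.8 MiB of weights and a 28.0 MiB compute graph, and the CPU keeps 366.3 MiB of weights and a 1.5 MiB compute graph, for 1.3 GiB in total. Because the NVIDIA GPU reports about 3.98 GB free, the log shows no sign of running out of memory. The runner starts in 18.25 seconds, and the assertion then fires in the CPU backend (ggml-cpu/ops.cpp:4666). The /api/embed request returns HTTP 400, and the runner exits with status 1.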
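The log's own figures show how much headroom the NVIDIA GPU had. The short calculation below only re-adds the logged sizes from the device.go lines and the last free-memory reading for the RTX 3050 Ti. It does not model how Ollama's scheduler plans memory.

```python
# Sizes copied from the issue's log; this is arithmetic only, not Ollama logic.
MIB = 1024 * 1024

weights_gpu_mib = 906.8  # "model weights" device=CUDA0
weights_cpu_mib = 366.3  # "model weights" device=CPU
graph_gpu_mib = 28.0     # "compute graph" device=CUDA0
graph_cpu_mib = 1.5      # "compute graph" device=CPU
nvidia_free_bytes = 3981058048  # last "free:" reading for the RTX 3050 Ti

total_mib = weights_gpu_mib + weights_cpu_mib + graph_gpu_mib + graph_cpu_mib
gpu_need_mib = weights_gpu_mib + graph_gpu_mib
gpu_free_mib = nvidia_free_bytes / MIB

print(f"total: {total_mib:.1f} MiB = {total_mib / 1024:.2f} GiB")
print(f"GPU need: {gpu_need_mib:.1f} MiB of {gpu_free_mib:.1f} MiB free")
print(f"headroom: {gpu_free_mib - gpu_need_mib:.1f} MiB")
```

The total works out to 1302.6 MiB (about 1.27 GiB, logged as 1.3 GiB). The GPU share is roughly 935 MiB against about 3797 MiB free.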

Recommendation

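As a diagnostic step rather than a fix, rerun the failing request with GPU offload turned off or with one GPU disabled, and separately with inputs of different lengths. If the crash persists on the CPU alone, the GPU configuration is unlikely to be the cause. If it depends on the input, compare what ironclaw sends with what the model accepts.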
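One way to run the GPU-off comparison without changing system settings is to send the same /api/embed request twice, once with default offload and once with the `num_gpu` option set to 0. `num_gpu` is Ollama's model option for the number of layers sent to the GPU. The sketch below is hypothetical: `MODEL` and `TEXT` are placeholders, and the default address http://localhost:11434 is assumed.

```python
"""Hypothetical A/B probe: same /api/embed request with and without GPU offload."""
import json
import urllib.error
import urllib.request
from typing import Dict, Optional

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumed default address
MODEL = "your-embedding-model"  # placeholder
TEXT = "paste the text ironclaw tried to embed here"  # placeholder

# None = server defaults; {"num_gpu": 0} = keep every layer on the CPU.
VARIANTS: Dict[str, Optional[dict]] = {
    "default offload": None,
    "CPU only": {"num_gpu": 0},
}


def build_payload(model: str, text: str, options: Optional[dict]) -> dict:
    """Request body; "options" is only sent when overriding defaults."""
    body = {"model": model, "input": text}
    if options:
        body["options"] = options
    return body


def status_of(body: dict, timeout: float = 300.0) -> int:
    """POST the body and return the HTTP status code."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    for label, options in VARIANTS.items():
        try:
            print(f"{label}: HTTP {status_of(build_payload(MODEL, TEXT, options))}")
        except OSError as err:  # connection refused, timeout, etc.
            print(f"{label}: could not reach Ollama ({err})")
```

A long timeout is used because changing `num_gpu` makes Ollama reload the model, and the log shows a single load taking 18.25 seconds. If the CPU-only variant also fails, that result fits the assertion being raised in ggml-cpu/ops.cpp rather than in a GPU backend.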
