ollama - (How to fix) Error with Intel GPU and unable to use NVIDIA + Intel GPUs at the same time [1 participant]

ollama • 2026-03-12 21:42:13

GitHub stats

ollama/ollama#14805•Fetched 2026-04-08 00:43:16

Author

ScaryBeats01

Participants

ScaryBeats01

Timeline (top)

labeled ×1

Error Message

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output. llama_model_loader: - kv 0: general.architecture str = granitehybrid llama_model_loader: - kv 1: general.type str = model llama_model_loader: - kv 2: general.name str = Granite 4.0 H Tiny llama_model_loader: - kv 3: general.size_label str = 64x994M llama_model_loader: - kv 4: general.license str = apache-2.0 llama_model_loader: - kv 5: general.tags arr[str,2] = ["language", "granite-4.0"] llama_model_loader: - kv 6: granitehybrid.block_count u32 = 40 llama_model_loader: - kv 7: granitehybrid.context_length u32 = 1048576 llama_model_loader: - kv 8: granitehybrid.embedding_length u32 = 1536 llama_model_loader: - kv 9: granitehybrid.feed_forward_length u32 = 512 llama_model_loader: - kv 10: granitehybrid.attention.head_count u32 = 12 llama_model_loader: - kv 11: granitehybrid.attention.head_count_kv arr[i32,40] = [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, ... llama_model_loader: - kv 12: granitehybrid.rope.freq_base f32 = 10000.000000 llama_model_loader: - kv 13: granitehybrid.attention.layer_norm_rms_epsilon f32 = 0.000010 llama_model_loader: - kv 14: granitehybrid.expert_count u32 = 64 llama_model_loader: - kv 15: granitehybrid.expert_used_count u32 = 6 llama_model_loader: - kv 16: granitehybrid.vocab_size u32 = 100352 llama_model_loader: - kv 17: granitehybrid.rope.dimension_count u32 = 128 llama_model_loader: - kv 18: granitehybrid.attention.scale f32 = 0.007813 llama_model_loader: - kv 19: granitehybrid.embedding_scale f32 = 12.000000 llama_model_loader: - kv 20: granitehybrid.residual_scale f32 = 0.220000 llama_model_loader: - kv 21: granitehybrid.logit_scale f32 = 6.000000 llama_model_loader: - kv 22: granitehybrid.expert_shared_feed_forward_length u32 = 1024 llama_model_loader: - kv 23: granitehybrid.ssm.conv_kernel u32 = 4 llama_model_loader: - kv 24: granitehybrid.ssm.state_size u32 = 128 llama_model_loader: - kv 25: granitehybrid.ssm.group_count u32 = 1 
llama_model_loader: - kv 26: granitehybrid.ssm.inner_size u32 = 3072 llama_model_loader: - kv 27: granitehybrid.ssm.time_step_rank u32 = 48 llama_model_loader: - kv 28: granitehybrid.rope.scaling.finetuned bool = false llama_model_loader: - kv 29: tokenizer.ggml.model str = gpt2 llama_model_loader: - kv 30: tokenizer.ggml.pre str = dbrx llama_model_loader: - kv 31: tokenizer.ggml.tokens arr[str,100352] = ["!", """, "#", "$", "%", "&", "'", ... llama_model_loader: - kv 32: tokenizer.ggml.token_type arr[i32,100352] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... llama_model_loader: - kv 33: tokenizer.ggml.merges arr[str,100000] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",... llama_model_loader: - kv 34: tokenizer.ggml.bos_token_id u32 = 100257 llama_model_loader: - kv 35: tokenizer.ggml.eos_token_id u32 = 100257 llama_model_loader: - kv 36: tokenizer.ggml.unknown_token_id u32 = 100269 llama_model_loader: - kv 37: tokenizer.ggml.padding_token_id u32 = 100256 llama_model_loader: - kv 38: tokenizer.ggml.add_bos_token bool = false llama_model_loader: - kv 39: tokenizer.chat_template str = {%- set tools_system_message_prefix =... 
llama_model_loader: - kv 40: general.quantization_version u32 = 2 llama_model_loader: - kv 41: general.file_type u32 = 15 llama_model_loader: - type f32: 337 tensors llama_model_loader: - type q4_K: 286 tensors llama_model_loader: - type q6_K: 43 tensors print_info: file format = GGUF V3 (latest) print_info: file type = Q4_K - Medium print_info: file size = 3.94 GiB (4.87 BPW) load: printing all EOG tokens: load: - 100257 ('<|end_of_text|>') load: - 100261 ('<|fim_pad|>') load: special tokens cache size = 96 load: token to piece cache size = 0.6152 MB print_info: arch = granitehybrid print_info: vocab_only = 0 print_info: no_alloc = 0 print_info: n_ctx_train = 1048576 print_info: n_embd = 1536 print_info: n_embd_inp = 1536 print_info: n_layer = 40 print_info: n_head = 12 print_info: n_head_kv = [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0] print_info: n_rot = 128 print_info: n_swa = 0 print_info: is_swa_any = 0 print_info: n_embd_head_k = 128 print_info: n_embd_head_v = 128 print_info: n_gqa = [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0] print_info: n_embd_k_gqa = [0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0] print_info: n_embd_v_gqa = [0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0] print_info: f_norm_eps = 0.0e+00 print_info: f_norm_rms_eps = 1.0e-05 print_info: f_clamp_kqv = 0.0e+00 print_info: f_max_alibi_bias = 0.0e+00 print_info: f_logit_scale = 6.0e+00 print_info: f_attn_scale = 7.8e-03 print_info: n_ff = 512 print_info: n_expert = 64 print_info: n_expert_used = 6 print_info: n_expert_groups = 0 print_info: n_group_used = 0 print_info: causal attn = 1 print_info: pooling type = 0 print_info: rope type = 0 print_info: rope scaling = linear print_info: 
freq_base_train = 10000.0 print_info: freq_scale_train = 1 print_info: n_ctx_orig_yarn = 1048576 print_info: rope_yarn_log_mul= 0.0000 print_info: rope_finetuned = unknown print_info: ssm_d_conv = 4 print_info: ssm_d_inner = 3072 print_info: ssm_d_state = 128 print_info: ssm_dt_rank = 48 print_info: ssm_n_group = 1 print_info: ssm_dt_b_c_rms = 0 print_info: model type = 1B print_info: model params = 6.94 B print_info: general.name = Granite 4.0 H Tiny print_info: f_embedding_scale = 12.000000 print_info: f_residual_scale = 0.220000 print_info: f_attention_scale = 0.007813 print_info: n_ff_shexp = 1024 print_info: vocab type = BPE print_info: n_vocab = 100352 print_info: n_merges = 100000 print_info: BOS token = 100257 '<|end_of_text|>' print_info: EOS token = 100257 '<|end_of_text|>' print_info: EOT token = 100257 '<|end_of_text|>' print_info: UNK token = 100269 '<|unk|>' print_info: PAD token = 100256 '<|pad|>' print_info: LF token = 198 'Ċ' print_info: FIM PRE token = 100258 '<|fim_prefix|>' print_info: FIM SUF token = 100260 '<|fim_suffix|>' print_info: FIM MID token = 100259 '<|fim_middle|>' print_info: FIM PAD token = 100261 '<|fim_pad|>' print_info: EOG token = 100257 '<|end_of_text|>' print_info: EOG token = 100261 '<|fim_pad|>' print_info: max token length = 256 load_tensors: loading model tensors, this can take a while... (mmap = false) ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0000-000000000000 ggml_backend_vk_get_device_memory called: luid 0x0000000000013502 ggml_dxgi_pdh_init called DXGI + PDH Initialized. 
Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Integrated GPU (Intel(R) Iris(R) Xe Graphics) with LUID 0x0000000000013502 detected. Shared Total: 8433055744.00 bytes (7.85 GB), Shared Usage: 2856886272.00 bytes (2.66 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 5710387200 total: 8567273472
ggml_backend_vk_get_device_memory called: uuid a4f6355b-902f-14e3-2b28-2189eb9ad638
ggml_backend_vk_get_device_memory called: luid 0x000000000001386a
ggml_dxgi_pdh_init called
DXGI + PDH Initialized.
Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Discrete GPU (NVIDIA GeForce RTX 3050 Ti Laptop GPU) with LUID 0x000000000001386a detected. Dedicated Total: 4157603840.00 bytes (3.87 GB), Dedicated Usage: 9768960.00 bytes (0.01 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 4147834880 total: 4157603840
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors: Vulkan1 model buffer size = 4031.55 MiB
load_tensors: Vulkan_Host model buffer size = 120.59 MiB
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000013502
ggml_dxgi_pdh_init called
DXGI + PDH Initialized.
Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Integrated GPU (Intel(R) Iris(R) Xe Graphics) with LUID 0x0000000000013502 detected. Shared Total: 8433055744.00 bytes (7.85 GB), Shared Usage: 6038458368.00 bytes (5.62 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 2528815104 total: 8567273472
time=2026-03-12T21:48:42.270+01:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server not responding"
ggml_vulkan: Memory allocation of size 6144 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
Exception 0xe06d7363 0x19930520 0xe24636efb8 0x7ffe1d79055c
PC=0x7ffe1d79055c
signal arrived during external code execution

runtime.cgocall(0x7ff6fdf10600, 0xc00038fb58) runtime/cgocall.go:167 +0x3e fp=0xc00038fb30 sp=0xc00038fac8 pc=0x7ff6fd02243e github.com/ollama/ollama/llama._Cfunc_llama_model_load_from_file(0x20dada5a720, {0xc000368a10, 0x0, 0x29, 0x1, 0x0, 0xc000334010, 0x7ff6fdf0fd00, 0xc000368a08, 0x0, ...}) _cgo_gotypes.go:902 +0x51 fp=0xc00038fb58 sp=0xc00038fb30 pc=0x7ff6fd4d4bb1 github.com/ollama/ollama/llama.LoadModelFromFile.func1(...) github.com/ollama/ollama/llama/llama.go:308 github.com/ollama/ollama/llama.LoadModelFromFile({0xc0000aa0e0, 0x6b}, {{0xc000334018, 0x1, 0x1}, 0x29, 0x0, 0x0, {0xc000334010, 0x1, ...}, ...}) github.com/ollama/ollama/llama/llama.go:308 +0x57f fp=0xc00038fda0 sp=0xc00038fb58 pc=0x7ff6fd4d81df github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc000130280, {{0xc000334018, 0x1, 0x1}, 0x29, 0x0, 0x0, {0xc000334010, 0x1, 0x2}, ...}, ...) github.com/ollama/ollama/runner/llamarunner/runner.go:841 +0x9e fp=0xc00038fee8 sp=0xc00038fda0 pc=0x7ff6fd58f3de github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2() github.com/ollama/ollama/runner/llamarunner/runner.go:934 +0x114 fp=0xc00038ffe0 sp=0xc00038fee8 pc=0x7ff6fd5906d4 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00038ffe8 sp=0xc00038ffe0 pc=0x7ff6fd02d9a1 created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 41 github.com/ollama/ollama/runner/llamarunner/runner.go:934 +0x889

goroutine 1 gp=0xc0000021c0 m=nil [IO wait]: runtime.gopark(0x7ff6fd02f1a0?, 0x7ff6ff2642e0?, 0x20?, 0xa0?, 0xc00013a0cc?) runtime/proc.go:435 +0xce fp=0xc000125630 sp=0xc000125610 pc=0x7ff6fd02598e runtime.netpollblock(0x5bc?, 0xfcfc0406?, 0xf6?) runtime/netpoll.go:575 +0xf7 fp=0xc000125668 sp=0xc000125630 pc=0x7ff6fcfebdf7 internal/poll.runtime_pollWait(0x20d62200e70, 0x72) runtime/netpoll.go:351 +0x85 fp=0xc000125688 sp=0xc000125668 pc=0x7ff6fd024b25 internal/poll.(*pollDesc).wait(0x7ff6fd0ba953?, 0x0?, 0x0) internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001256b0 sp=0xc000125688 pc=0x7ff6fd0bbf47 internal/poll.execIO(0xc00013a020, 0xc000125758) internal/poll/fd_windows.go:177 +0x105 fp=0xc000125728 sp=0xc0001256b0 pc=0x7ff6fd0bd3a5 internal/poll.(*FD).acceptOne(0xc00013a008, 0x5c4, {0xc0001420f0?, 0xc0001257b8?, 0x7ff6fd0c5065?}, 0xc0001257ec?) internal/poll/fd_windows.go:946 +0x65 fp=0xc000125788 sp=0xc000125728 pc=0x7ff6fd0c1925 internal/poll.(*FD).Accept(0xc00013a008, 0xc000125938) internal/poll/fd_windows.go:980 +0x1b6 fp=0xc000125840 sp=0xc000125788 pc=0x7ff6fd0c1c56 net.(*netFD).accept(0xc00013a008) net/fd_windows.go:182 +0x4b fp=0xc000125958 sp=0xc000125840 pc=0x7ff6fd13358b net.(*TCPListener).accept(0xc0000a7600) net/tcpsock_posix.go:159 +0x1b fp=0xc0001259a8 sp=0xc000125958 pc=0x7ff6fd149b3b net.(*TCPListener).Accept(0xc0000a7600) net/tcpsock.go:380 +0x30 fp=0xc0001259d8 sp=0xc0001259a8 pc=0x7ff6fd1488f0 net/http.(*onceCloseListener).Accept(0xc000140090?) 
<autogenerated>:1 +0x24 fp=0xc0001259f0 sp=0xc0001259d8 pc=0x7ff6fd361fe4 net/http.(*Server).Serve(0xc0001de700, {0x7ff6fe79d5b0, 0xc0000a7600}) net/http/server.go:3424 +0x30c fp=0xc000125b20 sp=0xc0001259f0 pc=0x7ff6fd3398ac github.com/ollama/ollama/runner/llamarunner.Execute({0xc0000c2020, 0x4, 0x6}) github.com/ollama/ollama/runner/llamarunner/runner.go:1002 +0x8f5 fp=0xc000125cf0 sp=0xc000125b20 pc=0x7ff6fd591095 github.com/ollama/ollama/runner.Execute({0xc0000c2010?, 0x0?, 0x0?}) github.com/ollama/ollama/runner/runner.go:25 +0x1a5 fp=0xc000125d30 sp=0xc000125cf0 pc=0x7ff6fd672e25 github.com/ollama/ollama/cmd.NewCLI.func3(0xc00063f200?, {0x7ff6fe5747ef?, 0x4?, 0x7ff6fe5747f3?}) github.com/ollama/ollama/cmd/cmd.go:2271 +0x45 fp=0xc000125d58 sp=0xc000125d30 pc=0x7ff6fde9ee65 github.com/spf13/cobra.(*Command).execute(0xc0002f5b08, {0xc00062b380, 0x4, 0x4}) github.com/spf13/[email protected]/command.go:940 +0x85c fp=0xc000125e78 sp=0xc000125d58 pc=0x7ff6fd1ae75c github.com/spf13/cobra.(*Command).ExecuteC(0xc00068ef08) github.com/spf13/[email protected]/command.go:1068 +0x3a5 fp=0xc000125f30 sp=0xc000125e78 pc=0x7ff6fd1aefa5 github.com/spf13/cobra.(*Command).Execute(...) github.com/spf13/[email protected]/command.go:992 github.com/spf13/cobra.(*Command).ExecuteContext(...) github.com/spf13/[email protected]/command.go:985 main.main() github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000125f50 sp=0xc000125f30 pc=0x7ff6fdea130d runtime.main() runtime/proc.go:283 +0x27d fp=0xc000125fe0 sp=0xc000125f50 pc=0x7ff6fcff4ddd runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000125fe8 sp=0xc000125fe0 pc=0x7ff6fd02d9a1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008dfa8 sp=0xc00008df88 pc=0x7ff6fd02598e runtime.goparkunlock(...) runtime/proc.go:441 runtime.forcegchelper() runtime/proc.go:348 +0xb8 fp=0xc00008dfe0 sp=0xc00008dfa8 pc=0x7ff6fcff50f8 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x7ff6fd02d9a1 created by runtime.init.7 in goroutine 1 runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00008ff80 sp=0xc00008ff60 pc=0x7ff6fd02598e runtime.goparkunlock(...) runtime/proc.go:441 runtime.bgsweep(0xc00009c000) runtime/mgcsweep.go:316 +0xdf fp=0xc00008ffc8 sp=0xc00008ff80 pc=0x7ff6fcfddebf runtime.gcenable.gowrap1() runtime/mgc.go:204 +0x25 fp=0xc00008ffe0 sp=0xc00008ffc8 pc=0x7ff6fcfd2285 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x7ff6fd02d9a1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x7ff6fe787ac0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc0000a3f78 sp=0xc0000a3f58 pc=0x7ff6fd02598e runtime.goparkunlock(...) runtime/proc.go:441 runtime.(*scavengerState).park(0x7ff6ff28e080) runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a3fa8 sp=0xc0000a3f78 pc=0x7ff6fcfdb909 runtime.bgscavenge(0xc00009c000) runtime/mgcscavenge.go:658 +0x59 fp=0xc0000a3fc8 sp=0xc0000a3fa8 pc=0x7ff6fcfdbe99 runtime.gcenable.gowrap2() runtime/mgc.go:205 +0x25 fp=0xc0000a3fe0 sp=0xc0000a3fc8 pc=0x7ff6fcfd2225 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a3fe8 sp=0xc0000a3fe0 pc=0x7ff6fd02d9a1 created by runtime.gcenable in goroutine 1 runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc0000a5e30 sp=0xc0000a5e10 pc=0x7ff6fd02598e runtime.runfinq() runtime/mfinal.go:196 +0x107 fp=0xc0000a5fe0 sp=0xc0000a5e30 pc=0x7ff6fcfd1207 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a5fe8 sp=0xc0000a5fe0 pc=0x7ff6fd02d9a1 created by runtime.createfing in goroutine 1 runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]: runtime.gopark(0xc000213540?, 0xc000518018?, 0x60?, 0x1f?, 0x7ff6fd11c1a8?) runtime/proc.go:435 +0xce fp=0xc000091f18 sp=0xc000091ef8 pc=0x7ff6fd02598e runtime.chanrecv(0xc0000aa3f0, 0x0, 0x1) runtime/chan.go:664 +0x445 fp=0xc000091f90 sp=0xc000091f18 pc=0x7ff6fcfc2d45 runtime.chanrecv1(0x7ff6fcff4f40?, 0xc000091f76?) runtime/chan.go:506 +0x12 fp=0xc000091fb8 sp=0xc000091f90 pc=0x7ff6fcfc28d2 runtime.unique_runtime_registerUniqueMapCleanup.func2(...) runtime/mgc.go:1796 runtime.unique_runtime_registerUniqueMapCleanup.gowrap1() runtime/mgc.go:1799 +0x2f fp=0xc000091fe0 sp=0xc000091fb8 pc=0x7ff6fcfd54af runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x7ff6fd02d9a1 created by unique.runtime_registerUniqueMapCleanup in goroutine 1 runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc000268540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00009ff38 sp=0xc00009ff18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc00009ffc8 sp=0xc00009ff38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00009ffe0 sp=0xc00009ffc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00009ffe8 sp=0xc00009ffe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc000268700 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc0000a1f38 sp=0xc0000a1f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc0000a1fc8 sp=0xc0000a1f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc0000a1fe0 sp=0xc0000a1fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a1fe8 sp=0xc0000a1fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0002688c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000473f38 sp=0xc000473f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000473fc8 sp=0xc000473f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000473fe0 sp=0xc000473fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000473fe8 sp=0xc000473fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc000268a80 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000475f38 sp=0xc000475f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000475fc8 sp=0xc000475f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000475fe0 sp=0xc000475fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000475fe8 sp=0xc000475fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000484000 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00046ff38 sp=0xc00046ff18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc00046ffc8 sp=0xc00046ff38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00046ffe0 sp=0xc00046ffc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00046ffe8 sp=0xc00046ffe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc0001061c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000113f38 sp=0xc000113f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000113fc8 sp=0xc000113f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000113fe0 sp=0xc000113fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000113fe8 sp=0xc000113fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc0004841c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000471f38 sp=0xc000471f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000471fc8 sp=0xc000471f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000471fe0 sp=0xc000471fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000471fe8 sp=0xc000471fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc000268c40 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00010ff38 sp=0xc00010ff18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc00010ffc8 sp=0xc00010ff38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00010ffe0 sp=0xc00010ffc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00010ffe8 sp=0xc00010ffe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 12 gp=0xc000268e00 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000111f38 sp=0xc000111f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000111fc8 sp=0xc000111f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000111fe0 sp=0xc000111fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000111fe8 sp=0xc000111fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 13 gp=0xc000268fc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00047bf38 sp=0xc00047bf18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc00047bfc8 sp=0xc00047bf38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00047bfe0 sp=0xc00047bfc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00047bfe8 sp=0xc00047bfe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000484380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000477f38 sp=0xc000477f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000477fc8 sp=0xc000477f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000477fe0 sp=0xc000477fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000477fe8 sp=0xc000477fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc000106380 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000115f38 sp=0xc000115f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000115fc8 sp=0xc000115f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000115fe0 sp=0xc000115fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000115fe8 sp=0xc000115fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000106540 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011bf38 sp=0xc00011bf18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc00011bfc8 sp=0xc00011bf38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011bfe0 sp=0xc00011bfc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011bfe8 sp=0xc00011bfe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc000106700 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00011df38 sp=0xc00011df18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc00011dfc8 sp=0xc00011df38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00011dfe0 sp=0xc00011dfc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc0001068c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000117f38 sp=0xc000117f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000117fc8 sp=0xc000117f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000117fe0 sp=0xc000117fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000117fe8 sp=0xc000117fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 39 gp=0xc000106a80 m=nil [GC worker (idle)]: runtime.gopark(0x2d24f5154b30?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000119f38 sp=0xc000119f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000119fc8 sp=0xc000119f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000119fe0 sp=0xc000119fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000119fe8 sp=0xc000119fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000484540 m=nil [GC worker (idle)]: runtime.gopark(0x2d24f5154b30?, 0x0?, 0x0?, 0x0?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000479f38 sp=0xc000479f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000479fc8 sp=0xc000479f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000479fe0 sp=0xc000479fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000479fe8 sp=0xc000479fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 14 gp=0xc000269180 m=nil [GC worker (idle)]: runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?) runtime/proc.go:435 +0xce fp=0xc00047df38 sp=0xc00047df18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc00047dfc8 sp=0xc00047df38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc00047dfe0 sp=0xc00047dfc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc00047dfe8 sp=0xc00047dfe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 15 gp=0xc000269340 m=nil [GC worker (idle)]: runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000505f38 sp=0xc000505f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000505fc8 sp=0xc000505f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000505fe0 sp=0xc000505fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000505fe8 sp=0xc000505fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 16 gp=0xc000269500 m=nil [GC worker (idle)]: runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000507f38 sp=0xc000507f18 pc=0x7ff6fd02598e runtime.gcBgMarkWorker(0xc0000ab810) runtime/mgc.go:1423 +0xe9 fp=0xc000507fc8 sp=0xc000507f38 pc=0x7ff6fcfd47a9 runtime.gcBgMarkStartWorkers.gowrap1() runtime/mgc.go:1339 +0x25 fp=0xc000507fe0 sp=0xc000507fc8 pc=0x7ff6fcfd4685 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000507fe8 sp=0xc000507fe0 pc=0x7ff6fd02d9a1 created by runtime.gcBgMarkStartWorkers in goroutine 1 runtime/mgc.go:1339 +0x105

goroutine 40 gp=0xc000484700 m=nil [sync.WaitGroup.Wait]: runtime.gopark(0x0?, 0x0?, 0x60?, 0xfe?, 0x0?) runtime/proc.go:435 +0xce fp=0xc000501e20 sp=0xc000501e00 pc=0x7ff6fd02598e runtime.goparkunlock(...) runtime/proc.go:441 runtime.semacquire1(0xc0001302a0, 0x0, 0x1, 0x0, 0x18) runtime/sema.go:188 +0x22f fp=0xc000501e88 sp=0xc000501e20 pc=0x7ff6fd00750f sync.runtime_SemacquireWaitGroup(0x0?) runtime/sema.go:110 +0x25 fp=0xc000501ec0 sp=0xc000501e88 pc=0x7ff6fd026f85 sync.(*WaitGroup).Wait(0x0?) sync/waitgroup.go:118 +0x48 fp=0xc000501ee8 sp=0xc000501ec0 pc=0x7ff6fd03b988 github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc000130280, {0x7ff6fe79fed0, 0xc0001340f0}) github.com/ollama/ollama/runner/llamarunner/runner.go:360 +0x4b fp=0xc000501fb8 sp=0xc000501ee8 pc=0x7ff6fd58bdab github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1() github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x28 fp=0xc000501fe0 sp=0xc000501fb8 pc=0x7ff6fd591308 runtime.goexit({}) runtime/asm_amd64.s:1700 +0x1 fp=0xc000501fe8 sp=0xc000501fe0 pc=0x7ff6fd02d9a1 created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1 github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x4c5 rax 0x0 rbx 0x7ffd0d923c60 rcx 0x7ffd0d907da0 rdx 0x7ffd0a8ca55c rdi 0x19930520 rsi 0xe24636efb8 rbp 0xe24636ef70 rsp 0xe24636a0f0 r8 0x0 r9 0x0 r10 0xfffffffffffffffe r11 0x0 r12 0xe24636a310 r13 0x0 r14 0xe24636b198 r15 0x0 rip 0x7ffe1d79055c rflags 0x202 cs 0x33 fs 0x53 gs 0x2b
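
The memory figures in the log above can be sanity-checked by hand. The following is a minimal Python sketch, not Ollama or ggml code: the helper name `reported_memory` is hypothetical, and the rule it applies (free = total minus usage, with shared memory counted only for the integrated GPU) is an assumption inferred from the logged numbers, which it happens to reproduce. All byte values are copied from the log.

```python
MIB = 1024 * 1024


def reported_memory(dedicated_total, dedicated_used, shared_total=0, shared_used=0):
    """Return (free, total) in bytes, computed as total minus usage.

    Assumption (inferred from the log, not from the source code):
    the integrated GPU counts shared + dedicated memory, while the
    discrete GPU counts dedicated memory only (shared values left at 0).
    """
    total = dedicated_total + shared_total
    free = total - dedicated_used - shared_used
    return free, total


# Byte values copied from the DXGI + PDH lines in the log.
intel_before = reported_memory(134217728, 0, 8433055744, 2856886272)
nvidia = reported_memory(4157603840, 9768960)
intel_after = reported_memory(134217728, 0, 8433055744, 6038458368)

# "Vulkan1 model buffer size = 4031.55 MiB" from load_tensors.
vulkan1_buffer_bytes = 4031.55 * MIB

for name, (free, total) in [
    ("Intel (first query)", intel_before),
    ("NVIDIA", nvidia),
    ("Intel (second query)", intel_after),
]:
    print(f"{name}: free {free} of {total} bytes ({free / MIB:.2f} MiB free)")

print(f"Vulkan1 buffer fits in NVIDIA free memory: {vulkan1_buffer_bytes <= nvidia[0]}")
print(f"Intel free memory drop between queries: {intel_before[0] - intel_after[0]} bytes")
```

With these values, the computed free/total pairs match the ones printed in the log, the 4031.55 MiB Vulkan1 buffer is larger than the free memory reported for the NVIDIA adapter, and the free memory reported for the Intel adapter drops by 3181572096 bytes between the two queries. The log does not say which adapter Vulkan1 refers to, so this is only a consistency check on the numbers and not a diagnosis.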

Code Example

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = granitehybrid
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Granite 4.0 H Tiny
llama_model_loader: - kv   3:                         general.size_label str              = 64x994M
llama_model_loader: - kv   4:                            general.license str              = apache-2.0
llama_model_loader: - kv   5:                               general.tags arr[str,2]       = ["language", "granite-4.0"]
llama_model_loader: - kv   6:                  granitehybrid.block_count u32              = 40
llama_model_loader: - kv   7:               granitehybrid.context_length u32              = 1048576
llama_model_loader: - kv   8:             granitehybrid.embedding_length u32              = 1536
llama_model_loader: - kv   9:          granitehybrid.feed_forward_length u32              = 512
llama_model_loader: - kv  10:         granitehybrid.attention.head_count u32              = 12
llama_model_loader: - kv  11:      granitehybrid.attention.head_count_kv arr[i32,40]      = [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, ...
llama_model_loader: - kv  12:               granitehybrid.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  13: granitehybrid.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  14:                 granitehybrid.expert_count u32              = 64
llama_model_loader: - kv  15:            granitehybrid.expert_used_count u32              = 6
llama_model_loader: - kv  16:                   granitehybrid.vocab_size u32              = 100352
llama_model_loader: - kv  17:         granitehybrid.rope.dimension_count u32              = 128
llama_model_loader: - kv  18:              granitehybrid.attention.scale f32              = 0.007813
llama_model_loader: - kv  19:              granitehybrid.embedding_scale f32              = 12.000000
llama_model_loader: - kv  20:               granitehybrid.residual_scale f32              = 0.220000
llama_model_loader: - kv  21:                  granitehybrid.logit_scale f32              = 6.000000
llama_model_loader: - kv  22: granitehybrid.expert_shared_feed_forward_length u32              = 1024
llama_model_loader: - kv  23:              granitehybrid.ssm.conv_kernel u32              = 4
llama_model_loader: - kv  24:               granitehybrid.ssm.state_size u32              = 128
llama_model_loader: - kv  25:              granitehybrid.ssm.group_count u32              = 1
llama_model_loader: - kv  26:               granitehybrid.ssm.inner_size u32              = 3072
llama_model_loader: - kv  27:           granitehybrid.ssm.time_step_rank u32              = 48
llama_model_loader: - kv  28:       granitehybrid.rope.scaling.finetuned bool             = false
llama_model_loader: - kv  29:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  30:                         tokenizer.ggml.pre str              = dbrx
llama_model_loader: - kv  31:                      tokenizer.ggml.tokens arr[str,100352]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  32:                  tokenizer.ggml.token_type arr[i32,100352]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  33:                      tokenizer.ggml.merges arr[str,100000]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  34:                tokenizer.ggml.bos_token_id u32              = 100257
llama_model_loader: - kv  35:                tokenizer.ggml.eos_token_id u32              = 100257
llama_model_loader: - kv  36:            tokenizer.ggml.unknown_token_id u32              = 100269
llama_model_loader: - kv  37:            tokenizer.ggml.padding_token_id u32              = 100256
llama_model_loader: - kv  38:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  39:                    tokenizer.chat_template str              = {%- set tools_system_message_prefix =...
llama_model_loader: - kv  40:               general.quantization_version u32              = 2
llama_model_loader: - kv  41:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  337 tensors
llama_model_loader: - type q4_K:  286 tensors
llama_model_loader: - type q6_K:   43 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 3.94 GiB (4.87 BPW)
load: printing all EOG tokens:
load:   - 100257 ('<|end_of_text|>')
load:   - 100261 ('<|fim_pad|>')
load: special tokens cache size = 96
load: token to piece cache size = 0.6152 MB
print_info: arch             = granitehybrid
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 1048576
print_info: n_embd           = 1536
print_info: n_embd_inp       = 1536
print_info: n_layer          = 40
print_info: n_head           = 12
print_info: n_head_kv        = [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0]
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0]
print_info: n_embd_k_gqa     = [0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0]
print_info: n_embd_v_gqa     = [0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0]
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 6.0e+00
print_info: f_attn_scale     = 7.8e-03
print_info: n_ff             = 512
print_info: n_expert         = 64
print_info: n_expert_used    = 6
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 1048576
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 4
print_info: ssm_d_inner      = 3072
print_info: ssm_d_state      = 128
print_info: ssm_dt_rank      = 48
print_info: ssm_n_group      = 1
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 1B
print_info: model params     = 6.94 B
print_info: general.name     = Granite 4.0 H Tiny
print_info: f_embedding_scale = 12.000000
print_info: f_residual_scale  = 0.220000
print_info: f_attention_scale = 0.007813
print_info: n_ff_shexp        = 1024
print_info: vocab type       = BPE
print_info: n_vocab          = 100352
print_info: n_merges         = 100000
print_info: BOS token        = 100257 '<|end_of_text|>'
print_info: EOS token        = 100257 '<|end_of_text|>'
print_info: EOT token        = 100257 '<|end_of_text|>'
print_info: UNK token        = 100269 '<|unk|>'
print_info: PAD token        = 100256 '<|pad|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 100258 '<|fim_prefix|>'
print_info: FIM SUF token    = 100260 '<|fim_suffix|>'
print_info: FIM MID token    = 100259 '<|fim_middle|>'
print_info: FIM PAD token    = 100261 '<|fim_pad|>'
print_info: EOG token        = 100257 '<|end_of_text|>'
print_info: EOG token        = 100261 '<|fim_pad|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000013502
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Integrated GPU (Intel(R) Iris(R) Xe Graphics) with LUID 0x0000000000013502 detected. Shared Total: 8433055744.00 bytes (7.85 GB), Shared Usage: 2856886272.00 bytes (2.66 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 5710387200 total: 8567273472
ggml_backend_vk_get_device_memory called: uuid a4f6355b-902f-14e3-2b28-2189eb9ad638
ggml_backend_vk_get_device_memory called: luid 0x000000000001386a
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Discrete GPU (NVIDIA GeForce RTX 3050 Ti Laptop GPU) with LUID 0x000000000001386a detected. Dedicated Total: 4157603840.00 bytes (3.87 GB), Dedicated Usage: 9768960.00 bytes (0.01 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 4147834880 total: 4157603840
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors:      Vulkan1 model buffer size =  4031.55 MiB
load_tensors:  Vulkan_Host model buffer size =   120.59 MiB
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000013502
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Integrated GPU (Intel(R) Iris(R) Xe Graphics) with LUID 0x0000000000013502 detected. Shared Total: 8433055744.00 bytes (7.85 GB), Shared Usage: 6038458368.00 bytes (5.62 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 2528815104 total: 8567273472
time=2026-03-12T21:48:42.270+01:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server not responding"
ggml_vulkan: Memory allocation of size 6144 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
Exception 0xe06d7363 0x19930520 0xe24636efb8 0x7ffe1d79055c
PC=0x7ffe1d79055c
signal arrived during external code execution

runtime.cgocall(0x7ff6fdf10600, 0xc00038fb58)
        runtime/cgocall.go:167 +0x3e fp=0xc00038fb30 sp=0xc00038fac8 pc=0x7ff6fd02243e
github.com/ollama/ollama/llama._Cfunc_llama_model_load_from_file(0x20dada5a720, {0xc000368a10, 0x0, 0x29, 0x1, 0x0, 0xc000334010, 0x7ff6fdf0fd00, 0xc000368a08, 0x0, ...})
        _cgo_gotypes.go:902 +0x51 fp=0xc00038fb58 sp=0xc00038fb30 pc=0x7ff6fd4d4bb1
github.com/ollama/ollama/llama.LoadModelFromFile.func1(...)
        github.com/ollama/ollama/llama/llama.go:308
github.com/ollama/ollama/llama.LoadModelFromFile({0xc0000aa0e0, 0x6b}, {{0xc000334018, 0x1, 0x1}, 0x29, 0x0, 0x0, {0xc000334010, 0x1, ...}, ...})
        github.com/ollama/ollama/llama/llama.go:308 +0x57f fp=0xc00038fda0 sp=0xc00038fb58 pc=0x7ff6fd4d81df
github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc000130280, {{0xc000334018, 0x1, 0x1}, 0x29, 0x0, 0x0, {0xc000334010, 0x1, 0x2}, ...}, ...)
        github.com/ollama/ollama/runner/llamarunner/runner.go:841 +0x9e fp=0xc00038fee8 sp=0xc00038fda0 pc=0x7ff6fd58f3de
github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2()
        github.com/ollama/ollama/runner/llamarunner/runner.go:934 +0x114 fp=0xc00038ffe0 sp=0xc00038fee8 pc=0x7ff6fd5906d4
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00038ffe8 sp=0xc00038ffe0 pc=0x7ff6fd02d9a1
created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 41
        github.com/ollama/ollama/runner/llamarunner/runner.go:934 +0x889

goroutine 1 gp=0xc0000021c0 m=nil [IO wait]:
runtime.gopark(0x7ff6fd02f1a0?, 0x7ff6ff2642e0?, 0x20?, 0xa0?, 0xc00013a0cc?)
        runtime/proc.go:435 +0xce fp=0xc000125630 sp=0xc000125610 pc=0x7ff6fd02598e
runtime.netpollblock(0x5bc?, 0xfcfc0406?, 0xf6?)
        runtime/netpoll.go:575 +0xf7 fp=0xc000125668 sp=0xc000125630 pc=0x7ff6fcfebdf7
internal/poll.runtime_pollWait(0x20d62200e70, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000125688 sp=0xc000125668 pc=0x7ff6fd024b25
internal/poll.(*pollDesc).wait(0x7ff6fd0ba953?, 0x0?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001256b0 sp=0xc000125688 pc=0x7ff6fd0bbf47
internal/poll.execIO(0xc00013a020, 0xc000125758)
        internal/poll/fd_windows.go:177 +0x105 fp=0xc000125728 sp=0xc0001256b0 pc=0x7ff6fd0bd3a5
internal/poll.(*FD).acceptOne(0xc00013a008, 0x5c4, {0xc0001420f0?, 0xc0001257b8?, 0x7ff6fd0c5065?}, 0xc0001257ec?)
        internal/poll/fd_windows.go:946 +0x65 fp=0xc000125788 sp=0xc000125728 pc=0x7ff6fd0c1925
internal/poll.(*FD).Accept(0xc00013a008, 0xc000125938)
        internal/poll/fd_windows.go:980 +0x1b6 fp=0xc000125840 sp=0xc000125788 pc=0x7ff6fd0c1c56
net.(*netFD).accept(0xc00013a008)
        net/fd_windows.go:182 +0x4b fp=0xc000125958 sp=0xc000125840 pc=0x7ff6fd13358b
net.(*TCPListener).accept(0xc0000a7600)
        net/tcpsock_posix.go:159 +0x1b fp=0xc0001259a8 sp=0xc000125958 pc=0x7ff6fd149b3b
net.(*TCPListener).Accept(0xc0000a7600)
        net/tcpsock.go:380 +0x30 fp=0xc0001259d8 sp=0xc0001259a8 pc=0x7ff6fd1488f0
net/http.(*onceCloseListener).Accept(0xc000140090?)
        <autogenerated>:1 +0x24 fp=0xc0001259f0 sp=0xc0001259d8 pc=0x7ff6fd361fe4
net/http.(*Server).Serve(0xc0001de700, {0x7ff6fe79d5b0, 0xc0000a7600})
        net/http/server.go:3424 +0x30c fp=0xc000125b20 sp=0xc0001259f0 pc=0x7ff6fd3398ac
github.com/ollama/ollama/runner/llamarunner.Execute({0xc0000c2020, 0x4, 0x6})
        github.com/ollama/ollama/runner/llamarunner/runner.go:1002 +0x8f5 fp=0xc000125cf0 sp=0xc000125b20 pc=0x7ff6fd591095
github.com/ollama/ollama/runner.Execute({0xc0000c2010?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:25 +0x1a5 fp=0xc000125d30 sp=0xc000125cf0 pc=0x7ff6fd672e25
github.com/ollama/ollama/cmd.NewCLI.func3(0xc00063f200?, {0x7ff6fe5747ef?, 0x4?, 0x7ff6fe5747f3?})
        github.com/ollama/ollama/cmd/cmd.go:2271 +0x45 fp=0xc000125d58 sp=0xc000125d30 pc=0x7ff6fde9ee65
github.com/spf13/cobra.(*Command).execute(0xc0002f5b08, {0xc00062b380, 0x4, 0x4})
        github.com/spf13/cobra@v1.7.0/command.go:940 +0x85c fp=0xc000125e78 sp=0xc000125d58 pc=0x7ff6fd1ae75c
github.com/spf13/cobra.(*Command).ExecuteC(0xc00068ef08)
        github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc000125f30 sp=0xc000125e78 pc=0x7ff6fd1aefa5
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000125f50 sp=0xc000125f30 pc=0x7ff6fdea130d
runtime.main()
        runtime/proc.go:283 +0x27d fp=0xc000125fe0 sp=0xc000125f50 pc=0x7ff6fcff4ddd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000125fe8 sp=0xc000125fe0 pc=0x7ff6fd02d9a1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00008dfa8 sp=0xc00008df88 pc=0x7ff6fd02598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.forcegchelper()
        runtime/proc.go:348 +0xb8 fp=0xc00008dfe0 sp=0xc00008dfa8 pc=0x7ff6fcff50f8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x7ff6fd02d9a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00008ff80 sp=0xc00008ff60 pc=0x7ff6fd02598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.bgsweep(0xc00009c000)
        runtime/mgcsweep.go:316 +0xdf fp=0xc00008ffc8 sp=0xc00008ff80 pc=0x7ff6fcfddebf
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc00008ffe0 sp=0xc00008ffc8 pc=0x7ff6fcfd2285
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x7ff6fd02d9a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff6fe787ac0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc0000a3f78 sp=0xc0000a3f58 pc=0x7ff6fd02598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.(*scavengerState).park(0x7ff6ff28e080)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a3fa8 sp=0xc0000a3f78 pc=0x7ff6fcfdb909
runtime.bgscavenge(0xc00009c000)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc0000a3fc8 sp=0xc0000a3fa8 pc=0x7ff6fcfdbe99
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc0000a3fe0 sp=0xc0000a3fc8 pc=0x7ff6fcfd2225
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a3fe8 sp=0xc0000a3fe0 pc=0x7ff6fd02d9a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc0000a5e30 sp=0xc0000a5e10 pc=0x7ff6fd02598e
runtime.runfinq()
        runtime/mfinal.go:196 +0x107 fp=0xc0000a5fe0 sp=0xc0000a5e30 pc=0x7ff6fcfd1207
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a5fe8 sp=0xc0000a5fe0 pc=0x7ff6fd02d9a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]:
runtime.gopark(0xc000213540?, 0xc000518018?, 0x60?, 0x1f?, 0x7ff6fd11c1a8?)
        runtime/proc.go:435 +0xce fp=0xc000091f18 sp=0xc000091ef8 pc=0x7ff6fd02598e
runtime.chanrecv(0xc0000aa3f0, 0x0, 0x1)
        runtime/chan.go:664 +0x445 fp=0xc000091f90 sp=0xc000091f18 pc=0x7ff6fcfc2d45
runtime.chanrecv1(0x7ff6fcff4f40?, 0xc000091f76?)
        runtime/chan.go:506 +0x12 fp=0xc000091fb8 sp=0xc000091f90 pc=0x7ff6fcfc28d2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1799 +0x2f fp=0xc000091fe0 sp=0xc000091fb8 pc=0x7ff6fcfd54af
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x7ff6fd02d9a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc000268540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00009ff38 sp=0xc00009ff18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00009ffc8 sp=0xc00009ff38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00009ffe0 sp=0xc00009ffc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00009ffe8 sp=0xc00009ffe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc000268700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc0000a1f38 sp=0xc0000a1f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc0000a1fc8 sp=0xc0000a1f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc0000a1fe0 sp=0xc0000a1fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a1fe8 sp=0xc0000a1fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0002688c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000473f38 sp=0xc000473f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000473fc8 sp=0xc000473f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000473fe0 sp=0xc000473fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000473fe8 sp=0xc000473fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc000268a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000475f38 sp=0xc000475f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000475fc8 sp=0xc000475f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000475fe0 sp=0xc000475fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000475fe8 sp=0xc000475fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000484000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00046ff38 sp=0xc00046ff18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00046ffc8 sp=0xc00046ff38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00046ffe0 sp=0xc00046ffc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00046ffe8 sp=0xc00046ffe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc0001061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000113f38 sp=0xc000113f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000113fc8 sp=0xc000113f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000113fe0 sp=0xc000113fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000113fe8 sp=0xc000113fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc0004841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000471f38 sp=0xc000471f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000471fc8 sp=0xc000471f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000471fe0 sp=0xc000471fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000471fe8 sp=0xc000471fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc000268c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00010ff38 sp=0xc00010ff18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00010ffc8 sp=0xc00010ff38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00010ffe0 sp=0xc00010ffc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00010ffe8 sp=0xc00010ffe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 12 gp=0xc000268e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000111f38 sp=0xc000111f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000111fc8 sp=0xc000111f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000111fe0 sp=0xc000111fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000111fe8 sp=0xc000111fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 13 gp=0xc000268fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00047bf38 sp=0xc00047bf18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00047bfc8 sp=0xc00047bf38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00047bfe0 sp=0xc00047bfc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00047bfe8 sp=0xc00047bfe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000484380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000477f38 sp=0xc000477f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000477fc8 sp=0xc000477f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000477fe0 sp=0xc000477fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000477fe8 sp=0xc000477fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc000106380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000115f38 sp=0xc000115f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000115fc8 sp=0xc000115f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000115fe0 sp=0xc000115fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000115fe8 sp=0xc000115fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000106540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011bf38 sp=0xc00011bf18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011bfc8 sp=0xc00011bf38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011bfe0 sp=0xc00011bfc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011bfe8 sp=0xc00011bfe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc000106700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011df38 sp=0xc00011df18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011dfc8 sp=0xc00011df38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011dfe0 sp=0xc00011dfc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc0001068c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000117f38 sp=0xc000117f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000117fc8 sp=0xc000117f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000117fe0 sp=0xc000117fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000117fe8 sp=0xc000117fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 39 gp=0xc000106a80 m=nil [GC worker (idle)]:
runtime.gopark(0x2d24f5154b30?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000119f38 sp=0xc000119f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000119fc8 sp=0xc000119f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000119fe0 sp=0xc000119fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000119fe8 sp=0xc000119fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000484540 m=nil [GC worker (idle)]:
runtime.gopark(0x2d24f5154b30?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000479f38 sp=0xc000479f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000479fc8 sp=0xc000479f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000479fe0 sp=0xc000479fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000479fe8 sp=0xc000479fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 14 gp=0xc000269180 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00047df38 sp=0xc00047df18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00047dfc8 sp=0xc00047df38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00047dfe0 sp=0xc00047dfc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00047dfe8 sp=0xc00047dfe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 15 gp=0xc000269340 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000505f38 sp=0xc000505f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000505fc8 sp=0xc000505f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000505fe0 sp=0xc000505fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000505fe8 sp=0xc000505fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 16 gp=0xc000269500 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000507f38 sp=0xc000507f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000507fc8 sp=0xc000507f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000507fe0 sp=0xc000507fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000507fe8 sp=0xc000507fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 40 gp=0xc000484700 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x0?, 0x0?, 0x60?, 0xfe?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000501e20 sp=0xc000501e00 pc=0x7ff6fd02598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.semacquire1(0xc0001302a0, 0x0, 0x1, 0x0, 0x18)
        runtime/sema.go:188 +0x22f fp=0xc000501e88 sp=0xc000501e20 pc=0x7ff6fd00750f
sync.runtime_SemacquireWaitGroup(0x0?)
        runtime/sema.go:110 +0x25 fp=0xc000501ec0 sp=0xc000501e88 pc=0x7ff6fd026f85
sync.(*WaitGroup).Wait(0x0?)
        sync/waitgroup.go:118 +0x48 fp=0xc000501ee8 sp=0xc000501ec0 pc=0x7ff6fd03b988
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc000130280, {0x7ff6fe79fed0, 0xc0001340f0})
        github.com/ollama/ollama/runner/llamarunner/runner.go:360 +0x4b fp=0xc000501fb8 sp=0xc000501ee8 pc=0x7ff6fd58bdab
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x28 fp=0xc000501fe0 sp=0xc000501fb8 pc=0x7ff6fd591308
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000501fe8 sp=0xc000501fe0 pc=0x7ff6fd02d9a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x4c5
rax     0x0
rbx     0x7ffd0d923c60
rcx     0x7ffd0d907da0
rdx     0x7ffd0a8ca55c
rdi     0x19930520
rsi     0xe24636efb8
rbp     0xe24636ef70
rsp     0xe24636a0f0
r8      0x0
r9      0x0
r10     0xfffffffffffffffe
r11     0x0
r12     0xe24636a310
r13     0x0
r14     0xe24636b198
r15     0x0
rip     0x7ffe1d79055c
rflags  0x202
cs      0x33
fs      0x53
gs      0x2b

What is the issue?

I'm trying to use an Intel GPU and an NVIDIA GPU at the same time, but Ollama only works with the Intel GPU (with the error below) and doesn't use the NVIDIA GPU. I've set these environment variables: OLLAMA_SCHED_SPREAD=1, OLLAMA_VULKAN=1, OLLAMA_GPU_OVERHEAD=0, OLLAMA_NEW_ENGINE=1.
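
For anyone trying to reproduce this setup, here is a minimal sketch of the configuration described above. The four variable names and values are copied from the issue text. The helper names `ollama_env` and `format_assignments` are hypothetical and only serve the illustration. The sketch assumes the variables are read from the environment of the server process. It does not claim anything about how Ollama interprets them.

```python
import os

# Variable names and values exactly as listed in the issue text.
OLLAMA_VARS = {
    "OLLAMA_SCHED_SPREAD": "1",
    "OLLAMA_VULKAN": "1",
    "OLLAMA_GPU_OVERHEAD": "0",
    "OLLAMA_NEW_ENGINE": "1",
}


def ollama_env(base=None):
    """Return a copy of `base` (default: the current environment)
    with the variables from the issue applied on top."""
    env = dict(os.environ if base is None else base)
    env.update(OLLAMA_VARS)
    return env


def format_assignments(variables):
    """Render NAME=value lines in insertion order, for copying into a report."""
    return [f"{name}={value}" for name, value in variables.items()]


# Print the configuration so it can be pasted into a bug report.
for line in format_assignments(OLLAMA_VARS):
    print(line)
```

One possible use, assuming the server is started with `ollama serve`: pass `env=ollama_env()` to `subprocess.Popen(["ollama", "serve"], env=...)` so the settings apply to that one process and leave the system-wide environment unchanged.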

Relevant log output

llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = granitehybrid
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Granite 4.0 H Tiny
llama_model_loader: - kv   3:                         general.size_label str              = 64x994M
llama_model_loader: - kv   4:                            general.license str              = apache-2.0
llama_model_loader: - kv   5:                               general.tags arr[str,2]       = ["language", "granite-4.0"]
llama_model_loader: - kv   6:                  granitehybrid.block_count u32              = 40
llama_model_loader: - kv   7:               granitehybrid.context_length u32              = 1048576
llama_model_loader: - kv   8:             granitehybrid.embedding_length u32              = 1536
llama_model_loader: - kv   9:          granitehybrid.feed_forward_length u32              = 512
llama_model_loader: - kv  10:         granitehybrid.attention.head_count u32              = 12
llama_model_loader: - kv  11:      granitehybrid.attention.head_count_kv arr[i32,40]      = [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, ...
llama_model_loader: - kv  12:               granitehybrid.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  13: granitehybrid.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  14:                 granitehybrid.expert_count u32              = 64
llama_model_loader: - kv  15:            granitehybrid.expert_used_count u32              = 6
llama_model_loader: - kv  16:                   granitehybrid.vocab_size u32              = 100352
llama_model_loader: - kv  17:         granitehybrid.rope.dimension_count u32              = 128
llama_model_loader: - kv  18:              granitehybrid.attention.scale f32              = 0.007813
llama_model_loader: - kv  19:              granitehybrid.embedding_scale f32              = 12.000000
llama_model_loader: - kv  20:               granitehybrid.residual_scale f32              = 0.220000
llama_model_loader: - kv  21:                  granitehybrid.logit_scale f32              = 6.000000
llama_model_loader: - kv  22: granitehybrid.expert_shared_feed_forward_length u32              = 1024
llama_model_loader: - kv  23:              granitehybrid.ssm.conv_kernel u32              = 4
llama_model_loader: - kv  24:               granitehybrid.ssm.state_size u32              = 128
llama_model_loader: - kv  25:              granitehybrid.ssm.group_count u32              = 1
llama_model_loader: - kv  26:               granitehybrid.ssm.inner_size u32              = 3072
llama_model_loader: - kv  27:           granitehybrid.ssm.time_step_rank u32              = 48
llama_model_loader: - kv  28:       granitehybrid.rope.scaling.finetuned bool             = false
llama_model_loader: - kv  29:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  30:                         tokenizer.ggml.pre str              = dbrx
llama_model_loader: - kv  31:                      tokenizer.ggml.tokens arr[str,100352]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  32:                  tokenizer.ggml.token_type arr[i32,100352]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  33:                      tokenizer.ggml.merges arr[str,100000]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  34:                tokenizer.ggml.bos_token_id u32              = 100257
llama_model_loader: - kv  35:                tokenizer.ggml.eos_token_id u32              = 100257
llama_model_loader: - kv  36:            tokenizer.ggml.unknown_token_id u32              = 100269
llama_model_loader: - kv  37:            tokenizer.ggml.padding_token_id u32              = 100256
llama_model_loader: - kv  38:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  39:                    tokenizer.chat_template str              = {%- set tools_system_message_prefix =...
llama_model_loader: - kv  40:               general.quantization_version u32              = 2
llama_model_loader: - kv  41:                          general.file_type u32              = 15
llama_model_loader: - type  f32:  337 tensors
llama_model_loader: - type q4_K:  286 tensors
llama_model_loader: - type q6_K:   43 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 3.94 GiB (4.87 BPW)
load: printing all EOG tokens:
load:   - 100257 ('<|end_of_text|>')
load:   - 100261 ('<|fim_pad|>')
load: special tokens cache size = 96
load: token to piece cache size = 0.6152 MB
print_info: arch             = granitehybrid
print_info: vocab_only       = 0
print_info: no_alloc         = 0
print_info: n_ctx_train      = 1048576
print_info: n_embd           = 1536
print_info: n_embd_inp       = 1536
print_info: n_layer          = 40
print_info: n_head           = 12
print_info: n_head_kv        = [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0]
print_info: n_rot            = 128
print_info: n_swa            = 0
print_info: is_swa_any       = 0
print_info: n_embd_head_k    = 128
print_info: n_embd_head_v    = 128
print_info: n_gqa            = [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0]
print_info: n_embd_k_gqa     = [0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0]
print_info: n_embd_v_gqa     = [0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0, 0, 0, 0, 0, 0, 512, 0, 0, 0, 0]
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 6.0e+00
print_info: f_attn_scale     = 7.8e-03
print_info: n_ff             = 512
print_info: n_expert         = 64
print_info: n_expert_used    = 6
print_info: n_expert_groups  = 0
print_info: n_group_used     = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 0
print_info: rope scaling     = linear
print_info: freq_base_train  = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 1048576
print_info: rope_yarn_log_mul= 0.0000
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 4
print_info: ssm_d_inner      = 3072
print_info: ssm_d_state      = 128
print_info: ssm_dt_rank      = 48
print_info: ssm_n_group      = 1
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = 1B
print_info: model params     = 6.94 B
print_info: general.name     = Granite 4.0 H Tiny
print_info: f_embedding_scale = 12.000000
print_info: f_residual_scale  = 0.220000
print_info: f_attention_scale = 0.007813
print_info: n_ff_shexp        = 1024
print_info: vocab type       = BPE
print_info: n_vocab          = 100352
print_info: n_merges         = 100000
print_info: BOS token        = 100257 '<|end_of_text|>'
print_info: EOS token        = 100257 '<|end_of_text|>'
print_info: EOT token        = 100257 '<|end_of_text|>'
print_info: UNK token        = 100269 '<|unk|>'
print_info: PAD token        = 100256 '<|pad|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 100258 '<|fim_prefix|>'
print_info: FIM SUF token    = 100260 '<|fim_suffix|>'
print_info: FIM MID token    = 100259 '<|fim_middle|>'
print_info: FIM PAD token    = 100261 '<|fim_pad|>'
print_info: EOG token        = 100257 '<|end_of_text|>'
print_info: EOG token        = 100261 '<|fim_pad|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = false)
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000013502
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Integrated GPU (Intel(R) Iris(R) Xe Graphics) with LUID 0x0000000000013502 detected. Shared Total: 8433055744.00 bytes (7.85 GB), Shared Usage: 2856886272.00 bytes (2.66 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 5710387200 total: 8567273472
ggml_backend_vk_get_device_memory called: uuid a4f6355b-902f-14e3-2b28-2189eb9ad638
ggml_backend_vk_get_device_memory called: luid 0x000000000001386a
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Discrete GPU (NVIDIA GeForce RTX 3050 Ti Laptop GPU) with LUID 0x000000000001386a detected. Dedicated Total: 4157603840.00 bytes (3.87 GB), Dedicated Usage: 9768960.00 bytes (0.01 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 4147834880 total: 4157603840
load_tensors: offloading 40 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 41/41 layers to GPU
load_tensors:      Vulkan1 model buffer size =  4031.55 MiB
load_tensors:  Vulkan_Host model buffer size =   120.59 MiB
ggml_backend_vk_get_device_memory called: uuid 8680a646-0c00-0000-0000-000000000000
ggml_backend_vk_get_device_memory called: luid 0x0000000000013502
ggml_dxgi_pdh_init called
DXGI + PDH Initialized. Getting GPU free memory info
[DXGI] Adapter Description: NVIDIA GeForce RTX 3050 Ti Laptop GPU, LUID: 0x000000000001386A, Dedicated: 3.87 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Intel(R) Iris(R) Xe Graphics, LUID: 0x0000000000013502, Dedicated: 0.12 GB, Shared: 7.85 GB
[DXGI] Adapter Description: Microsoft Basic Render Driver, LUID: 0x00000000000137DE, Dedicated: 0.00 GB, Shared: 7.85 GB
Integrated GPU (Intel(R) Iris(R) Xe Graphics) with LUID 0x0000000000013502 detected. Shared Total: 8433055744.00 bytes (7.85 GB), Shared Usage: 6038458368.00 bytes (5.62 GB), Dedicated Total: 134217728.00 bytes (0.12 GB), Dedicated Usage: 0.00 bytes (0.00 GB)
ggml_backend_vk_get_device_memory utilizing DXGI + PDH memory reporting free: 2528815104 total: 8567273472
time=2026-03-12T21:48:42.270+01:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server not responding"
ggml_vulkan: Memory allocation of size 6144 failed.
ggml_vulkan: vk::Device::allocateMemory: ErrorOutOfDeviceMemory
Exception 0xe06d7363 0x19930520 0xe24636efb8 0x7ffe1d79055c
PC=0x7ffe1d79055c
signal arrived during external code execution

runtime.cgocall(0x7ff6fdf10600, 0xc00038fb58)
        runtime/cgocall.go:167 +0x3e fp=0xc00038fb30 sp=0xc00038fac8 pc=0x7ff6fd02243e
github.com/ollama/ollama/llama._Cfunc_llama_model_load_from_file(0x20dada5a720, {0xc000368a10, 0x0, 0x29, 0x1, 0x0, 0xc000334010, 0x7ff6fdf0fd00, 0xc000368a08, 0x0, ...})
        _cgo_gotypes.go:902 +0x51 fp=0xc00038fb58 sp=0xc00038fb30 pc=0x7ff6fd4d4bb1
github.com/ollama/ollama/llama.LoadModelFromFile.func1(...)
        github.com/ollama/ollama/llama/llama.go:308
github.com/ollama/ollama/llama.LoadModelFromFile({0xc0000aa0e0, 0x6b}, {{0xc000334018, 0x1, 0x1}, 0x29, 0x0, 0x0, {0xc000334010, 0x1, ...}, ...})
        github.com/ollama/ollama/llama/llama.go:308 +0x57f fp=0xc00038fda0 sp=0xc00038fb58 pc=0x7ff6fd4d81df
github.com/ollama/ollama/runner/llamarunner.(*Server).loadModel(0xc000130280, {{0xc000334018, 0x1, 0x1}, 0x29, 0x0, 0x0, {0xc000334010, 0x1, 0x2}, ...}, ...)
        github.com/ollama/ollama/runner/llamarunner/runner.go:841 +0x9e fp=0xc00038fee8 sp=0xc00038fda0 pc=0x7ff6fd58f3de
github.com/ollama/ollama/runner/llamarunner.(*Server).load.gowrap2()
        github.com/ollama/ollama/runner/llamarunner/runner.go:934 +0x114 fp=0xc00038ffe0 sp=0xc00038fee8 pc=0x7ff6fd5906d4
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00038ffe8 sp=0xc00038ffe0 pc=0x7ff6fd02d9a1
created by github.com/ollama/ollama/runner/llamarunner.(*Server).load in goroutine 41
        github.com/ollama/ollama/runner/llamarunner/runner.go:934 +0x889

goroutine 1 gp=0xc0000021c0 m=nil [IO wait]:
runtime.gopark(0x7ff6fd02f1a0?, 0x7ff6ff2642e0?, 0x20?, 0xa0?, 0xc00013a0cc?)
        runtime/proc.go:435 +0xce fp=0xc000125630 sp=0xc000125610 pc=0x7ff6fd02598e
runtime.netpollblock(0x5bc?, 0xfcfc0406?, 0xf6?)
        runtime/netpoll.go:575 +0xf7 fp=0xc000125668 sp=0xc000125630 pc=0x7ff6fcfebdf7
internal/poll.runtime_pollWait(0x20d62200e70, 0x72)
        runtime/netpoll.go:351 +0x85 fp=0xc000125688 sp=0xc000125668 pc=0x7ff6fd024b25
internal/poll.(*pollDesc).wait(0x7ff6fd0ba953?, 0x0?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0001256b0 sp=0xc000125688 pc=0x7ff6fd0bbf47
internal/poll.execIO(0xc00013a020, 0xc000125758)
        internal/poll/fd_windows.go:177 +0x105 fp=0xc000125728 sp=0xc0001256b0 pc=0x7ff6fd0bd3a5
internal/poll.(*FD).acceptOne(0xc00013a008, 0x5c4, {0xc0001420f0?, 0xc0001257b8?, 0x7ff6fd0c5065?}, 0xc0001257ec?)
        internal/poll/fd_windows.go:946 +0x65 fp=0xc000125788 sp=0xc000125728 pc=0x7ff6fd0c1925
internal/poll.(*FD).Accept(0xc00013a008, 0xc000125938)
        internal/poll/fd_windows.go:980 +0x1b6 fp=0xc000125840 sp=0xc000125788 pc=0x7ff6fd0c1c56
net.(*netFD).accept(0xc00013a008)
        net/fd_windows.go:182 +0x4b fp=0xc000125958 sp=0xc000125840 pc=0x7ff6fd13358b
net.(*TCPListener).accept(0xc0000a7600)
        net/tcpsock_posix.go:159 +0x1b fp=0xc0001259a8 sp=0xc000125958 pc=0x7ff6fd149b3b
net.(*TCPListener).Accept(0xc0000a7600)
        net/tcpsock.go:380 +0x30 fp=0xc0001259d8 sp=0xc0001259a8 pc=0x7ff6fd1488f0
net/http.(*onceCloseListener).Accept(0xc000140090?)
        <autogenerated>:1 +0x24 fp=0xc0001259f0 sp=0xc0001259d8 pc=0x7ff6fd361fe4
net/http.(*Server).Serve(0xc0001de700, {0x7ff6fe79d5b0, 0xc0000a7600})
        net/http/server.go:3424 +0x30c fp=0xc000125b20 sp=0xc0001259f0 pc=0x7ff6fd3398ac
github.com/ollama/ollama/runner/llamarunner.Execute({0xc0000c2020, 0x4, 0x6})
        github.com/ollama/ollama/runner/llamarunner/runner.go:1002 +0x8f5 fp=0xc000125cf0 sp=0xc000125b20 pc=0x7ff6fd591095
github.com/ollama/ollama/runner.Execute({0xc0000c2010?, 0x0?, 0x0?})
        github.com/ollama/ollama/runner/runner.go:25 +0x1a5 fp=0xc000125d30 sp=0xc000125cf0 pc=0x7ff6fd672e25
github.com/ollama/ollama/cmd.NewCLI.func3(0xc00063f200?, {0x7ff6fe5747ef?, 0x4?, 0x7ff6fe5747f3?})
        github.com/ollama/ollama/cmd/cmd.go:2271 +0x45 fp=0xc000125d58 sp=0xc000125d30 pc=0x7ff6fde9ee65
github.com/spf13/cobra.(*Command).execute(0xc0002f5b08, {0xc00062b380, 0x4, 0x4})
        github.com/spf13/[email protected]/command.go:940 +0x85c fp=0xc000125e78 sp=0xc000125d58 pc=0x7ff6fd1ae75c
github.com/spf13/cobra.(*Command).ExecuteC(0xc00068ef08)
        github.com/spf13/[email protected]/command.go:1068 +0x3a5 fp=0xc000125f30 sp=0xc000125e78 pc=0x7ff6fd1aefa5
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/[email protected]/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/[email protected]/command.go:985
main.main()
        github.com/ollama/ollama/main.go:12 +0x4d fp=0xc000125f50 sp=0xc000125f30 pc=0x7ff6fdea130d
runtime.main()
        runtime/proc.go:283 +0x27d fp=0xc000125fe0 sp=0xc000125f50 pc=0x7ff6fcff4ddd
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000125fe8 sp=0xc000125fe0 pc=0x7ff6fd02d9a1

goroutine 2 gp=0xc0000028c0 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00008dfa8 sp=0xc00008df88 pc=0x7ff6fd02598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.forcegchelper()
        runtime/proc.go:348 +0xb8 fp=0xc00008dfe0 sp=0xc00008dfa8 pc=0x7ff6fcff50f8
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008dfe8 sp=0xc00008dfe0 pc=0x7ff6fd02d9a1
created by runtime.init.7 in goroutine 1
        runtime/proc.go:336 +0x1a

goroutine 3 gp=0xc000002c40 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00008ff80 sp=0xc00008ff60 pc=0x7ff6fd02598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.bgsweep(0xc00009c000)
        runtime/mgcsweep.go:316 +0xdf fp=0xc00008ffc8 sp=0xc00008ff80 pc=0x7ff6fcfddebf
runtime.gcenable.gowrap1()
        runtime/mgc.go:204 +0x25 fp=0xc00008ffe0 sp=0xc00008ffc8 pc=0x7ff6fcfd2285
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00008ffe8 sp=0xc00008ffe0 pc=0x7ff6fd02d9a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:204 +0x66

goroutine 4 gp=0xc000002e00 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x7ff6fe787ac0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc0000a3f78 sp=0xc0000a3f58 pc=0x7ff6fd02598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.(*scavengerState).park(0x7ff6ff28e080)
        runtime/mgcscavenge.go:425 +0x49 fp=0xc0000a3fa8 sp=0xc0000a3f78 pc=0x7ff6fcfdb909
runtime.bgscavenge(0xc00009c000)
        runtime/mgcscavenge.go:658 +0x59 fp=0xc0000a3fc8 sp=0xc0000a3fa8 pc=0x7ff6fcfdbe99
runtime.gcenable.gowrap2()
        runtime/mgc.go:205 +0x25 fp=0xc0000a3fe0 sp=0xc0000a3fc8 pc=0x7ff6fcfd2225
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a3fe8 sp=0xc0000a3fe0 pc=0x7ff6fd02d9a1
created by runtime.gcenable in goroutine 1
        runtime/mgc.go:205 +0xa5

goroutine 5 gp=0xc000003340 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc0000a5e30 sp=0xc0000a5e10 pc=0x7ff6fd02598e
runtime.runfinq()
        runtime/mfinal.go:196 +0x107 fp=0xc0000a5fe0 sp=0xc0000a5e30 pc=0x7ff6fcfd1207
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a5fe8 sp=0xc0000a5fe0 pc=0x7ff6fd02d9a1
created by runtime.createfing in goroutine 1
        runtime/mfinal.go:166 +0x3d

goroutine 6 gp=0xc000003dc0 m=nil [chan receive]:
runtime.gopark(0xc000213540?, 0xc000518018?, 0x60?, 0x1f?, 0x7ff6fd11c1a8?)
        runtime/proc.go:435 +0xce fp=0xc000091f18 sp=0xc000091ef8 pc=0x7ff6fd02598e
runtime.chanrecv(0xc0000aa3f0, 0x0, 0x1)
        runtime/chan.go:664 +0x445 fp=0xc000091f90 sp=0xc000091f18 pc=0x7ff6fcfc2d45
runtime.chanrecv1(0x7ff6fcff4f40?, 0xc000091f76?)
        runtime/chan.go:506 +0x12 fp=0xc000091fb8 sp=0xc000091f90 pc=0x7ff6fcfc28d2
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        runtime/mgc.go:1799 +0x2f fp=0xc000091fe0 sp=0xc000091fb8 pc=0x7ff6fcfd54af
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000091fe8 sp=0xc000091fe0 pc=0x7ff6fd02d9a1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        runtime/mgc.go:1794 +0x85

goroutine 7 gp=0xc000268540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00009ff38 sp=0xc00009ff18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00009ffc8 sp=0xc00009ff38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00009ffe0 sp=0xc00009ffc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00009ffe8 sp=0xc00009ffe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 8 gp=0xc000268700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc0000a1f38 sp=0xc0000a1f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc0000a1fc8 sp=0xc0000a1f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc0000a1fe0 sp=0xc0000a1fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc0000a1fe8 sp=0xc0000a1fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 9 gp=0xc0002688c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000473f38 sp=0xc000473f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000473fc8 sp=0xc000473f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000473fe0 sp=0xc000473fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000473fe8 sp=0xc000473fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 10 gp=0xc000268a80 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000475f38 sp=0xc000475f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000475fc8 sp=0xc000475f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000475fe0 sp=0xc000475fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000475fe8 sp=0xc000475fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 18 gp=0xc000484000 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00046ff38 sp=0xc00046ff18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00046ffc8 sp=0xc00046ff38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00046ffe0 sp=0xc00046ffc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00046ffe8 sp=0xc00046ffe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 34 gp=0xc0001061c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000113f38 sp=0xc000113f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000113fc8 sp=0xc000113f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000113fe0 sp=0xc000113fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000113fe8 sp=0xc000113fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 19 gp=0xc0004841c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000471f38 sp=0xc000471f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000471fc8 sp=0xc000471f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000471fe0 sp=0xc000471fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000471fe8 sp=0xc000471fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 11 gp=0xc000268c40 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00010ff38 sp=0xc00010ff18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00010ffc8 sp=0xc00010ff38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00010ffe0 sp=0xc00010ffc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00010ffe8 sp=0xc00010ffe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 12 gp=0xc000268e00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000111f38 sp=0xc000111f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000111fc8 sp=0xc000111f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000111fe0 sp=0xc000111fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000111fe8 sp=0xc000111fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 13 gp=0xc000268fc0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00047bf38 sp=0xc00047bf18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00047bfc8 sp=0xc00047bf38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00047bfe0 sp=0xc00047bfc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00047bfe8 sp=0xc00047bfe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc000484380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000477f38 sp=0xc000477f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000477fc8 sp=0xc000477f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000477fe0 sp=0xc000477fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000477fe8 sp=0xc000477fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 35 gp=0xc000106380 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000115f38 sp=0xc000115f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000115fc8 sp=0xc000115f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000115fe0 sp=0xc000115fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000115fe8 sp=0xc000115fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 36 gp=0xc000106540 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011bf38 sp=0xc00011bf18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011bfc8 sp=0xc00011bf38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011bfe0 sp=0xc00011bfc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011bfe8 sp=0xc00011bfe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc000106700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00011df38 sp=0xc00011df18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00011dfc8 sp=0xc00011df38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00011dfe0 sp=0xc00011dfc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00011dfe8 sp=0xc00011dfe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc0001068c0 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000117f38 sp=0xc000117f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000117fc8 sp=0xc000117f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000117fe0 sp=0xc000117fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000117fe8 sp=0xc000117fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 39 gp=0xc000106a80 m=nil [GC worker (idle)]:
runtime.gopark(0x2d24f5154b30?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000119f38 sp=0xc000119f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000119fc8 sp=0xc000119f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000119fe0 sp=0xc000119fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000119fe8 sp=0xc000119fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 21 gp=0xc000484540 m=nil [GC worker (idle)]:
runtime.gopark(0x2d24f5154b30?, 0x0?, 0x0?, 0x0?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000479f38 sp=0xc000479f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000479fc8 sp=0xc000479f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000479fe0 sp=0xc000479fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000479fe8 sp=0xc000479fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 14 gp=0xc000269180 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc00047df38 sp=0xc00047df18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc00047dfc8 sp=0xc00047df38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc00047dfe0 sp=0xc00047dfc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc00047dfe8 sp=0xc00047dfe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 15 gp=0xc000269340 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000505f38 sp=0xc000505f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000505fc8 sp=0xc000505f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000505fe0 sp=0xc000505fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000505fe8 sp=0xc000505fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 16 gp=0xc000269500 m=nil [GC worker (idle)]:
runtime.gopark(0x7ff6ff2e10e0?, 0x1?, 0x18?, 0x30?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000507f38 sp=0xc000507f18 pc=0x7ff6fd02598e
runtime.gcBgMarkWorker(0xc0000ab810)
        runtime/mgc.go:1423 +0xe9 fp=0xc000507fc8 sp=0xc000507f38 pc=0x7ff6fcfd47a9
runtime.gcBgMarkStartWorkers.gowrap1()
        runtime/mgc.go:1339 +0x25 fp=0xc000507fe0 sp=0xc000507fc8 pc=0x7ff6fcfd4685
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000507fe8 sp=0xc000507fe0 pc=0x7ff6fd02d9a1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        runtime/mgc.go:1339 +0x105

goroutine 40 gp=0xc000484700 m=nil [sync.WaitGroup.Wait]:
runtime.gopark(0x0?, 0x0?, 0x60?, 0xfe?, 0x0?)
        runtime/proc.go:435 +0xce fp=0xc000501e20 sp=0xc000501e00 pc=0x7ff6fd02598e
runtime.goparkunlock(...)
        runtime/proc.go:441
runtime.semacquire1(0xc0001302a0, 0x0, 0x1, 0x0, 0x18)
        runtime/sema.go:188 +0x22f fp=0xc000501e88 sp=0xc000501e20 pc=0x7ff6fd00750f
sync.runtime_SemacquireWaitGroup(0x0?)
        runtime/sema.go:110 +0x25 fp=0xc000501ec0 sp=0xc000501e88 pc=0x7ff6fd026f85
sync.(*WaitGroup).Wait(0x0?)
        sync/waitgroup.go:118 +0x48 fp=0xc000501ee8 sp=0xc000501ec0 pc=0x7ff6fd03b988
github.com/ollama/ollama/runner/llamarunner.(*Server).run(0xc000130280, {0x7ff6fe79fed0, 0xc0001340f0})
        github.com/ollama/ollama/runner/llamarunner/runner.go:360 +0x4b fp=0xc000501fb8 sp=0xc000501ee8 pc=0x7ff6fd58bdab
github.com/ollama/ollama/runner/llamarunner.Execute.gowrap1()
        github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x28 fp=0xc000501fe0 sp=0xc000501fb8 pc=0x7ff6fd591308
runtime.goexit({})
        runtime/asm_amd64.s:1700 +0x1 fp=0xc000501fe8 sp=0xc000501fe0 pc=0x7ff6fd02d9a1
created by github.com/ollama/ollama/runner/llamarunner.Execute in goroutine 1
        github.com/ollama/ollama/runner/llamarunner/runner.go:981 +0x4c5
rax     0x0
rbx     0x7ffd0d923c60
rcx     0x7ffd0d907da0
rdx     0x7ffd0a8ca55c
rdi     0x19930520
rsi     0xe24636efb8
rbp     0xe24636ef70
rsp     0xe24636a0f0
r8      0x0
r9      0x0
r10     0xfffffffffffffffe
r11     0x0
r12     0xe24636a310
r13     0x0
r14     0xe24636b198
r15     0x0
rip     0x7ffe1d79055c
rflags  0x202
cs      0x33
fs      0x53
gs      0x2b

OS

Windows

GPU

Intel

CPU

Intel

Ollama version

0.17.7

extent analysis

Fix Plan

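The issue may be related to GPU memory allocation on the Intel device, although the log does not confirm this. The following steps are worth trying, roughly in order of effort:

Reduce the model's footprint: Load a smaller or more heavily quantized model, or offload fewer layers to the GPU, and check whether the crash disappears.
Increase available GPU memory: Close other applications that use the GPU. If the Intel GPU is integrated and shares system RAM, freeing RAM or raising the shared-memory allocation in the firmware settings, where available, may help.
Use a pruned model: Pruning can reduce model size while largely maintaining quality, but it has to be done before the model is packaged; it is not something Ollama applies at load time.
Use a different GPU: If the issue persists, run the model on a GPU with more memory, such as the NVIDIA card, and leave the Intel GPU out.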
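The "offload fewer layers" step can be tried without changing the model, through the options field of the Ollama REST API. The following is a minimal sketch, assuming a local server on the default port; the helper names and the values passed to them are illustrative and do not come from the issue.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)


def build_request(model, prompt, gpu_layers, context_length):
    """Build a non-streaming generate request that limits GPU usage."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_gpu": gpu_layers,      # layers offloaded to the GPU; 0 keeps the model on the CPU
            "num_ctx": context_length,  # context window in tokens
        },
    }


def generate(payload, url=OLLAMA_URL):
    """Send the request to a running Ollama server and return the generated text."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example (needs a running server; replace the placeholder with a locally pulled model tag):
# print(generate(build_request("<model-tag>", "Say hello.", gpu_layers=20, context_length=4096)))
```

If the crash disappears with num_gpu set to 0 and returns as the value increases, that points at the GPU backend rather than at the model file.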

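The values below illustrate what a smaller configuration would look like. These hyperparameters are fixed in the GGUF file when the model is built, so they cannot be changed at load time; in practice, reducing them means choosing a different, smaller model:

# Reduce the number of layers
num_layers = 20  # instead of 40 (granitehybrid.block_count)

# Reduce the embedding dimension
embedding_dim = 512  # instead of 1536 (granitehybrid.embedding_length)

# Reduce the feed forward dimension
feed_forward_dim = 256  # instead of 512 (granitehybrid.feed_forward_length)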

Verification

To verify that the fix worked, check GPU memory usage after making the changes. Note that nvidia-smi reports memory usage for NVIDIA GPUs only; for the Intel GPU on Windows, use the Performance tab in Task Manager instead.

nvidia-smi

We can also check the model performance after making the changes to ensure that it is still working as expected.
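Besides vendor tools, the Ollama server itself reports how much of each loaded model sits in GPU memory through its /api/ps endpoint (the same data the ollama ps command shows). A minimal sketch, assuming a local server on the default port; the function names are illustrative:

```python
import json
import urllib.request


def gpu_fraction(model_entry):
    """Fraction of a loaded model's memory that resides in GPU memory."""
    size = model_entry.get("size", 0)
    if size <= 0:
        return 0.0
    return model_entry.get("size_vram", 0) / size


def loaded_models(url="http://localhost:11434/api/ps"):
    """Return the list of currently loaded models from a running Ollama server."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read()).get("models", [])


def report(models):
    """Format one line per loaded model with its GPU share."""
    return [f"{m['name']}: {gpu_fraction(m):.0%} on GPU" for m in models]


# Example (needs a running server with a model loaded):
# print("\n".join(report(loaded_models())))
```

A share of 0% after a successful load means the model fell back to the CPU, which is worth distinguishing from a genuine fix on the GPU.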

Extra Tips

Make sure to monitor the GPU memory usage regularly to avoid running out of memory.
Consider using a more efficient model architecture that requires less memory.
If the issue persists, try reducing the batch size or the context length (the num_batch and num_ctx options in Ollama) to lower the memory requirements.
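To see why context length matters, the size of the attention key/value cache can be estimated with simple arithmetic. This is a rough sketch: it assumes a 16-bit cache, ignores the model weights, the SSM state, and compute buffers, and uses a hypothetical count of four attention layers, because the per-layer head_count_kv list in the log is truncated. The values of 4 KV heads and a head dimension of 128 are taken from the logged metadata.

```python
def kv_cache_bytes(attention_layers, kv_heads, head_dim, context_length,
                   bytes_per_element=2):
    """Estimate the attention KV cache size in bytes.

    Each attention layer stores one key and one value tensor (hence the factor 2),
    each holding kv_heads * head_dim values per token of context.
    """
    return (2 * attention_layers * kv_heads * head_dim
            * context_length * bytes_per_element)


MIB = 1024 ** 2

# attention_layers=4 is an assumption for illustration, not a value from the log.
small = kv_cache_bytes(attention_layers=4, kv_heads=4, head_dim=128, context_length=4096)
full = kv_cache_bytes(attention_layers=4, kv_heads=4, head_dim=128, context_length=1048576)

print(f"4096-token context:    {small / MIB:.0f} MiB")
print(f"1048576-token context: {full / MIB:.0f} MiB")
```

Under these assumptions the cache grows linearly with context length, from 32 MiB at 4096 tokens to 8192 MiB at the model's maximum of 1048576 tokens, so capping num_ctx is usually a cheaper lever than switching models.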
