ollama - 💡(How to fix) Fix llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4' / llama_model_load_from_file_impl: failed to load model [1 participant]

GitHub stats
ollama/ollama#15545 (fetched 2026-04-15 06:20:18)
Comments: 0 · Participants: 1 · Timeline: 2 events (closed ×1, labeled ×1) · Reactions: 0

Error Message

time=2026-04-13T23:24:34.184+08:00 level=WARN source=types.go:977 msg="invalid option provided" option=penalize_newline
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
time=2026-04-13T23:24:34.427+08:00 level=INFO source=sched.go:462 msg="failed to create server" model=modelscope.cn/unsloth/gemma-4-26B-A4B-it-GGUF:latest error="unable to load model: /Users/ander/.ollama/models/blobs/sha256-b8707e57f676d8dd1b80f623b45200cc92e6966b0e95275e606f412095a49fde"
time=2026-04-13T23:24:36.541+08:00 level=WARN source=types.go:977 msg="invalid option provided" option=penalize_newline


What is the issue?

➜ ~ ollama --version
Warning: could not connect to a running Ollama instance
Warning: client version is 0.20.6
➜ ~ ollama start
time=2026-04-13T23:24:23.633+08:00 level=INFO source=routes.go:1752 msg="server config" env="map[HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/Users/ander/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false http_proxy: https_proxy: no_proxy:]"
time=2026-04-13T23:24:23.634+08:00 level=INFO source=routes.go:1754 msg="Ollama cloud disabled: false"
time=2026-04-13T23:24:23.634+08:00 level=INFO source=images.go:499 msg="total blobs: 6"
time=2026-04-13T23:24:23.634+08:00 level=INFO source=images.go:506 msg="total unused blobs removed: 0"
time=2026-04-13T23:24:23.635+08:00 level=INFO source=routes.go:1810 msg="Listening on 127.0.0.1:11434 (version 0.20.6)"
time=2026-04-13T23:24:23.635+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-13T23:24:23.636+08:00 level=INFO source=server.go:444 msg="starting runner" cmd="/Applications/Ollama.app/Contents/Resources/ollama runner --ollama-engine --port 51319"
time=2026-04-13T23:24:23.706+08:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=Metal compute=0.0 name=Metal description="Apple M4" libdirs="" driver=0.0 pci_id="" type=discrete total="17.8 GiB" available="17.8 GiB"
time=2026-04-13T23:24:23.706+08:00 level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="17.8 GiB" default_num_ctx=4096
time=2026-04-13T23:24:34.184+08:00 level=WARN source=types.go:977 msg="invalid option provided" option=penalize_newline
ggml_metal_device_init: tensor API disabled for pre-M5 and pre-A19 devices
ggml_metal_library_init: using embedded metal library
ggml_metal_library_init: loaded in 0.006 sec
ggml_metal_rsets_init: creating a residency set collection (keep_alive = 180 s)
ggml_metal_device_init: GPU name: Apple M4
ggml_metal_device_init: GPU family: MTLGPUFamilyApple9 (1009)
ggml_metal_device_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_device_init: GPU family: MTLGPUFamilyMetal4 (5002)
ggml_metal_device_init: simdgroup reduction = true
ggml_metal_device_init: simdgroup matrix mul. = true
ggml_metal_device_init: has unified memory = true
ggml_metal_device_init: has bfloat = true
ggml_metal_device_init: has tensor = false
ggml_metal_device_init: use residency sets = true
ggml_metal_device_init: use shared buffers = true
ggml_metal_device_init: recommendedMaxWorkingSetSize = 19069.67 MB
llama_model_load_from_file_impl: using device Metal (Apple M4) (unknown id) - 18185 MiB free
llama_model_loader: loaded meta data with 60 key-value pairs and 658 tensors from /Users/ander/.ollama/models/blobs/sha256-b8707e57f676d8dd1b80f623b45200cc92e6966b0e95275e606f412095a49fde (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma4
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.sampling.top_k i32 = 64
llama_model_loader: - kv 3: general.sampling.top_p f32 = 0.950000
llama_model_loader: - kv 4: general.sampling.temp f32 = 1.000000
llama_model_loader: - kv 5: general.name str = Gemma-4-26B-A4B-It
llama_model_loader: - kv 6: general.finetune str = it
llama_model_loader: - kv 7: general.basename str = Gemma-4-26B-A4B-It
llama_model_loader: - kv 8: general.quantized_by str = Unsloth
llama_model_loader: - kv 9: general.size_label str = 26B-A4B
llama_model_loader: - kv 10: general.license str = apache-2.0
llama_model_loader: - kv 11: general.license.link str = https://ai.google.dev/gemma/docs/gemm...
llama_model_loader: - kv 12: general.repo_url str = https://huggingface.co/unsloth
llama_model_loader: - kv 13: general.base_model.count u32 = 1
llama_model_loader: - kv 14: general.base_model.0.name str = Gemma 4 26B A4B It
llama_model_loader: - kv 15: general.base_model.0.organization str = Google
llama_model_loader: - kv 16: general.base_model.0.repo_url str = https://huggingface.co/google/gemma-4...
llama_model_loader: - kv 17: general.tags arr[str,2] = ["unsloth", "image-text-to-text"]
llama_model_loader: - kv 18: gemma4.block_count u32 = 30
llama_model_loader: - kv 19: gemma4.context_length u32 = 262144
llama_model_loader: - kv 20: gemma4.embedding_length u32 = 2816
llama_model_loader: - kv 21: gemma4.feed_forward_length u32 = 2112
llama_model_loader: - kv 22: gemma4.attention.head_count u32 = 16
llama_model_loader: - kv 23: gemma4.attention.head_count_kv arr[i32,30] = [8, 8, 8, 8, 8, 2, 8, 8, 8, 8, 8, 2, ...
llama_model_loader: - kv 24: gemma4.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 25: gemma4.rope.freq_base_swa f32 = 10000.000000
llama_model_loader: - kv 26: gemma4.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 27: gemma4.expert_count u32 = 128
llama_model_loader: - kv 28: gemma4.expert_used_count u32 = 8
llama_model_loader: - kv 29: gemma4.attention.key_length u32 = 512
llama_model_loader: - kv 30: gemma4.attention.value_length u32 = 512
llama_model_loader: - kv 31: gemma4.final_logit_softcapping f32 = 30.000000
llama_model_loader: - kv 32: gemma4.attention.sliding_window u32 = 1024
llama_model_loader: - kv 33: gemma4.attention.shared_kv_layers u32 = 0
llama_model_loader: - kv 34: gemma4.embedding_length_per_layer_input u32 = 0
llama_model_loader: - kv 35: gemma4.attention.sliding_window_pattern arr[bool,30] = [true, true, true, true, true, false,...
llama_model_loader: - kv 36: gemma4.attention.key_length_swa u32 = 256
llama_model_loader: - kv 37: gemma4.attention.value_length_swa u32 = 256
llama_model_loader: - kv 38: gemma4.expert_feed_forward_length u32 = 704
llama_model_loader: - kv 39: gemma4.rope.dimension_count u32 = 512
llama_model_loader: - kv 40: gemma4.rope.dimension_count_swa u32 = 256
llama_model_loader: - kv 41: tokenizer.ggml.model str = gemma4
llama_model_loader: - kv 42: tokenizer.ggml.tokens arr[str,262144] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv 43: tokenizer.ggml.scores arr[f32,262144] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 44: tokenizer.ggml.token_type arr[i32,262144] = [3, 1, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 45: tokenizer.ggml.merges arr[str,514906] = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
llama_model_loader: - kv 46: tokenizer.ggml.bos_token_id u32 = 2
llama_model_loader: - kv 47: tokenizer.ggml.eos_token_id u32 = 106
llama_model_loader: - kv 48: tokenizer.ggml.unknown_token_id u32 = 3
llama_model_loader: - kv 49: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 50: tokenizer.ggml.mask_token_id u32 = 4
llama_model_loader: - kv 51: tokenizer.chat_template str = {%- macro format_parameters(propertie...
llama_model_loader: - kv 52: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 53: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 54: general.quantization_version u32 = 2
llama_model_loader: - kv 55: general.file_type u32 = 15
llama_model_loader: - kv 56: quantize.imatrix.file str = gemma-4-26B-A4B-it-GGUF/imatrix_unslo...
llama_model_loader: - kv 57: quantize.imatrix.dataset str = unsloth_calibration_gemma-4-26B-A4B-i...
llama_model_loader: - kv 58: quantize.imatrix.entries_count u32 = 295
llama_model_loader: - kv 59: quantize.imatrix.chunks_count u32 = 141
llama_model_loader: - type f32: 392 tensors
llama_model_loader: - type q5_1: 30 tensors
llama_model_loader: - type q8_0: 206 tensors
llama_model_loader: - type q4_K: 30 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q4_K - Medium
print_info: file size = 15.70 GiB (5.34 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
llama_model_load_from_file_impl: failed to load model
time=2026-04-13T23:24:34.427+08:00 level=INFO source=sched.go:462 msg="failed to create server" model=modelscope.cn/unsloth/gemma-4-26B-A4B-it-GGUF:latest error="unable to load model: /Users/ander/.ollama/models/blobs/sha256-b8707e57f676d8dd1b80f623b45200cc92e6966b0e95275e606f412095a49fde"
[GIN] 2026/04/13 - 23:24:34 | 500 | 352.501291ms | 127.0.0.1 | POST "/api/chat"
time=2026-04-13T23:24:36.541+08:00 level=WARN source=types.go:977 msg="invalid option provided" option=penalize_newline

Relevant log output

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.20.6

extent analysis

TL;DR

The model fails to load because the loader in Ollama 0.20.6 does not recognize the architecture name 'gemma4', which this GGUF file declares in its general.architecture key.

Guidance

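  • The message "unknown model architecture: 'gemma4'" means the loader read the architecture name from the file (kv 0, general.architecture = gemma4) but does not recognize it in Ollama 0.20.6.
  • Check which version the Ollama server is running, since the server is what loads the model (here both client and server report 0.20.6), and whether that release supports the gemma4 architecture.
  • The model file itself looks intact: the loader parsed all 60 metadata keys and 658 tensors (15.70 GiB, Q4_K - Medium) before failing on the architecture name, so corruption is an unlikely cause.
  • Update Ollama to a release that supports the 'gemma4' architecture, if one is available.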
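A minimal sketch of the version check, assuming the server listens on the default address shown in the log (http://127.0.0.1:11434) and exposes Ollama's GET /api/version endpoint, which returns JSON such as {"version": "0.20.6"}. The minimum version is a placeholder you supply yourself: this page does not say which release adds gemma4 support. The script name check_ollama.py is illustrative.

```python
import json
import sys
import urllib.request

# Default server address, taken from OLLAMA_HOST in the server log.
OLLAMA_HOST = "http://127.0.0.1:11434"


def parse_version(v: str) -> tuple[int, ...]:
    """Turn '0.20.6' (or 'v0.20.6', '0.20.6-rc1') into (0, 20, 6).

    Pre-release suffixes are dropped, a simplification for this sketch.
    """
    core = v.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def at_least(current: str, minimum: str) -> bool:
    """Numeric comparison, so 0.20.10 correctly sorts after 0.20.9."""
    return parse_version(current) >= parse_version(minimum)


def server_version(host: str = OLLAMA_HOST) -> str:
    """Ask the running server (not the CLI client) for its version."""
    with urllib.request.urlopen(f"{host}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]


if __name__ == "__main__":
    if len(sys.argv) == 2:
        # sys.argv[1]: the minimum release you believe supports gemma4 (assumption).
        current = server_version()
        verdict = "OK" if at_least(current, sys.argv[1]) else "upgrade needed"
        print(f"Ollama server {current}: {verdict}")
    else:
        print("usage: check_ollama.py MINIMUM_VERSION", file=sys.stderr)
```

Querying the server rather than running `ollama --version` matters here: the CLI in the log could not reach a running instance and only reported the client version.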

Example

There is no code to fix here: the failure is a compatibility problem between the model file and the loader bundled with Ollama, not a bug in user code.
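To see which architecture a downloaded blob declares without asking Ollama to load it, a short stdlib-only sketch can parse the GGUF header directly. It assumes the GGUF v2/v3 layout documented in the ggml repository: the magic bytes "GGUF", a uint32 version, uint64 tensor and key-value counts, then typed key-value pairs, all little-endian. read_gguf_metadata and gguf_arch.py are illustrative names, not part of Ollama.

```python
import struct
import sys
from typing import Any, BinaryIO, Optional

# GGUF metadata value types mapped to little-endian struct formats.
_SCALAR_FORMATS = {
    0: "<B",   # uint8
    1: "<b",   # int8
    2: "<H",   # uint16
    3: "<h",   # int16
    4: "<I",   # uint32
    5: "<i",   # int32
    6: "<f",   # float32
    7: "<?",   # bool, stored as one byte
    10: "<Q",  # uint64
    11: "<q",  # int64
    12: "<d",  # float64
}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read one fixed-size value."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings: a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8", errors="replace")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    if vtype in _SCALAR_FORMATS:
        return _read(f, _SCALAR_FORMATS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        # Arrays: uint32 element type, uint64 count, then the elements.
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO, stop_at: Optional[str] = None) -> dict[str, Any]:
    """Return the header key-value pairs, stopping once `stop_at` is read."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("GGUF v1 (32-bit counts) is not handled by this sketch")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    meta: dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        meta[key] = _read_value(f, _read(f, "<I"))
        if key == stop_at:
            break
    return meta


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as fh:
            # general.architecture is kv 0 in this file, so stopping there
            # skips the 262,144-entry tokenizer arrays.
            meta = read_gguf_metadata(fh, stop_at="general.architecture")
        print(meta.get("general.architecture", "<no general.architecture key>"))
    else:
        print("usage: gguf_arch.py PATH_TO_GGUF", file=sys.stderr)
```

Run it against the blob from the log, for example `python gguf_arch.py /Users/ander/.ollama/models/blobs/sha256-b8707e57f676d8dd1b80f623b45200cc92e6966b0e95275e606f412095a49fde`. Since the loader output lists kv 0 as general.architecture = gemma4, the script should report gemma4, confirming that the file is readable and that the problem is the loader's architecture support.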

Notes

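The failure occurs with a GGUF file that declares the 'gemma4' architecture (here, Unsloth's gemma-4-26B-A4B-it build pulled from modelscope.cn) loaded by Ollama 0.20.6. The log pins down the immediate cause; what remains open is which Ollama release adds gemma4 support for this loading path.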

Recommendation

Update Ollama to a release that supports the 'gemma4' architecture, if one is available, and check that a model's declared architecture is supported before pulling third-party GGUF builds. Ollama 0.20.6 fails here because its loader does not recognize 'gemma4'.
