ollama - 💡(How to fix) Fix unknown model architecture: 'gemma4' [1 participant]

GitHub stats
ollama/ollama#15303 (fetched 2026-04-08 02:44:24)
Comments: 0
Participants: 1
Reactions: 1
Timeline: 2 events (closed ×1, labeled ×1)


What is the issue?

With Ollama 0.20.0, gemma4 models downloaded from Hugging Face fail to load with an "unknown model architecture: 'gemma4'" error.

Relevant log output

llama_model_loader: loaded meta data with 56 key-value pairs and 601 tensors from /var/lib/ollama/blobs/sha256-f3504b387ee0962b2b041cf3691b1520118822642d67c5294f85ea62c68614b3 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma4
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                     general.sampling.top_k i32              = 64
llama_model_loader: - kv   3:                     general.sampling.top_p f32              = 0.950000
llama_model_loader: - kv   4:                      general.sampling.temp f32              = 1.000000
llama_model_loader: - kv   5:                               general.name str              = Gemma-4-E2B-It
llama_model_loader: - kv   6:                           general.basename str              = Gemma-4-E2B-It
llama_model_loader: - kv   7:                       general.quantized_by str              = Unsloth
llama_model_loader: - kv   8:                         general.size_label str              = 4.6B
llama_model_loader: - kv   9:                            general.license str              = apache-2.0
llama_model_loader: - kv  10:                       general.license.link str              = https://ai.google.dev/gemma/docs/gemm...
llama_model_loader: - kv  11:                           general.repo_url str              = https://huggingface.co/unsloth
llama_model_loader: - kv  12:                   general.base_model.count u32              = 1
llama_model_loader: - kv  13:                  general.base_model.0.name str              = Gemma 4 E2B It
llama_model_loader: - kv  14:          general.base_model.0.organization str              = Google
llama_model_loader: - kv  15:              general.base_model.0.repo_url str              = https://huggingface.co/google/gemma-4...
llama_model_loader: - kv  16:                               general.tags arr[str,2]       = ["unsloth", "any-to-any"]
llama_model_loader: - kv  17:                         gemma4.block_count u32              = 35
llama_model_loader: - kv  18:                      gemma4.context_length u32              = 131072
llama_model_loader: - kv  19:                    gemma4.embedding_length u32              = 1536
llama_model_loader: - kv  20:                 gemma4.feed_forward_length arr[i32,35]      = [6144, 6144, 6144, 6144, 6144, 6144, ...
llama_model_loader: - kv  21:                gemma4.attention.head_count u32              = 8
llama_model_loader: - kv  22:             gemma4.attention.head_count_kv u32              = 1
llama_model_loader: - kv  23:                      gemma4.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  24:                  gemma4.rope.freq_base_swa f32              = 10000.000000
llama_model_loader: - kv  25:    gemma4.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  26:                gemma4.attention.key_length u32              = 512
llama_model_loader: - kv  27:              gemma4.attention.value_length u32              = 512
llama_model_loader: - kv  28:             gemma4.final_logit_softcapping f32              = 30.000000
llama_model_loader: - kv  29:            gemma4.attention.sliding_window u32              = 512
llama_model_loader: - kv  30:          gemma4.attention.shared_kv_layers u32              = 20
llama_model_loader: - kv  31:    gemma4.embedding_length_per_layer_input u32              = 256
llama_model_loader: - kv  32:    gemma4.attention.sliding_window_pattern arr[bool,35]     = [true, true, true, true, false, true,...
llama_model_loader: - kv  33:            gemma4.attention.key_length_swa u32              = 256
llama_model_loader: - kv  34:          gemma4.attention.value_length_swa u32              = 256
llama_model_loader: - kv  35:                gemma4.rope.dimension_count u32              = 512
llama_model_loader: - kv  36:            gemma4.rope.dimension_count_swa u32              = 256
llama_model_loader: - kv  37:                       tokenizer.ggml.model str              = gemma4
llama_model_loader: - kv  38:                      tokenizer.ggml.tokens arr[str,262144]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
llama_model_loader: - kv  39:                      tokenizer.ggml.scores arr[f32,262144]  = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv  40:                  tokenizer.ggml.token_type arr[i32,262144]  = [3, 1, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  41:                      tokenizer.ggml.merges arr[str,514906]  = ["\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n \n", ...
llama_model_loader: - kv  42:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  43:                tokenizer.ggml.eos_token_id u32              = 106
llama_model_loader: - kv  44:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  45:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  46:               tokenizer.ggml.mask_token_id u32              = 4
llama_model_loader: - kv  47:                    tokenizer.chat_template str              = {%- macro format_parameters(propertie...
llama_model_loader: - kv  48:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  49:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  50:               general.quantization_version u32              = 2
llama_model_loader: - kv  51:                          general.file_type u32              = 15
llama_model_loader: - kv  52:                      quantize.imatrix.file str              = gemma-4-E2B-it-GGUF/imatrix_unsloth.gguf
llama_model_loader: - kv  53:                   quantize.imatrix.dataset str              = unsloth_calibration_gemma-4-E2B-it.txt
llama_model_loader: - kv  54:             quantize.imatrix.entries_count u32              = 275
llama_model_loader: - kv  55:              quantize.imatrix.chunks_count u32              = 141
llama_model_loader: - type  f32:  353 tensors
llama_model_loader: - type q4_K:  212 tensors
llama_model_loader: - type q5_K:    1 tensors
llama_model_loader: - type q6_K:   34 tensors
llama_model_loader: - type bf16:    1 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 2.88 GiB (5.32 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
llama_model_load_from_file_impl: failed to load model

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.20.0

extent analysis

TL;DR

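Ollama 0.20.0 parses the file's metadata without trouble (a GGUF V3 header with 56 key-value pairs and 601 tensors), then fails at the point where the loader maps general.architecture = gemma4 to a known architecture. The most likely cause is that the model-loading code in this Ollama build does not support the 'gemma4' architecture, so an Ollama update (or a workaround) is needed rather than a fresh download.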

Guidance

  • Check the Ollama release notes to see whether the 'gemma4' architecture is supported in 0.20.0 or only in a later release.
  • The model file itself looks healthy: the log shows a valid GGUF V3 file (Q4_K - Medium, 2.88 GiB, 601 tensors) whose general.architecture key is gemma4, so re-downloading it is unlikely to help.
  • Update Ollama to a newer release that supports the 'gemma4' architecture, if one is available.
  • If no such release exists yet, use a model whose architecture Ollama 0.20.0 already supports.
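The first and third checks come down to comparing the installed version with the first release whose notes list gemma4 support. A minimal sketch, assuming a local server on Ollama's default address (127.0.0.1:11434) and its GET /api/version endpoint; the helper names are illustrative, and the minimum version must be taken from the release notes rather than guessed.

```python
import json
import re
import urllib.request

# Assumption: Ollama listens on its default local address. Change this if
# OLLAMA_HOST points the server somewhere else.
VERSION_URL = "http://127.0.0.1:11434/api/version"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.20.0' or 'v0.20.1-rc0' into (0, 20, 0) or (0, 20, 1).

    Any pre-release suffix such as '-rc0' is ignored.
    """
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Numeric comparison; the shorter version is padded with zeros ('0.20' == '0.20.0')."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_ollama_version(url: str = VERSION_URL) -> str:
    """Ask the running server for its version; /api/version answers {"version": "..."}."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)["version"]
```

Usage: is_at_least(installed_ollama_version(), "X.Y.Z"), where X.Y.Z is the release that the notes name as adding gemma4. Running ollama --version in a shell reports the same installed version without any code.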

Example

The failure comes from model-architecture support in Ollama, not from user code, so there is nothing to patch on the client side.
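The log already prints the GGUF metadata, but the same check can be run on a file before importing it into Ollama. The sketch below is a minimal, dependency-free reader that assumes the little-endian GGUF layout described in the GGUF specification in the ggml repository (version 2 or later); it returns the general.architecture value without touching any tensor data. The function names are illustrative and not part of Ollama or llama.cpp; the gguf Python package that ships with llama.cpp provides a complete reader if more than this one key is needed.

```python
import struct
from typing import BinaryIO

# GGUF metadata value-type codes, in the order defined by the GGUF spec.
(UINT8, INT8, UINT16, INT16, UINT32, INT32, FLOAT32, BOOL,
 STRING, ARRAY, UINT64, INT64, FLOAT64) = range(13)

# Byte width of every fixed-size scalar type.
SCALAR_SIZES = {
    UINT8: 1, INT8: 1, BOOL: 1,
    UINT16: 2, INT16: 2,
    UINT32: 4, INT32: 4, FLOAT32: 4,
    UINT64: 8, INT64: 8, FLOAT64: 8,
}


def _unpack(f: BinaryIO, fmt: str) -> int:
    """Read exactly one little-endian integer described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _unpack(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8")


def _skip_value(f: BinaryIO, vtype: int) -> None:
    """Move past one metadata value without decoding it."""
    if vtype in SCALAR_SIZES:
        f.seek(SCALAR_SIZES[vtype], 1)
    elif vtype == STRING:
        f.seek(_unpack(f, "<Q"), 1)
    elif vtype == ARRAY:
        elem_type = _unpack(f, "<I")
        count = _unpack(f, "<Q")
        if elem_type in SCALAR_SIZES:
            f.seek(SCALAR_SIZES[elem_type] * count, 1)
        else:  # strings or nested arrays: walk them one by one
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")


def architecture_from_stream(f: BinaryIO) -> str:
    """Return general.architecture from an open GGUF (v2+) byte stream."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    if _unpack(f, "<I") < 2:
        raise ValueError("GGUF v1 uses 32-bit counts and is not handled here")
    _unpack(f, "<Q")  # tensor count, not needed here
    kv_count = _unpack(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _unpack(f, "<I")
        if key == "general.architecture" and vtype == STRING:
            return _read_string(f)
        _skip_value(f, vtype)
    raise KeyError("general.architecture not found")


def read_gguf_architecture(path: str) -> str:
    with open(path, "rb") as f:
        return architecture_from_stream(f)
```

Pointed at the blob from the log, read_gguf_architecture("/var/lib/ollama/blobs/sha256-f3504b387ee0962b2b041cf3691b1520118822642d67c5294f85ea62c68614b3") should return 'gemma4', matching kv 0 above (reading files under /var/lib/ollama may require the ollama user's permissions). If the returned name is not one the installed Ollama supports, the load will fail the same way regardless of how intact the file is.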

Notes

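The report covers a single setup (Ollama 0.20.0 on Linux with an AMD GPU, loading Unsloth's Q4_K - Medium quantization of Gemma-4-E2B-It), so it does not show whether other gemma4 GGUF files or other Ollama builds behave the same way. Ollama's release notes are the place to confirm when gemma4 support is added.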

Recommendation

Until an Ollama release that supports the 'gemma4' architecture is available, use a model with a supported architecture instead; after upgrading, load the gemma4 GGUF again.
