ollama - Fix Ollama 0.20.2 unknown model architecture: 'gemma4' with CUDA arch 50 [26 comments, 5 participants]

GitHub stats
ollama/ollama#15354, fetched 2026-04-08 02:52:17
Comments: 26
Participants: 5
Timeline: 32
Reactions: 3
Timeline (top): commented ×26, subscribed ×3, labeled ×1, mentioned ×1

Error Message

Error: 500 Internal Server Error: unable to load model: /var/lib/ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df

What is the issue?

ollama run gemma4:26b
Error: 500 Internal Server Error: unable to load model: /var/lib/ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df

Note: the same sort of error is reported for gemma4:e4b too.

Models such as qwen3.5:9b and gemma3n:e4b have no issues on this device and sometimes, when the weather is just right, will use the GPU too.


Partial NixOS config that may aid in reproducing the issue:

{ pkgs, ... }:

{
  hardware = {
    enableRedistributableFirmware = true;
    enableAllFirmware = true;
    graphics.enable = true;
  };

  # https://wiki.nixos.org/wiki/CUDA#Setting_up_CUDA_Binary_Cache
  nix.settings = {
    download-buffer-size = 524288000;
    substituters = [
      "https://cache.nixos-cuda.org"
    ];
    trusted-public-keys = [
      "cache.nixos-cuda.org:74DUi4Ye579gUqzH4ziL9IyiJBlDpMRn9MBN8oNan9M="
    ];
  };

  services.xserver.videoDrivers = [ "nvidia" ];

  services.ollama = {
    enable = true;

    package = (pkgs.ollama-cuda.override {
      cudaArches = [ "50" ];
    }).overrideAttrs (finalAttrs: previousAttrs: rec {
      version = "0.20.2";
      src = previousAttrs.src.override {
        tag = "v${version}";
      };
    });

    loadModels = [
      "gemma4:26b"
      "gemma4:e4b"
    ];
  };
}

Relevant log output

systemctl status ollama.service
● ollama.service - Server for local large language models
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: ignored)
     Active: active (running) since Sun 2026-04-05 17:35:15 PDT; 31min ago
 Invocation: a579fdd633f94e5f96fd2fc6412f6c0e
   Main PID: 66802 (.ollama-wrapped)
         IP: 16.9G in, 143M out
         IO: 16.9G read, 33.5G written
      Tasks: 25 (limit: 38310)
     Memory: 17.4G (peak: 17.5G)
        CPU: 7min 13.052s
     CGroup: /system.slice/ollama.service
             └─66802 /nix/store/5r6bvxj0jgdnmsgdi8lysdpkxpcchknv-ollama-0.20.2/bin/ollama serve

Apr 05 18:06:13 nixos ollama[66802]: llama_model_loader: - type q8_0:   28 tensors
Apr 05 18:06:13 nixos ollama[66802]: llama_model_loader: - type q4_K:  193 tensors
Apr 05 18:06:13 nixos ollama[66802]: llama_model_loader: - type q6_K:   13 tensors
Apr 05 18:06:13 nixos ollama[66802]: print_info: file format = GGUF V3 (latest)
Apr 05 18:06:13 nixos ollama[66802]: print_info: file type   = Q4_K - Medium
Apr 05 18:06:13 nixos ollama[66802]: print_info: file size   = 16.74 GiB (5.57 BPW)
Apr 05 18:06:13 nixos ollama[66802]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'gemma4'
Apr 05 18:06:13 nixos ollama[66802]: llama_model_load_from_file_impl: failed to load model
Apr 05 18:06:13 nixos ollama[66802]: time=2026-04-05T18:06:13.089-07:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=/var/lib/ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df error="unable to load model: /var/lib/ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df"
Apr 05 18:06:13 nixos ollama[66802]: [GIN] 2026/04/05 - 18:06:13 | 500 |   1.22498756s |       127.0.0.1 | POST     "/api/generate"
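
The decisive line in the log above is the loader's "unknown model architecture" error; the 500 response is only its downstream symptom. As a minimal sketch (the function name `find_unknown_architectures` is hypothetical, and the pattern is taken from the log line shown in this report rather than from any documented log format), the rejected architecture can be pulled out of journal text like this:

```python
import re

# Pattern copied from the journal excerpt in this report; other builds
# may word the error differently, so treat it as an assumption.
ARCH_ERROR = re.compile(r"unknown model architecture: '([^']+)'")


def find_unknown_architectures(log_text: str) -> list[str]:
    """Return distinct rejected architecture names, in order of first appearance."""
    seen: list[str] = []
    for match in ARCH_ERROR.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


# Two lines taken from the log output above.
sample = (
    "Apr 05 18:06:13 nixos ollama[66802]: llama_model_load: error loading model: "
    "error loading model architecture: unknown model architecture: 'gemma4'\n"
    "Apr 05 18:06:13 nixos ollama[66802]: llama_model_load_from_file_impl: "
    "failed to load model\n"
)

print(find_unknown_architectures(sample))  # ['gemma4']
```

The same function could be fed the text of `journalctl -u ollama.service` to confirm that every failed load traces back to the same architecture name.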

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.20.2


Notes and updates

I did search for related issues, which is why I manually updated to the version provided via the override, but no joy was had.

I also tried a flake update for the lock file, but that resulted in previously functional models that had used the GPU being restricted to CPU, so that was less than joyful too.

extent analysis

TL;DR

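The server log shows the model loader (llama_model_load) rejecting the file with "unknown model architecture: 'gemma4'", so the Ollama build that is actually running does not recognize this architecture. A potential fix is to run an ollama package version that supports the gemma4 model architecture.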

Guidance

  • Verify that the gemma4 model architecture is supported by the current version of ollama (0.20.2) by checking the documentation or release notes.
  • Check the model file format and type to ensure it matches the expected format for the gemma4 model architecture.
  • Consider updating the ollama package to a newer version that may include support for the gemma4 model architecture, but be cautious of potential regressions like the one experienced with the flake update.
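  • Review the Nix configuration to ensure that the cudaArches and version overrides actually take effect. The override shown changes only the source tag; if the source hash is not updated to match, Nix may reuse the previously fetched sources, in which case the binary would be labeled 0.20.2 but built from older code.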

Example

No code snippet is provided as the issue is related to configuration and model compatibility rather than code.

Notes

The error appears specific to the gemma4 architecture: qwen3.5:9b and gemma3n:e4b load on the same device. Updating the ollama package may resolve it, but the earlier flake update left previously GPU-backed models running on CPU, so check GPU offload after any update.

Recommendation

Confirm that the running build really is 0.20.2, then update the ollama package to a version that supports the gemma4 model architecture, if one is available. The log shows this build failing to recognize gemma4, so a build with support for that architecture is the most likely fix.
