ollama - 💡(How to fix) Fix granite4.1 models ignoring Ollama default context window size on Ollama 0.22.0 [1 participants]

Root Cause

I was running granite4.1:30b (17 GB) and noticed it was running slow on my hardware, given the GPUs I have. When I ran ollama ps, I saw that the model was using 97 GB . I only have roughly 48 GB of VRAM, so the model spilled over to the CPU. This is where the slowness came from. I believe this is because the context size is set to 131072 and not the ollama's default context size (8k tokens?). I tested this not only with a simple python app using the ollama library, but also by making sure no models are loaded and then run ollama run granite4.1:30b.

Code Example

$ nvidia-smi
Thu Apr 30 11:23:46 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:41:00.0 Off |                  Off |
|  0%   43C    P8             20W /  450W |      18MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:83:00.0  On |                  Off |
|  0%   48C    P8             18W /  450W |     104MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3741      G   /usr/bin/gnome-shell                            6MiB |
|    1   N/A  N/A      3741      G   /usr/bin/gnome-shell                           66MiB |
|    1   N/A  N/A      3829      G   /usr/bin/Xwayland                               8MiB |
+-----------------------------------------------------------------------------------------+

$ ollama ps
NAME              ID              SIZE     PROCESSOR          CONTEXT    UNTIL
granite4.1:30b    3f3e5df8a021    97 GB    52%/48% CPU/GPU    131072     4 minutes from now


$ ollama ps
NAME             ID              SIZE     PROCESSOR          CONTEXT    UNTIL
granite4.1:8b    444af1c4b2fe    55 GB    15%/85% CPU/GPU    131072     4 minutes from now

What is the issue?

I tested the smaller granite4.1:8b model as well and this 5.3 GB was 55 GB. This is also because it is using the whole allowable context window and not the default size.

It would be nice if the granite4.1 models would use the default context size (if not specified) so that these models can fit on my GPUs.

Relevant log output

$ nvidia-smi
Thu Apr 30 11:23:46 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:41:00.0 Off |                  Off |
|  0%   43C    P8             20W /  450W |      18MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:83:00.0  On |                  Off |
|  0%   48C    P8             18W /  450W |     104MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      3741      G   /usr/bin/gnome-shell                            6MiB |
|    1   N/A  N/A      3741      G   /usr/bin/gnome-shell                           66MiB |
|    1   N/A  N/A      3829      G   /usr/bin/Xwayland                               8MiB |
+-----------------------------------------------------------------------------------------+

$ ollama ps
NAME              ID              SIZE     PROCESSOR          CONTEXT    UNTIL
granite4.1:30b    3f3e5df8a021    97 GB    52%/48% CPU/GPU    131072     4 minutes from now


$ ollama ps
NAME             ID              SIZE     PROCESSOR          CONTEXT    UNTIL
granite4.1:8b    444af1c4b2fe    55 GB    15%/85% CPU/GPU    131072     4 minutes from now

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.22.0

extent analysis

TL;DR

The issue can be resolved by setting the context size to the default value of 8k tokens when running the granite4.1 models.

Guidance

Verify the current context size used by the models by running ollama ps and checking the CONTEXT column.
Try setting the context size to the default value of 8k tokens when running the models using the ollama run command with the appropriate option.
Monitor the memory usage of the models after setting the context size to ensure it fits within the available VRAM.
Test the smaller granite4.1:8b model with the default context size to confirm the issue is resolved.

Notes

The provided log output and issue description suggest that the context size is the primary cause of the issue, but further testing may be necessary to confirm this.

Recommendation

Apply workaround: set the context size to the default value of 8k tokens when running the granite4.1 models, as this is likely to resolve the issue and allow the models to fit within the available VRAM.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

ollama - 💡(How to fix) Fix granite4.1 models ignoring Ollama default context window size on Ollama 0.22.0 [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

What is the issue?

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Notes

Recommendation

Still need to ship something?

TRENDING

ollama - 💡(How to fix) Fix granite4.1 models ignoring Ollama default context window size on Ollama 0.22.0 [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

What is the issue?

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING