ollama - 💡(How to fix) Fix 0.18.x idle VRAM usage and power consumption [1 comments, 1 participants]

Error Message

I don't know whether it is relevant, but the following error appears only in the 0.18.x log (about 3 seconds after server start, reproducible). In 0.17.7 there is no such error.

Error #01: write tcp 127.0.0.1:11434->127.0.0.1:54305: wsasend: An established connection was aborted by the software in your host machine.

What is the issue?

I was using Ollama 0.17.7 under Windows 11 and everything was fine. However, after I updated to 0.18.2, my fans became noisy even when the machine is idle. The output of nvidia-smi shows that an ollama process is using 262 MiB of VRAM even though Ollama is idle (no models running, only the system tray icon).

Wed Mar 25 10:32:19 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 591.59                 Driver Version: 591.59         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                   TCC   |   00000000:01:00.0 Off |                  Off |
|  0%   58C    P0             85W /  300W |     272MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           15276      C   ...al\Programs\Ollama\ollama.exe        262MiB |
+-----------------------------------------------------------------------------------------+

At the same time, ollama ps says no model is running.
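
The per-process reading above can also be collected in machine-readable form, which makes it easier to compare versions. The following is a minimal sketch, assuming nvidia-smi is on PATH and that its `--query-compute-apps` CSV output is used; the function names are hypothetical helpers and are not part of Ollama or NVIDIA tooling.

```python
import subprocess

# nvidia-smi query that lists compute processes as "pid, process_name, used_memory" rows.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse CSV rows into (pid, process_name, used_mib) tuples, skipping unreadable rows."""
    apps = []
    for line in csv_text.strip().splitlines():
        try:
            pid, rest = line.split(",", 1)
            name, used = rest.rsplit(",", 1)
            apps.append((int(pid.strip()), name.strip(), int(used.strip())))
        except ValueError:
            # Rows with "[N/A]" memory or an unexpected shape are ignored.
            continue
    return apps


def ollama_vram_mib(csv_text):
    """Sum the GPU memory (MiB) of every process whose name contains 'ollama'."""
    return sum(
        used for _, name, used in parse_compute_apps(csv_text)
        if "ollama" in name.lower()
    )


def query_compute_apps():
    """Run nvidia-smi and return its raw CSV output (requires an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
```

Calling `ollama_vram_mib(query_compute_apps())` while `ollama ps` lists no models should give a value near zero if the server holds no GPU memory when idle.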

Downgrading to 0.18.0 shows the same problem.

After downgrading to 0.17.7, everything is OK again and the output of nvidia-smi is normal.

Wed Mar 25 11:02:20 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 591.59                 Driver Version: 591.59         CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A40                   TCC   |   00000000:01:00.0 Off |                  Off |
|  0%   45C    P8             14W /  300W |      10MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

So why is VRAM used in 0.18.x when idle? Is this a new feature (if yes, can I manually turn it off?) or just a bug? The GPU stays in P0 at 85 W instead of dropping to P8 at 14 W, and I can't accept about 70 watts of additional idle power!
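
To put the power difference in context, the two nvidia-smi readings can be turned into an energy figure. This is a rough sketch that assumes the machine stays powered on and idle for the whole period, which the report does not state; the helper name is hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def extra_idle_energy_kwh(idle_watts_new, idle_watts_old, hours=HOURS_PER_YEAR):
    """Energy (kWh) consumed by the difference between two idle power readings."""
    return (idle_watts_new - idle_watts_old) * hours / 1000


# Readings from the nvidia-smi output: 85 W idle on 0.18.2, 14 W idle on 0.17.7.
extra_watts = 85 - 14
print(f"Extra idle power: {extra_watts} W")
print(f"Extra energy over a year of continuous idle: {extra_idle_energy_kwh(85, 14):.2f} kWh")
```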

Relevant log output

I don't know whether it is relevant, but the following error appears only in the 0.18.x log (about 3 seconds after server start, reproducible). In 0.17.7 there is no such error.

Error #01: write tcp 127.0.0.1:11434->127.0.0.1:54305: wsasend: An established connection was aborted by the software in your host machine.
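
To check whether this error really appears only on 0.18.x, the same filter can be applied to the server log of each version. The sketch below assumes the log is a plain-text file at `%LOCALAPPDATA%\Ollama\server.log`; that location and both helper names are assumptions and should be adjusted to the actual installation.

```python
import os
from pathlib import Path


def find_aborted_writes(log_text):
    """Return the log lines that report a wsasend connection abort."""
    return [
        line for line in log_text.splitlines()
        if "wsasend" in line and "aborted" in line
    ]


def default_server_log():
    """Assumed location of the Ollama server log on Windows."""
    return Path(os.environ.get("LOCALAPPDATA", "")) / "Ollama" / "server.log"
```

For example, `find_aborted_writes(default_server_log().read_text(errors="replace"))` returns an empty list when the log contains no such error.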

OS

Windows

GPU

Nvidia

CPU

AMD

Ollama version

0.18.2

extent analysis

Fix Plan

The report does not identify a root cause, so the steps below are diagnostics and workarounds rather than a confirmed fix. The goal is to reduce or eliminate the GPU memory that Ollama 0.18.x holds while idle.

Check for Configuration Options: Review the Ollama documentation and release notes for settings that affect GPU usage or idle behavior. OLLAMA_KEEP_ALIVE controls how long a loaded model stays in memory, but since ollama ps shows no loaded model here, it is unlikely to explain the 262 MiB on its own.
Disable GPU Acceleration: As a test, start the server with the GPU hidden. The Ollama GPU documentation describes setting CUDA_VISIBLE_DEVICES to an invalid ID such as -1 to force CPU-only operation. This removes GPU inference entirely, so treat it as a diagnostic step, not a fix.
Implement a Workaround: nvidia-smi can report which process holds GPU memory, but it cannot release memory owned by another process. A script can therefore only detect the idle state and then quit or restart Ollama. Until a fixed release is available, staying on 0.17.7 is the only workaround the report confirms.
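
The detection part of the workaround can be sketched by combining the server's own view of loaded models with the GPU memory reading. This is a minimal sketch, assuming the Ollama REST endpoint `/api/ps` is reachable on the default address seen in the log (127.0.0.1:11434); the 50 MiB threshold and the function names are hypothetical choices, not values from Ollama.

```python
import json
import urllib.request


def loaded_models(ps_json):
    """Return the names of models listed in an /api/ps response body."""
    models = json.loads(ps_json).get("models") or []
    return [m.get("name", "") for m in models]


def fetch_ps(base_url="http://127.0.0.1:11434"):
    """Fetch the raw /api/ps response from a running Ollama server."""
    with urllib.request.urlopen(base_url + "/api/ps", timeout=5) as resp:
        return resp.read().decode("utf-8")


def idle_vram_suspicious(ps_json, ollama_vram_mib, threshold_mib=50):
    """True when no model is loaded but the ollama process still holds GPU memory.

    threshold_mib is an arbitrary cut-off between the 10 MiB and 272 MiB readings
    in the report; tune it for your own hardware.
    """
    return not loaded_models(ps_json) and ollama_vram_mib > threshold_mib
```

With the values from the report, `idle_vram_suspicious('{"models": []}', 262)` is true on 0.18.2, which is the condition a wrapper script could use before restarting the server.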

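Example batch file that reports whether an idle Ollama process is holding GPU memory:

@echo off
setlocal
set "ollama_pid="

:: Find the Ollama server process ID (the last match wins if several are running)
for /f "tokens=2" %%a in ('tasklist /nh /fi "imagename eq ollama.exe" ^| findstr /i "ollama.exe"') do set "ollama_pid=%%a"

if not defined ollama_pid (
    echo Ollama is not running.
    exit /b 0
)

:: List loaded models; an idle server prints only the header row
ollama ps

:: Report GPU memory held by compute processes. This only reads usage:
:: nvidia-smi cannot release memory that belongs to another process.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader | findstr /i "ollama"

:: If ollama.exe is listed here while "ollama ps" shows no models, the idle
:: server is holding the memory. Quit or restart Ollama, or stay on 0.17.7.

Note: This script only reports state and may need changes for your environment. Per-process memory may be shown as [N/A] when the GPU uses the WDDM driver model; the A40 in this report runs in TCC mode, where the value is available.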

Verification

To verify that the fix worked:

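Apply the configuration change or run the script.
Monitor the nvidia-smi output and check that VRAM usage drops when Ollama is idle. On 0.17.7 the idle reading was 10MiB with no running processes listed.
Verify that the fans return to a normal noise level. The GPU should fall back to P8 at roughly 14 W instead of staying in P0 at 85 W.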
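
The checks above can be automated by reading the same three values nvidia-smi prints in its table. This is a sketch assuming the `--query-gpu` fields `pstate`, `power.draw`, and `memory.used`; the idle thresholds are hypothetical and were picked to sit between the two readings in the report.

```python
import subprocess

# Query performance state, power draw (W), and used memory (MiB) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=pstate,power.draw,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_row(row):
    """Convert one CSV row such as 'P8, 14.00, 10' into a dictionary."""
    pstate, power, memory = [field.strip() for field in row.split(",")]
    return {"pstate": pstate, "power_w": float(power), "memory_mib": float(memory)}


def looks_idle(sample, max_power_w=30.0, max_memory_mib=50.0):
    """True when power draw and memory use are both below the assumed idle limits."""
    return sample["power_w"] <= max_power_w and sample["memory_mib"] <= max_memory_mib


def read_gpu(index=0):
    """Read the current state of one GPU (requires an NVIDIA driver)."""
    out = subprocess.run(
        QUERY + ["-i", str(index)], capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_row(out.strip().splitlines()[0])
```

Running `looks_idle(read_gpu())` a few minutes after the last request gives a simple pass/fail signal for the verification steps.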

Extra Tips

Regularly review the Ollama documentation and release notes for updates on GPU resource management and idle behavior.
Consider reporting the issue to the Ollama development team to request a permanent fix or additional configuration options.
If you're experiencing similar issues with other GPU-intensive applications, investigate whether they have similar configuration options or workarounds to manage GPU resource allocation.
