ollama - How to fix: Why does running deepseek-r1:32b require over 300 GB of memory? [1 comment, 2 participants]

GitHub stats
ollama/ollama#15049 · Fetched 2026-04-08 01:26:35
Comments: 1 · Participants: 2 · Timeline events: 3 · Reactions: 0
Timeline (top): closed ×1, commented ×1, labeled ×1

Error Message

{"error":"model requires more system memory (338.1 GiB) than is available (328.0 GiB)"}
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.821+08:00 level=WARN source=server.go:1044 msg="model request too large for system" requested="338.1 GiB" available="337.2>


What is the issue?

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:32b",
  "prompt": "Hello, how are you?",
  "stream": false
}'
{"error":"model requires more system memory (338.1 GiB) than is available (328.0 GiB)"}
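To see how far the request overshoots, the two figures can be pulled out of the error body. This is a minimal sketch that assumes the message keeps the wording shown above; the helper name `memory_shortfall_gib` is hypothetical and not part of Ollama.

```python
import json
import re

# Error body copied from the issue report above.
ERROR_BODY = (
    '{"error":"model requires more system memory (338.1 GiB) '
    'than is available (328.0 GiB)"}'
)

def memory_shortfall_gib(body: str) -> float:
    """Return required minus available memory in GiB, parsed from the error text."""
    message = json.loads(body)["error"]
    # The first parenthesised figure is the requirement, the second is what is available.
    required, available = (float(x) for x in re.findall(r"\(([\d.]+) GiB\)", message))
    return round(required - available, 1)

print(f"Shortfall: {memory_shortfall_gib(ERROR_BODY)} GiB")
```

The shortfall is small relative to the total, so the question is less how to find a few more GiB and more why a 32B model asks for this much at all.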

Relevant log output

● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2026-03-24 17:47:32 CST; 15h ago
   Main PID: 2816880 (ollama)
      Tasks: 24 (limit: 618634)
     Memory: 1.1G
     CGroup: /system.slice/ollama.service
             └─2816880 /usr/local/bin/ollama serve

Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.821+08:00 level=DEBUG source=server.go:976 msg="available gpu" id=GPU-fdfb4273-9c05-65a3-6882-21542da92ff6 library=CUDA "a>
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.821+08:00 level=DEBUG source=server.go:976 msg="available gpu" id=GPU-e42673e2-1cc5-b890-6193-de099240bffe library=CUDA "a>
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.821+08:00 level=WARN source=server.go:1044 msg="model request too large for system" requested="338.1 GiB" available="337.2>
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.821+08:00 level=INFO source=sched.go:516 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-6150cb38231>
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.853+08:00 level=INFO source=runner.go:965 msg="starting go runner"
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.853+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/local/lib/ollama
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.873+08:00 level=DEBUG source=server.go:1830 msg="stopping llama server" pid=325701
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.873+08:00 level=DEBUG source=server.go:1836 msg="waiting for llama server to exit" pid=325701
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: time=2026-03-25T09:28:05.882+08:00 level=DEBUG source=server.go:1840 msg="llama server stopped" pid=325701
Mar 25 09:28:05 sunway-SYS-420GP-TNR ollama[2816880]: [GIN] 2026/03/25 - 09:28:05 | 500 | 1.744758145s | 127.0.0.1 | POST "/api/generate"

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.18.0

extent analysis

Fix Plan

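The error means that Ollama's memory estimate for loading the model (338.1 GiB) exceeds the system memory it considers available (328.0 GiB). The quantized weights of a 32B model are on the order of 20 GB, so an estimate this large most likely comes from the KV cache for a very large context length, possibly multiplied by the number of parallel request slots, rather than from the weights themselves. The fix involves reducing what the model requests or increasing the available system memory. Here are the steps:

  • Increase System Memory:
    1. Check if it's possible to add more physical RAM to the system, or to free memory held by other processes.
    2. If not, consider using a cloud provider that offers more memory options.
  • Optimize Model Memory Usage:
    1. Check whether a very large context length is in effect (the OLLAMA_CONTEXT_LENGTH environment variable, a num_ctx parameter in the Modelfile, or options.num_ctx in the API request) and lower it.
    2. Check OLLAMA_NUM_PARALLEL; the memory needed for context scales with the number of parallel slots.
    3. Try using a smaller model, such as "deepseek-r1:14b" or "deepseek-r1:8b".
  • Configure Ollama to Use GPU:
    1. Ensure that the Nvidia GPU is properly installed and configured. The log above shows two CUDA devices being detected.
    2. To restrict Ollama to specific GPUs, set the CUDA_VISIBLE_DEVICES environment variable for the Ollama service (e.g., CUDA_VISIBLE_DEVICES=0).
  • Update Ollama Configuration:
    1. Ollama is configured through environment variables on the service. On a systemd install, run systemctl edit ollama.service and add the settings under [Service]. The value 8192 below is only an example:

    [Service]
    Environment="OLLAMA_CONTEXT_LENGTH=8192"

    2. Restart the Ollama service after updating the configuration (systemctl daemon-reload, then systemctl restart ollama).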
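A rough estimate shows how context length can push the memory request far beyond the size of the weights. The sketch below is not Ollama's own estimator. The architecture values (64 layers, 8 KV heads, head dimension 128) are assumptions for a Qwen2.5-32B-style model, and an f16 cache is assumed; none of these numbers are read from the model file in the issue.

```python
# Rough KV-cache size estimate. Every architecture number here is an
# assumption for illustration, not a value taken from Ollama or the model blob.
N_LAYERS = 64          # assumed transformer layers
N_KV_HEADS = 8         # assumed key/value heads (grouped-query attention)
HEAD_DIM = 128         # assumed dimension per head
BYTES_PER_ELEMENT = 2  # f16 cache entries

def kv_cache_gib(context_tokens: int, parallel: int = 1) -> float:
    """Approximate KV-cache size in GiB for a context length and slot count."""
    # Keys and values (factor 2) are stored for every layer and every KV head.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_ELEMENT
    return per_token * context_tokens * parallel / 2**30

for ctx in (4096, 32768, 131072):
    print(f"{ctx:>7} tokens: {kv_cache_gib(ctx):6.1f} GiB")
```

Under these assumptions the cache grows linearly with both context length and parallel slots, which is why lowering either one is usually the cheapest way to bring the request back under the available memory.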

Example Code

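Pruning applies to a PyTorch checkpoint, not to the GGUF blob that Ollama loads, and unstructured pruning only zeroes weights without shrinking the dense tensors. Treat the following Python code as an illustration of the technique rather than a fix for this error. The checkpoint path is hypothetical:

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Load a fully pickled PyTorch model (hypothetical path; Ollama does not load .pth files)
model = torch.load("deepseek-r1-32b.pth", weights_only=False)

# Collect the weight tensor of every linear layer in the model
parameters_to_prune = [
    (module, "weight")
    for module in model.modules()
    if isinstance(module, nn.Linear)
]

# Zero the 20% smallest-magnitude weights across all collected layers
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.2,
)

# Make the pruning permanent by folding each mask into its weight
for module, name in parameters_to_prune:
    prune.remove(module, name)

# Save the pruned model
torch.save(model, "pruned_deepseek-r1-32b.pth")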

Verification

To verify that the fix worked, repeat the original request:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:32b",
  "prompt": "Hello, how are you?",
  "stream": false
}'

A successful response is a JSON object whose "response" field contains the generated text, with no "error" field reporting insufficient memory.
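The same check can be scripted. This is a minimal sketch using only the Python standard library. It assumes the endpoint and request fields shown in the issue, and the helper names (`build_payload`, `summarize`, `generate`) are hypothetical.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # endpoint used in the issue

def build_payload(model: str, prompt: str) -> bytes:
    """Encode the same non-streaming request body the curl command sends."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def summarize(body: dict) -> str:
    """Report either the server's error text or the start of the generated reply."""
    if "error" in body:
        return "FAILED: " + body["error"]
    return "OK: " + body.get("response", "")[:80]

def generate(model: str, prompt: str) -> dict:
    """POST the request and return the decoded JSON body, including error bodies."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # A 500 reply, as in the log above, still carries a JSON error body.
        return json.load(err)
```

With the service running, `print(summarize(generate("deepseek-r1:32b", "Hello, how are you?")))` prints a line starting with `OK:` once the model loads, or the memory error text if it still does not fit.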
