ollama: Fix false "insufficient memory" error in LXC: Ollama appears to use MemFree instead of MemAvailable when loading models (solved; 1 pull request, 1 participant)

GitHub stats
ollama/ollama#15704 · Fetched 2026-04-20 11:59:13
Comments: 0 · Participants: 1 · Timeline events: 3 · Reactions: 1
Timeline (top): cross-referenced ×1, labeled ×1, subscribed ×1

Error Message

Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.568+02:00 level=WARN source=server.go:1058 msg="model request too large for system" requested="18.3 GiB" available="12.6 GiB" total="32.0 GiB" free="12.1 GiB" swap="506.2 MiB"
Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.568+02:00 level=INFO source=sched.go:511 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df error="model requires more system memory (18.3 GiB) than is available (12.6 GiB)"

Root Cause

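Per the description of PR #15713, free memory inside the container was computed as memory.max - memory.current. Because memory.current includes reclaimable page cache, the result tracked MemFree rather than MemAvailable, so a container holding a lot of file cache looked short of memory even though Linux could reclaim that cache.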

Fix Action

Fixed

PR fix notes

PR #15713: fix: use MemAvailable equivalent in cgroup memory check

Description (problem / solution / changelog)

In LXC and Docker containers, memory.current includes reclaimable page cache. Computing free memory as memory.max - memory.current therefore gave a result closer to MemFree than to MemAvailable, which caused false "insufficient memory" errors even when Linux could reclaim enough cache to load the model.

The fix reads inactive_file from /sys/fs/cgroup/memory.stat and subtracts it from the used total before computing free memory, which approximates how the kernel calculates MemAvailable. Fixes #15704.
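
A minimal sketch of the calculation described above, written in Python for illustration and not taken from the Go change in discover/cpu_linux.go. The function names and the sample byte counts are hypothetical; only memory.max, memory.current, memory.stat and the inactive_file key come from the PR description.

```python
GIB = 1024 ** 3


def parse_memory_stat(text: str) -> dict[str, int]:
    """Parse the "key value" lines of a cgroup v2 memory.stat file."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def naive_free(limit: int, current: int) -> int:
    """Free memory as memory.max - memory.current (the behaviour being fixed)."""
    return max(limit - current, 0)


def cache_aware_free(limit: int, current: int, stat_text: str) -> int:
    """Free memory after discounting reclaimable inactive file cache."""
    inactive_file = parse_memory_stat(stat_text).get("inactive_file", 0)
    used = max(current - inactive_file, 0)
    return max(limit - used, 0)


# Illustrative values only (not measurements from the issue):
# a 32 GiB limit with 20 GiB charged, 19 GiB of it inactive file cache.
limit = 32 * GIB
current = 20 * GIB
stat_text = f"anon {GIB // 2}\nfile {19 * GIB + GIB // 2}\ninactive_file {19 * GIB}\n"

print(naive_free(limit, current) / GIB)                   # 12.0
print(cache_aware_free(limit, current, stat_text) / GIB)  # 31.0
```

A real implementation would also need to handle a memory.max that contains the literal string max, which cgroup v2 uses when no limit is set; this sketch assumes a numeric limit.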

Changed files

  • discover/cpu_linux.go (modified, +23/-1)
  • discover/cpu_linux_test.go (modified, +55/-0)


What is the issue?

Ollama refuses to load a model in an LXC container even though the container has enough reclaimable memory available.

Environment:

  • Proxmox LXC container
  • Debian guest
  • Container memory: 32 GiB
  • Container swap: 512 MiB
  • CPU inference only

Observed behavior: When I try to load gemma4:26b, Ollama returns:

model requires more system memory (18.3 GiB) than is available (12.6 GiB)

However, inside the container:

  • MemFree is about 12 GiB
  • MemAvailable is about 31.9 GiB
  • almost no RAM is used by processes
  • about 19.8 GiB (21.3 GB) is file cache, mostly inactive_file, so it should be reclaimable by Linux

This suggests Ollama is using something close to MemFree + SwapFree for its pre-load memory check, while ignoring reclaimable page cache / MemAvailable.
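
The arithmetic behind this suspicion can be checked against the /proc/meminfo values quoted in this report. In the small Python sketch below (variable and function names are illustrative), MemFree plus SwapFree reproduces the 12.6 GiB from Ollama's error message, while MemAvailable is well above the 18.3 GiB requested.

```python
KIB_PER_GIB = 1024 * 1024

# Values in kB copied from the /proc/meminfo output in this report.
meminfo_kib = {
    "MemFree": 12726260,
    "MemAvailable": 33508662,
    "SwapFree": 518352,
}
requested_gib = 18.3  # "requested" figure from the Ollama log


def to_gib(kib: int) -> float:
    """Convert a /proc/meminfo kB value to GiB."""
    return kib / KIB_PER_GIB


free_plus_swap = to_gib(meminfo_kib["MemFree"] + meminfo_kib["SwapFree"])
mem_available = to_gib(meminfo_kib["MemAvailable"])

print(f"MemFree + SwapFree: {free_plus_swap:.2f} GiB")  # 12.63 GiB
print(f"MemAvailable:       {mem_available:.2f} GiB")   # 31.96 GiB
print("fits by MemFree + SwapFree:", free_plus_swap >= requested_gib)  # False
print("fits by MemAvailable:      ", mem_available >= requested_gib)   # True
```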

Why I believe this is incorrect:

  • free -h shows:
    • free: ~12 GiB
    • available: ~31 GiB
  • /sys/fs/cgroup/memory.stat shows:
    • anon: ~35 MiB
    • file: ~19.8 GiB (21.3 GB)
    • inactive_file: ~19.7 GiB (21.2 GB)
  • ps aux --sort=-%mem shows no large memory-consuming processes

So the memory is not actually occupied by applications. It is mostly page cache.
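
To make the page-cache claim concrete, the memory.stat byte counts quoted in this report can be converted to binary units. The Python sketch below uses those values verbatim; the variable names are illustrative. Note that the file figure is about 21.3 GB in decimal units, which is 19.8 GiB.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Byte counts copied from /sys/fs/cgroup/memory.stat in this report.
memory_stat = {
    "anon": 36884480,
    "file": 21266915328,
    "inactive_file": 21189849088,
    "active_file": 77045760,
}

# Fraction of the file cache that sits on the inactive list.
inactive_share = memory_stat["inactive_file"] / memory_stat["file"]

print(f"anon:          {memory_stat['anon'] / MIB:.1f} MiB")           # 35.2 MiB
print(f"file:          {memory_stat['file'] / GIB:.1f} GiB")           # 19.8 GiB
print(f"inactive_file: {memory_stat['inactive_file'] / GIB:.1f} GiB")  # 19.7 GiB
print(f"inactive share of file cache: {inactive_share:.1%}")           # 99.6%
```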

Expected behavior: Ollama should either:

  1. use MemAvailable instead of only MemFree, or
  2. account for reclaimable file cache, or
  3. provide an override / more aggressive loading option

Actual behavior: Ollama refuses to load the model even though Linux should be able to reclaim enough cache to make room.

Notes:

  • Restarting only the ollama service does not help
  • Restarting the whole container may temporarily help because cache is cleared
  • This looks especially relevant for containerized environments such as LXC

Relevant log output

Apr 19 17:31:46 ollama ollama[199195]: time=2026-04-19T17:31:46.564+02:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="32.0 GiB" available="12.1 GiB"
Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.309+02:00 level=INFO source=sched.go:484 msg="system memory" total="32.0 GiB" free="12.1 GiB" free_swap="506.2 MiB"
Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.568+02:00 level=WARN source=server.go:1058 msg="model request too large for system" requested="18.3 GiB" available="12.6 GiB" total="32.0 GiB" free="12.1 GiB" swap="506.2 MiB"
Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.568+02:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="17.3 GiB"
Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.568+02:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="880.0 MiB"
Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.568+02:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="133.7 MiB"
Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.568+02:00 level=INFO source=device.go:272 msg="total memory" size="18.3 GiB"
Apr 19 17:32:08 ollama ollama[199195]: time=2026-04-19T17:32:08.568+02:00 level=INFO source=sched.go:511 msg="Load failed" model=/usr/share/ollama/.ollama/models/blobs/sha256-7121486771cbfe218851513210c40b35dbdee93ab1ef43fe36283c883980f0df error="model requires more system memory (18.3 GiB) than is available (12.6 GiB)"
Apr 19 17:32:08 ollama ollama[199195]: [GIN] 2026/04/19 - 17:32:08 | 500 | 595.882098ms | 192.168.1.239 | POST "/api/chat"

Inside the container:

free -h
---------------
               total        used        free      shared  buff/cache   available
Mem:            32Gi        44Mi        12Gi        72Ki        19Gi        31Gi
Swap:          512Mi       5.8Mi       506Mi

/proc/meminfo
---------------
MemTotal:       33554432 kB
MemFree:        12726260 kB
MemAvailable:   33508662 kB
SwapTotal:        524288 kB
SwapFree:         518352 kB

/sys/fs/cgroup/memory.stat
---------------
anon 36884480
file 21266915328
inactive_file 21189849088
active_file 77045760

OS

Linux

GPU

No response

CPU

Intel

Ollama version

0.20.5

Extent analysis

TL;DR

Ollama's memory check seems to ignore reclaimable page cache, causing it to refuse loading a model despite sufficient available memory.

Guidance

  1. Verify memory usage: Clear the page cache with sync; echo 3 > /proc/sys/vm/drop_caches, then check whether Ollama can load the model. In an unprivileged LXC container this file is usually not writable, so the command may have to be run as root on the host.
  2. Check Ollama configuration: Look for a configuration option or environment variable that changes how Ollama checks memory so that reclaimable memory is counted.
  3. Monitor system memory: Use tools such as sysdig or systemd-cgtop to watch memory usage and cgroup limits in real time, and confirm that no other process is consuming memory unexpectedly.
  4. Consider a temporary workaround: If the model fits after the cache is cleared, a script could clear the cache before each model load. This is not a permanent solution.

Example

No specific code example is provided due to the nature of the issue, which seems related to how Ollama interprets system memory availability rather than a code snippet that can be directly applied.

Notes

The observed behavior points to how Ollama calculates available memory, specifically that it does not count reclaimable page cache. The issue was reported against version 0.20.5. PR #15713 is linked as the fix, so check whether your installed release includes it.

Recommendation

Until you run an Ollama build that includes the fix from PR #15713, apply a workaround such as clearing the page cache before attempting to load the model.
