[Bug]: vllm does not use all of the available RAM



Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Collecting environment information...
==============================
        System Info
==============================
OS                           : Debian GNU/Linux 13 (trixie) (aarch64)
GCC version                  : (Debian 12.4.0-5) 12.4.0
Clang version                : 19.1.7 (3+b1)
CMake version                : version 4.3.1
Libc version                 : glibc-2.41

==============================
       PyTorch Info
==============================
PyTorch version              : 2.11.0+cpu
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A
XPU used to build PyTorch    : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Apr 14 2026, 14:26:08) [Clang 22.1.3 ] (64-bit runtime)
Python platform              : Linux-6.12.75+rpt-rpi-2712-aarch64-with-glibc2.41
    

==============================
          CPU Info
==============================
Architecture:                            aarch64
CPU op-mode(s):                          32-bit, 64-bit
Byte Order:                              Little Endian
CPU(s):                                  4
On-line CPU(s) list:                     0-3
Vendor ID:                               ARM
Model name:                              Cortex-A76
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     4
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r4p1
CPU(s) scaling MHz:                      100%
CPU max MHz:                             2400,0000
CPU min MHz:                             1500,0000
BogoMIPS:                                108,00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
L1d cache:                               256 KiB (4 instances)
L1i cache:                               256 KiB (4 instances)
L2 cache:                                2 MiB (4 instances)
L3 cache:                                2 MiB (1 instance)
NUMA node(s):                            8
NUMA node0 CPU(s):                       0-3
NUMA node1 CPU(s):                       0-3
NUMA node2 CPU(s):                       0-3
NUMA node3 CPU(s):                       0-3
NUMA node4 CPU(s):                       0-3
NUMA node5 CPU(s):                       0-3
NUMA node6 CPU(s):                       0-3
NUMA node7 CPU(s):                       0-3
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.11.0+cpu
[pip3] torchaudio==2.11.0
[pip3] torchvision==0.26.0+cpu
[pip3] transformers==5.5.4
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.19.2rc1.dev29+g58631d7c3.d20260420 (git sha: 58631d7c3, date: 20260420)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; XPU: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_pi
</details>

🐛 Describe the bug

On a Raspberry Pi 5 (4 GB), the following command fails due to insufficient memory.

vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602

=> Available memory on node 0 (0.01/0.47 GiB) on startup is less than desired CPU memory utilization (0.9, 0.43 GiB). Decrease --gpu-memory-utilization or reduce CPU memory used by other processes.

vllm_serve_output.txt

Additional information

numactl --hardware

numactl_output.txt

I also tried disabling NUMA by defining VLLM_NUMA_DISABLED, but that does not work either.

extent analysis

TL;DR

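The vllm serve command fails at startup on a Raspberry Pi 5 because vLLM reports only 0.01 GiB available out of 0.47 GiB on NUMA node 0, less than the 0.43 GiB implied by the default utilization of 0.9. The system reports 8 NUMA nodes, so the check appears to consider a single node rather than the full 4 GB of RAM.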

Guidance

  • Lower the --gpu-memory-utilization value, as the error message suggests. The message ties this flag to the "desired CPU memory utilization", so on this CPU-only build it sets the share of node memory vLLM tries to reserve.
  • Terminate unnecessary processes to free memory; node 0 had only 0.01 GiB available at startup.
  • Examine the numactl --hardware output to see how the 4 GB of RAM is split across the 8 reported NUMA nodes and how much of each node is free.
  • Re-run with adjusted settings. The reporter already tried VLLM_NUMA_DISABLED without success, so note the exact error message after each change to see whether the node 0 figures move.
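
The NUMA inspection suggested in the guidance above can be sketched in Python. The snippet reads the per-node counters that Linux exposes under /sys/devices/system/node/node*/meminfo and prints total and free memory for each node, similar to what numactl --hardware summarizes. The helper names (parse_node_meminfo, report_nodes) are illustrative and not part of vLLM; this is a minimal sketch, assuming a Linux kernel that provides the NUMA sysfs entries.

```python
import glob
import re

# Matches lines such as "Node 0 MemTotal:        485000 kB".
# Lines without a kB unit (for example HugePages_Total) are skipped.
_LINE = re.compile(r"Node\s+(\d+)\s+([\w()]+):\s+(\d+)\s*kB")

KIB_PER_GIB = 1024 * 1024


def parse_node_meminfo(text):
    """Return {field: GiB} for the kB-valued fields of one node's meminfo text."""
    fields = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            _node, name, kib = match.groups()
            fields[name] = int(kib) / KIB_PER_GIB
    return fields


def report_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Print MemTotal and MemFree per NUMA node; prints nothing if no files match."""
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            info = parse_node_meminfo(handle.read())
        total = info.get("MemTotal", 0.0)
        free = info.get("MemFree", 0.0)
        print(f"{path}: free {free:.2f} GiB of {total:.2f} GiB")


if __name__ == "__main__":
    report_nodes()
```

If every node reports roughly 0.47 GiB in total, that would be consistent with the 4 GB of RAM being divided across the 8 nodes, and with node 0 alone being too small for the default budget.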

Example

The issue revolves around command-line arguments and system configuration rather than application code.

Notes

The error points to a memory constraint on NUMA node 0, but the attached vllm_serve_output.txt and numactl_output.txt files were not available for this analysis, so a more detailed diagnosis is not possible. The guidance rests only on the error message and the environment details provided.

Recommendation

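Apply workaround: decrease the --gpu-memory-utilization value, since the error message reports that the memory available on node 0 is below the desired CPU memory utilization. With only 0.01 GiB of 0.47 GiB available, the value would have to fall to roughly 0.02 to pass the check, so freeing memory on that node is likely needed as well.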
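
To gauge how far the flag would have to drop, the comparison implied by the error message can be reproduced with the rounded figures it reports. This is a sketch of the arithmetic only, not vLLM's actual implementation; the function names are hypothetical, and the inputs are the rounded values from the log (0.01 GiB available, 0.47 GiB total on node 0).

```python
def desired_gib(total_gib, utilization):
    """Memory budget requested on the node: a fraction of that node's total."""
    return total_gib * utilization


def max_passing_utilization(available_gib, total_gib):
    """Largest utilization for which the budget still fits in available memory."""
    return available_gib / total_gib


# Rounded values copied from the error message (assumption: node 0 only).
total, available = 0.47, 0.01

budget = desired_gib(total, 0.9)
limit = max_passing_utilization(available, total)

print(f"budget at utilization 0.9: about {budget:.2f} GiB")
print(f"largest utilization that would fit: about {limit:.3f}")
```

Because the inputs are rounded, the computed budget is slightly below the 0.43 GiB shown in the log. The limit works out to about 0.02, which would leave vLLM roughly 0.01 GiB to work with on that node.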
