vllm - [Bug]: installing vllm 0.21 with cu129 torch backend tries to open libcudart.so.13 which is specific to cu13


Error Message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/__init__.py", line 70, in __getattr__
    module = import_module(module_name, __package__)
  File "/home/data/medical-nlp/atk7/.cache/.uv-python/cpython-3.14.3-linux-x86_64-gnu/lib/python3.14/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/entrypoints/llm.py", line 21, in <module>
    from vllm.config import (
    ...<6 lines>...
    )
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/config/__init__.py", line 6, in <module>
    from vllm.config.compilation import (
    ...<4 lines>...
    )
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/config/compilation.py", line 22, in <module>
    from vllm.platforms import current_platform
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/platforms/__init__.py", line 278, in __getattr__
    _current_platform = resolve_obj_by_qualname(platform_cls_qualname)()
                        ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/utils/import_utils.py", line 109, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)
  File "/home/data/medical-nlp/atk7/.cache/.uv-python/cpython-3.14.3-linux-x86_64-gnu/lib/python3.14/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/platforms/cuda.py", line 21, in <module>
    import vllm._C  # noqa
    ^^^^^^^^^^^^^^
ImportError: libcudart.so.13: cannot open shared object file: No such file or directory

Root Cause

On Python 3.14, dependency resolution first fails with this error:

  × No solution found when resolving dependencies:
  ╰─▶ Because all versions of cuda-tile have no wheels with a matching Python ABI tag (e.g., `cp314`) and flashinfer-python==0.6.8.post1 depends on cuda-tile, we can conclude that
      flashinfer-python==0.6.8.post1 cannot be used.
      And because vllm==0.21.0 depends on flashinfer-python==0.6.8.post1 and you require vllm==0.21, we can conclude that your requirements are unsatisfiable.


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Collecting environment information...
uv is set
==============================
        System Info
==============================
OS                           : Red Hat Enterprise Linux release 8.6 (Ootpa) (x86_64)
GCC version                  : (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
Clang version                : Could not collect
CMake version                : version 3.20.2
Libc version                 : glibc-2.28

==============================
       PyTorch Info
==============================
PyTorch version              : 2.11.0+cu129
Is debug build               : False
CUDA used to build PyTorch   : 12.9
ROCM used to build PyTorch   : N/A
XPU used to build PyTorch    : N/A

==============================
      Python Environment
==============================
Python version               : 3.14.3 (main, Mar 24 2026, 22:50:36) [Clang 22.1.1 ] (64-bit runtime)
Python platform              : Linux-4.18.0-553.123.1.el8_10.x86_64-x86_64-with-glibc2.28

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   :
GPU models and configuration :
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000

Nvidia driver version        : 575.57.08
cuDNN version                : Probably one of the following:
/usr/lib64/libcudnn.so.8.9.2
/usr/lib64/libcudnn.so.9.8.0
/usr/lib64/libcudnn_adv.so.9.8.0
/usr/lib64/libcudnn_adv_infer.so.8.9.2
/usr/lib64/libcudnn_adv_train.so.8.9.2
/usr/lib64/libcudnn_cnn.so.9.8.0
/usr/lib64/libcudnn_cnn_infer.so.8.9.2
/usr/lib64/libcudnn_cnn_train.so.8.9.2
/usr/lib64/libcudnn_engines_precompiled.so.9.8.0
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.8.0
/usr/lib64/libcudnn_graph.so.9.8.0
/usr/lib64/libcudnn_heuristic.so.9.8.0
/usr/lib64/libcudnn_ops.so.9.8.0
/usr/lib64/libcudnn_ops_infer.so.8.9.2
/usr/lib64/libcudnn_ops_train.so.8.9.2
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz
Stepping:            7
CPU MHz:             3040.743
CPU max MHz:         4000.0000
CPU min MHz:         1000.0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-23,48-71
NUMA node1 CPU(s):   24-47,72-95
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.8.post1
[pip3] numpy==2.4.6
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.17.1.4
[pip3] nvidia-cudnn-frontend==1.18.0
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-cufile-cu12==1.14.1.1
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.4.2
[pip3] nvidia-cutlass-dsl-libs-base==4.4.2
[pip3] nvidia-ml-py==13.595.45
[pip3] nvidia-nccl-cu12==2.28.9
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvshmem-cu12==3.4.5
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] pyzmq==27.1.0
[pip3] tokenspeed-triton==3.7.10.post20260505
[pip3] torch==2.11.0+cu129
[pip3] torch-c-dlpack-ext==0.1.5
[pip3] torchaudio==2.11.0+cu129
[pip3] torchvision==0.26.0+cu129
[pip3] transformers==5.9.0
[pip3] triton==3.6.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.21.0
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; XPU: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     0-23,48-71      0               N/A
GPU1    NV4      X      0-23,48-71      0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_atk7
</details>

🐛 Describe the bug

When I try to install vLLM with the cu129 torch backend on Python 3.14, using this command:

uv pip install vllm==0.21 --torch-backend=cu129

I get this error:

  × No solution found when resolving dependencies:
  ╰─▶ Because all versions of cuda-tile have no wheels with a matching Python ABI tag (e.g., `cp314`) and flashinfer-python==0.6.8.post1 depends on cuda-tile, we can conclude that
      flashinfer-python==0.6.8.post1 cannot be used.
      And because vllm==0.21.0 depends on flashinfer-python==0.6.8.post1 and you require vllm==0.21, we can conclude that your requirements are unsatisfiable.

      hint: Pre-releases are available for `cuda-tile` in the requested range (e.g., 1.0.0rc6), but pre-releases weren't enabled (try: `--prerelease=allow`)

      hint: You require CPython 3.14 (`cp314`), but we only found wheels for `cuda-tile` (v1.3.0) with the following Python ABI tags: `cp310`, `cp311`, `cp312`, `cp313`
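The second hint boils down to a tag comparison: the interpreter wants `cp314`, and the published `cuda-tile` wheels stop at `cp313`. A minimal sketch of that check follows. The function names are hypothetical, and the logic is deliberately simplified: a real resolver such as uv also considers platform tags and stable-ABI wheels, which this ignores.

```python
import sys


def interpreter_abi_tag(version_info=sys.version_info):
    """Return the CPython tag for a version, e.g. (3, 14) -> 'cp314'."""
    return f"cp{version_info[0]}{version_info[1]}"


def has_matching_wheel(available_tags, version_info=sys.version_info):
    """True if one of the published tags equals the interpreter's tag.

    Simplification: only exact 'cpXY' matches are considered.
    """
    return interpreter_abi_tag(version_info) in set(available_tags)


# Tags that uv reported for cuda-tile v1.3.0 in the hint above.
CUDA_TILE_TAGS = ["cp310", "cp311", "cp312", "cp313"]

if __name__ == "__main__":
    tag = interpreter_abi_tag()
    status = "ok" if has_matching_wheel(CUDA_TILE_TAGS) else "no matching wheel"
    print(f"{tag}: {status}")
```

Run under Python 3.14, this reports no matching wheel, which is consistent with the resolver error; under Python 3.13 the tag is in the list.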

When I allow pre-releases with this command (the resulting environment is the one copied above):

uv pip install vllm==0.21 --torch-backend=cu129 --prerelease=allow

The installation succeeds, but when I test the import with this snippet:

uv run python - <<'PY'
from vllm import LLM, SamplingParams
print("vLLM OK")
PY

I get the following error: vLLM tries to load libcudart.so.13, which belongs to CUDA 13, rather than the CUDA 12.9 runtime that I installed:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/__init__.py", line 70, in __getattr__
    module = import_module(module_name, __package__)
  File "/home/data/medical-nlp/atk7/.cache/.uv-python/cpython-3.14.3-linux-x86_64-gnu/lib/python3.14/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/entrypoints/llm.py", line 21, in <module>
    from vllm.config import (
    ...<6 lines>...
    )
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/config/__init__.py", line 6, in <module>
    from vllm.config.compilation import (
    ...<4 lines>...
    )
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/config/compilation.py", line 22, in <module>
    from vllm.platforms import current_platform
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/platforms/__init__.py", line 278, in __getattr__
    _current_platform = resolve_obj_by_qualname(platform_cls_qualname)()
                        ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/utils/import_utils.py", line 109, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)
  File "/home/data/medical-nlp/atk7/.cache/.uv-python/cpython-3.14.3-linux-x86_64-gnu/lib/python3.14/importlib/__init__.py", line 88, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/data/medical-nlp/atk7/test/.venv/lib/python3.14/site-packages/vllm/platforms/cuda.py", line 21, in <module>
    import vllm._C  # noqa
    ^^^^^^^^^^^^^^
ImportError: libcudart.so.13: cannot open shared object file: No such file or directory
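The mismatch can be made explicit by comparing the CUDA major version named in the loader error with the CUDA major of the installed runtime wheel. The sketch below is one way to do that, not something taken from vLLM. The helper names are hypothetical, and it assumes the `-cuN` suffix naming seen in the library list (for example `nvidia-cuda-runtime-cu12`); packages that follow a different naming scheme would not be detected.

```python
import re
from importlib import metadata


def required_cudart_major(error_message):
    """Extract N from 'libcudart.so.N' in a loader error message, or None."""
    match = re.search(r"libcudart\.so\.(\d+)", error_message)
    return int(match.group(1)) if match else None


def cuda_major_from_dist(dist_name):
    """Extract N from a distribution name ending in '-cuN', or None."""
    match = re.search(r"-cu(\d+)$", dist_name.lower())
    return int(match.group(1)) if match else None


def installed_cuda_runtime_majors():
    """CUDA majors of installed 'nvidia-cuda-runtime-cuN' distributions.

    Assumption: the runtime wheel uses the '-cuN' suffix naming.
    """
    majors = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith("nvidia-cuda-runtime"):
            major = cuda_major_from_dist(name)
            if major is not None:
                majors.add(major)
    return majors


if __name__ == "__main__":
    error = "ImportError: libcudart.so.13: cannot open shared object file"
    print("needed by vllm._C:", required_cudart_major(error))
    print("installed runtime majors:", sorted(installed_cuda_runtime_majors()))
```

If the needed major is absent from the installed set, the compiled extension and the installed CUDA runtime wheels were built for different CUDA major versions.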

Note that this bug is also present on Python 3.13 (where I don't need to use --prerelease=allow to install vllm 0.21 with the cu129 torch backend).

<details> <summary>The output of <code>python collect_env.py</code> with python 3.13</summary>
Collecting environment information...
uv is set
==============================
        System Info
==============================
OS                           : Red Hat Enterprise Linux release 8.6 (Ootpa) (x86_64)
GCC version                  : (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10)
Clang version                : Could not collect
CMake version                : version 3.20.2
Libc version                 : glibc-2.28

==============================
       PyTorch Info
==============================
PyTorch version              : 2.11.0+cu129
Is debug build               : False
CUDA used to build PyTorch   : 12.9
ROCM used to build PyTorch   : N/A
XPU used to build PyTorch    : N/A

==============================
      Python Environment
==============================
Python version               : 3.13.12 (main, Mar 24 2026, 22:49:35) [Clang 22.1.1 ] (64-bit runtime)
Python platform              : Linux-4.18.0-553.123.1.el8_10.x86_64-x86_64-with-glibc2.28

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : Could not collect
CUDA_MODULE_LOADING set to   :
GPU models and configuration :
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000

Nvidia driver version        : 575.57.08
cuDNN version                : Probably one of the following:
/usr/lib64/libcudnn.so.8.9.2
/usr/lib64/libcudnn.so.9.8.0
/usr/lib64/libcudnn_adv.so.9.8.0
/usr/lib64/libcudnn_adv_infer.so.8.9.2
/usr/lib64/libcudnn_adv_train.so.8.9.2
/usr/lib64/libcudnn_cnn.so.9.8.0
/usr/lib64/libcudnn_cnn_infer.so.8.9.2
/usr/lib64/libcudnn_cnn_train.so.8.9.2
/usr/lib64/libcudnn_engines_precompiled.so.9.8.0
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.8.0
/usr/lib64/libcudnn_graph.so.9.8.0
/usr/lib64/libcudnn_heuristic.so.9.8.0
/usr/lib64/libcudnn_ops.so.9.8.0
/usr/lib64/libcudnn_ops_infer.so.8.9.2
/usr/lib64/libcudnn_ops_train.so.8.9.2
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz
Stepping:            7
CPU MHz:             2200.000
CPU max MHz:         4000.0000
CPU min MHz:         1000.0000
BogoMIPS:            4400.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            36608K
NUMA node0 CPU(s):   0-23,48-71
NUMA node1 CPU(s):   24-47,72-95
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.8.post1
[pip3] numpy==2.4.6
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.17.1.4
[pip3] nvidia-cudnn-frontend==1.18.0
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-cufile-cu12==1.14.1.1
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.4.2
[pip3] nvidia-cutlass-dsl-libs-base==4.4.2
[pip3] nvidia-ml-py==13.595.45
[pip3] nvidia-nccl-cu12==2.28.9
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvshmem-cu12==3.4.5
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] pyzmq==27.1.0
[pip3] tokenspeed-triton==3.7.10.post20260505
[pip3] torch==2.11.0+cu129
[pip3] torch-c-dlpack-ext==0.1.5
[pip3] torchaudio==2.11.0+cu129
[pip3] torchvision==0.26.0+cu129
[pip3] transformers==5.9.0
[pip3] triton==3.6.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.21.0
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; XPU: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV4     0-23,48-71      0               N/A
GPU1    NV4      X      0-23,48-71      0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_atk7
</details>
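Both environment dumps list only `-cu12` NVIDIA packages. To confirm which CUDA runtime libraries are actually on disk, the virtual environment can be scanned for `libcudart.so.*` files. This is a small diagnostic sketch: `find_cudart_libs` is a hypothetical helper, and it assumes the libraries live somewhere under the interpreter's `platlib` directory.

```python
import re
import sysconfig
from pathlib import Path


def find_cudart_libs(root):
    """Return sorted file names under root that look like libcudart.so[.N...]."""
    pattern = re.compile(r"^libcudart\.so(\.\d+)*$")
    names = {
        path.name
        for path in Path(root).rglob("libcudart.so*")
        if pattern.match(path.name)
    }
    return sorted(names)


if __name__ == "__main__":
    # Assumption: binary wheels are installed under the platlib directory.
    site_packages = sysconfig.get_paths()["platlib"]
    found = find_cudart_libs(site_packages)
    print(found if found else "no libcudart found")
```

If the output contains `libcudart.so.12` but no `libcudart.so.13`, that matches the ImportError reported above.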

