vllm: [Bug]: CUBLAS_STATUS_INVALID_VALUE in Docker due to LD_LIBRARY_PATH cuBLAS version conflict

vllm-project/vllm#36663 (fetched 2026-04-08 00:35:31)

Serving models with quantization (e.g., --quantization fp8) in the official vLLM Docker image fails during profiling with CUBLAS_STATUS_INVALID_VALUE. The root cause is an LD_LIBRARY_PATH setting in the Dockerfile that causes a mismatched system cuBLAS to shadow PyTorch's bundled cuBLAS.

Error Message

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling
cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda,
b, CUDA_R_16BF, ldb, &fbeta, c, std::is_same_v<C_Dtype, float> ? CUDA_R_32F
: CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)

The error occurs during profile_run() → _dummy_sampler_run() → compute_logits() → lm_head unquantized GEMM (torch.nn.functional.linear). The fp8-quantized linear layers (smaller dimensions) work fine; only the large lm_head matmul (hidden_size=3584 × vocab_size=152064) triggers the error.

Root Cause

The vLLM v0.17.0 Dockerfile sets:

ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}

This causes the system cuBLAS (from CUDA 12.9.1 installed in the container) to shadow PyTorch's bundled nvidia-cublas-cu12 at runtime. The version mismatch causes certain GEMM parameter combinations to be rejected by cuBLAS — specifically large matmuls like the lm_head projection — while smaller GEMMs happen to use cuBLAS code paths that are unaffected.
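One way to see which cuBLAS copy a process actually mapped is to read its memory map. The following is a minimal sketch, assuming a Linux container where /proc/<pid>/maps is available; the helper name `loaded_libraries` is hypothetical and not part of vLLM or PyTorch. Note that the default fragment also matches libcublasLt.

```python
from pathlib import Path


def loaded_libraries(maps_text: str, name_fragment: str = "libcublas") -> list[str]:
    """Return unique mapped file paths whose basename contains name_fragment.

    maps_text is the content of /proc/<pid>/maps (Linux only).
    """
    paths: list[str] = []
    for line in maps_text.splitlines():
        # Columns: address, perms, offset, dev, inode, pathname.
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if name_fragment in Path(path).name and path not in paths:
            paths.append(path)
    return paths


if __name__ == "__main__":
    maps_file = Path("/proc/self/maps")
    if maps_file.exists():
        # Prints nothing unless this process has already loaded cuBLAS.
        for path in loaded_libraries(maps_file.read_text()):
            print(path)
```

To inspect a running server, the same function can be fed the maps file of that process instead of /proc/self/maps. A path under the system CUDA directory rather than under the Python package directory would be consistent with the shadowing described above.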

Fix Action

Workaround

unset LD_LIBRARY_PATH
vllm serve ByteDance-Seed/BAGEL-7B-MoT --port 8091 --quantization fp8

This allows the dynamic linker to find PyTorch's bundled cuBLAS (matching version) instead of the system one.
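When the server is started from a Python launcher rather than an interactive shell, the same workaround can be applied to the child process only. This is a sketch under that assumption; `env_without` is a hypothetical helper, and the commented command simply mirrors the one from the report.

```python
import os
import subprocess  # used by the commented launch example below
from typing import Mapping, Optional


def env_without(var: str, base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with `var` removed (no error if absent)."""
    env = dict(os.environ if base is None else base)
    env.pop(var, None)
    return env


# Example launch (not executed here), equivalent to `unset LD_LIBRARY_PATH`
# followed by the serve command from the report:
#
# subprocess.run(
#     ["vllm", "serve", "ByteDance-Seed/BAGEL-7B-MoT",
#      "--port", "8091", "--quantization", "fp8"],
#     env=env_without("LD_LIBRARY_PATH"),
#     check=True,
# )
```

Because only the child's environment is changed, other processes in the container keep their original search path.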


Environment

  • Docker image: vllm/vllm-openai:v0.17.0
  • Model: ByteDance-Seed/BAGEL-7B-MoT (also reported with Qwen3.5-122B in #35608)
  • Command: vllm serve ByteDance-Seed/BAGEL-7B-MoT --port 8091 --quantization fp8

Suggested Fix

Ensure the Dockerfile does not set LD_LIBRARY_PATH in a way that shadows PyTorch's bundled CUDA libraries, or sanitize it at container entrypoint time.
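A narrower form of "sanitizing" is to drop only the system CUDA directory from the search path and keep the other entries. The sketch below assumes that /usr/local/cuda/lib64 is the only entry that needs removing, which the report does not confirm (its tested workaround unsets the variable entirely); `strip_dirs` and `ASSUMED_SYSTEM_CUDA_DIRS` are hypothetical names.

```python
import os

# Assumption: this is the directory holding the mismatched system cuBLAS.
ASSUMED_SYSTEM_CUDA_DIRS = ("/usr/local/cuda/lib64",)


def strip_dirs(search_path: str, blocked=ASSUMED_SYSTEM_CUDA_DIRS) -> str:
    """Remove blocked directories from a colon-separated library search path."""
    blocked_norm = {os.path.normpath(d) for d in blocked}
    kept = []
    for entry in search_path.split(":"):
        if not entry:
            continue  # an empty entry means the current directory; drop it
        if os.path.normpath(entry) in blocked_norm:
            continue  # normpath makes a trailing slash match as well
        kept.append(entry)
    return ":".join(kept)
```

An entrypoint could assign the result back to LD_LIBRARY_PATH before starting the server. Whether the remaining entries are safe to keep depends on the image and should be checked rather than assumed.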

Related Issues

  • #35608 — Same error with Qwen3.5-122B, same unset LD_LIBRARY_PATH fix
  • #35028 — Same CUBLAS_STATUS_INVALID_VALUE error
  • pytorch/pytorch#174949 — cuBLAS version mismatch after PyTorch 2.10 upgrade

extent analysis

Fix Plan

To resolve the CUBLAS_STATUS_INVALID_VALUE error caused by a version mismatch between the system cuBLAS and PyTorch's bundled cuBLAS, follow these steps:

  • Modify the Dockerfile: Ensure that the LD_LIBRARY_PATH does not shadow PyTorch's bundled CUDA libraries. This can be achieved by removing or modifying the lines that set LD_LIBRARY_PATH to include system CUDA paths.
  • Sanitize LD_LIBRARY_PATH at container entrypoint time: Alternatively, you can unset LD_LIBRARY_PATH at the beginning of the container's entrypoint script to prevent it from interfering with PyTorch's bundled libraries.

Example Dockerfile modification:

# Remove these lines to prevent shadowing PyTorch's cuBLAS
# ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}

Alternatively, sanitize LD_LIBRARY_PATH in the entrypoint script:

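#!/bin/bash
# Drop the inherited search path so the loader resolves PyTorch's bundled cuBLAS.
unset LD_LIBRARY_PATH
# exec replaces the shell so vllm receives container signals directly.
exec vllm serve ByteDance-Seed/BAGEL-7B-MoT --port 8091 --quantization fp8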

Verification

To verify the fix, rerun the vllm serve command with the --quantization fp8 option and confirm that profiling completes without the CUBLAS_STATUS_INVALID_VALUE error.
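Beyond watching for the error, it can help to check where the loaded cuBLAS came from. The sketch below classifies a library path (for example, one read from /proc/<pid>/maps) and assumes that the nvidia-cublas-cu12 wheel installs under a directory ending in nvidia/cublas/lib/; `cublas_origin` is a hypothetical helper.

```python
def cublas_origin(path: str) -> str:
    """Classify a libcublas path as 'bundled' (pip wheel layout) or 'other'."""
    normalized = path.replace("\\", "/")
    # Assumption: pip-installed cuBLAS lives under .../nvidia/cublas/lib/.
    if "/nvidia/cublas/lib/" in normalized:
        return "bundled"
    return "other"
```

After the workaround, the loaded library would be expected to classify as "bundled"; an "other" result suggests the system copy is still being picked up.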

Extra Tips

  • Ensure that the PyTorch version is compatible with the CUDA version installed in the container.
  • Be cautious when setting LD_LIBRARY_PATH in Dockerfiles, as it can lead to version mismatches and other issues.
  • Consider using a more robust method to manage library paths, such as using ldconfig or LD_PRELOAD, to avoid similar issues in the future.
