vllm - [Bug]: CUBLAS_STATUS_INVALID_VALUE in Docker due to LD_LIBRARY_PATH cuBLAS version conflict

vllm · 2026-03-10 14:57:12

GitHub stats

vllm-project/vllm#36663 • Fetched 2026-04-08 00:35:31

Author

lishunyang12

Serving models with quantization (e.g., --quantization fp8) in the official vLLM Docker image fails during profiling with CUBLAS_STATUS_INVALID_VALUE. The root cause is an LD_LIBRARY_PATH setting in the Dockerfile that causes a mismatched system cuBLAS to shadow PyTorch's bundled cuBLAS.

Error Message

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling
cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16BF, lda,
b, CUDA_R_16BF, ldb, &fbeta, c, std::is_same_v<C_Dtype, float> ? CUDA_R_32F
: CUDA_R_16BF, ldc, compute_type, CUBLAS_GEMM_DEFAULT_TENSOR_OP)

The error occurs during profile_run() → _dummy_sampler_run() → compute_logits() → lm_head unquantized GEMM (torch.nn.functional.linear). The fp8-quantized linear layers (smaller dimensions) work fine; only the large lm_head matmul (hidden_size=3584 × vocab_size=152064) triggers the error.
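For a sense of scale, the lm_head shape quoted above can be turned into element and byte counts. This is plain arithmetic on the two dimensions from the report, assuming 2 bytes per element for bfloat16 (the CUDA_R_16BF type in the error message). It does not by itself show that size is what cuBLAS rejects.

```python
# Dimensions quoted in the report for the lm_head projection.
hidden_size = 3584
vocab_size = 152064

BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit type

# Number of weights in the projection and their footprint in bf16.
weight_elements = hidden_size * vocab_size
weight_bytes = weight_elements * BYTES_PER_BF16

print(f"lm_head weight elements: {weight_elements:,}")
print(f"lm_head weight size in bf16: {weight_bytes / 2**30:.2f} GiB")
```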

Root Cause

The vLLM v0.17.0 Dockerfile sets:

ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}

This causes the system cuBLAS (from CUDA 12.9.1 installed in the container) to shadow PyTorch's bundled nvidia-cublas-cu12 at runtime. The version mismatch causes certain GEMM parameter combinations to be rejected by cuBLAS — specifically large matmuls like the lm_head projection — while smaller GEMMs happen to use cuBLAS code paths that are unaffected.
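One way to see the shadowing before launching anything is to list every libcublas the process could pick up. The sketch below is an illustration, not part of the original report: the function name is hypothetical, and it assumes the nvidia-cublas-cu12 wheel installs its library under site-packages/nvidia/cublas/lib. Directories from LD_LIBRARY_PATH are listed first because the dynamic linker searches them before a library's own DT_RUNPATH entries.

```python
import glob
import os
import sysconfig


def find_cublas_candidates(ld_library_path, site_packages):
    """Return (source, path) pairs for every libcublas.so* found on disk.

    LD_LIBRARY_PATH directories come first, in order, mirroring the
    precedence the dynamic linker gives them over DT_RUNPATH.
    """
    found = []
    for directory in ld_library_path.split(":"):
        if not directory:
            continue  # empty entries are skipped here for simplicity
        pattern = os.path.join(directory, "libcublas.so*")
        for path in sorted(glob.glob(pattern)):
            found.append(("LD_LIBRARY_PATH", path))

    # Assumed wheel layout: <site-packages>/nvidia/cublas/lib/
    bundled_dir = os.path.join(site_packages, "nvidia", "cublas", "lib")
    for path in sorted(glob.glob(os.path.join(bundled_dir, "libcublas.so*"))):
        found.append(("bundled wheel", path))
    return found


if __name__ == "__main__":
    candidates = find_cublas_candidates(
        os.environ.get("LD_LIBRARY_PATH", ""),
        sysconfig.get_paths()["purelib"],
    )
    for source, path in candidates:
        # realpath resolves symlinks so the full version suffix is visible.
        print(f"{source}: {os.path.realpath(path)}")
```

If an LD_LIBRARY_PATH entry appears with a different version suffix than the bundled wheel entry, that is consistent with the mismatch described above.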

Fix Action

Workaround

unset LD_LIBRARY_PATH
vllm serve ByteDance-Seed/BAGEL-7B-MoT --port 8091 --quantization fp8

This allows the dynamic linker to find PyTorch's bundled cuBLAS (matching version) instead of the system one.
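Unsetting the variable removes every entry, including ones that may be needed for other libraries. A narrower variant, sketched here under assumptions, drops only the directories that actually contain a cuBLAS library and keeps the rest. The function names are hypothetical, and the environment only takes effect for a process started after it is built, since the linker's search path is fixed at process start.

```python
import glob
import os


def sanitize_ld_library_path(value, lib_pattern="libcublas.so*"):
    """Drop LD_LIBRARY_PATH entries whose directory contains lib_pattern."""
    kept = []
    for directory in value.split(":"):
        if not directory:
            continue  # ignore empty entries
        if glob.glob(os.path.join(directory, lib_pattern)):
            continue  # this directory would shadow the bundled library
        kept.append(directory)
    return ":".join(kept)


def build_env(environ):
    """Return a copy of environ with a sanitized LD_LIBRARY_PATH."""
    env = dict(environ)
    cleaned = sanitize_ld_library_path(env.get("LD_LIBRARY_PATH", ""))
    if cleaned:
        env["LD_LIBRARY_PATH"] = cleaned
    else:
        # Nothing left to keep: equivalent to `unset LD_LIBRARY_PATH`.
        env.pop("LD_LIBRARY_PATH", None)
    return env


def launch(argv):
    """Replace the current process with argv, using the sanitized environment."""
    os.execvpe(argv[0], argv, build_env(os.environ))
```

A wrapper could call launch(["vllm", "serve", "ByteDance-Seed/BAGEL-7B-MoT", "--port", "8091", "--quantization", "fp8"]) to start the server from the report with the filtered path.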

Environment

Docker image: vllm/vllm-openai:v0.17.0
Model: ByteDance-Seed/BAGEL-7B-MoT (also reported with Qwen3.5-122B in #35608)
Command: vllm serve ByteDance-Seed/BAGEL-7B-MoT --port 8091 --quantization fp8

Suggested Fix

Ensure the Dockerfile does not set LD_LIBRARY_PATH in a way that shadows PyTorch's bundled CUDA libraries, or sanitize it at container entrypoint time.

Related Issues

#35608 — Same error with Qwen3.5-122B, same unset LD_LIBRARY_PATH fix
#35028 — Same CUBLAS_STATUS_INVALID_VALUE error
pytorch/pytorch#174949 — cuBLAS version mismatch after PyTorch 2.10 upgrade

extent analysis

Fix Plan

To resolve the CUBLAS_STATUS_INVALID_VALUE error caused by a version mismatch between the system cuBLAS and PyTorch's bundled cuBLAS, use one of the following approaches:

Modify the Dockerfile: Ensure that LD_LIBRARY_PATH does not shadow PyTorch's bundled CUDA libraries, by removing or changing the ENV lines that add system CUDA paths to it.
Sanitize LD_LIBRARY_PATH at container entrypoint time: Unset LD_LIBRARY_PATH at the beginning of the container's entrypoint script so that it cannot interfere with PyTorch's bundled libraries.

Example Dockerfile modification:

# Remove these lines to prevent shadowing PyTorch's cuBLAS
# ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# ENV LD_LIBRARY_PATH=/usr/local/nvidia/lib64:${LD_LIBRARY_PATH}

Alternatively, sanitize LD_LIBRARY_PATH in the entrypoint script:

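#!/bin/bash
# Clear the inherited search path so PyTorch's bundled cuBLAS is resolved first.
unset LD_LIBRARY_PATH
# exec replaces the shell so signals sent to the container reach vllm directly.
exec vllm serve ByteDance-Seed/BAGEL-7B-MoT --port 8091 --quantization fp8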

Verification

To verify the fix, rerun the vllm serve command with the --quantization fp8 option and confirm that startup gets past the profiling run without raising CUBLAS_STATUS_INVALID_VALUE.
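Beyond watching for the error, the cuBLAS actually mapped into a running process can be read from /proc on Linux. The helper below is a sketch with hypothetical function names. It parses the text of /proc/<pid>/maps and returns the distinct libcublas paths (libcublasLt is matched too, since both ship together).

```python
import os


def mapped_cublas_paths(maps_text):
    """Extract distinct libcublas* paths from /proc/<pid>/maps content."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if os.path.basename(path).startswith("libcublas") and path not in paths:
            paths.append(path)
    return paths


def read_maps(pid="self"):
    """Read the memory map listing of a process (Linux only)."""
    with open(f"/proc/{pid}/maps") as handle:
        return handle.read()
```

Calling mapped_cublas_paths(read_maps(pid)) with the PID of a vLLM worker should, after the fix, return paths under the Python site-packages directory instead of /usr/local/cuda/lib64.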

Extra Tips

Ensure that the PyTorch version is compatible with the CUDA version installed in the container.
Be cautious when setting LD_LIBRARY_PATH in Dockerfiles, as it can lead to version mismatches and other issues.
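Consider registering system library directories with ldconfig instead of exporting them through LD_LIBRARY_PATH. The dynamic linker consults the ld.so cache only after LD_LIBRARY_PATH and any DT_RUNPATH entries, so cached paths are less likely to shadow libraries bundled with a wheel. LD_PRELOAD can pin one specific library, but it applies to every process that inherits it and should be used sparingly.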
