vllm - [Bug]: FlashInfer JIT compilation fails with "No such file or directory" in v0.20.1/v0.20.2 (docker)

Title: [Bug] Official Docker Image v0.20.1/v0.20.2 fails to start: FlashInfer headers missing in JIT compilation

Description:

The official Docker image vllm/vllm-openai:v0.20.2 (and likely v0.20.1) fails to start due to FlashInfer JIT compilation errors. The engine cannot find essential header files like flashinfer/page.cuh and flashinfer/attention/prefill.cuh.

This is a regression compared to vllm/vllm-openai:v0.20.0, which starts and runs successfully with the same configuration. Since I am using the official pre-built Docker image, this suggests a packaging or build issue in the newer images where FlashInfer C++ headers are either missing or not correctly exposed to the compiler.

Environment:

vLLM Version: 0.20.1 / 0.20.2
Hardware: NVIDIA GeForce RTX 2080 Ti (Compute Capability 7.5)
CUDA Version: 13.0 (Driver 580.95.05)
Docker Image: vllm/vllm-openai:v0.20.2
Note: The same configuration works fine with vllm/vllm-openai:v0.20.0.

Reproduction Steps:

Pull the official image:
```
docker pull vllm/vllm-openai:v0.20.2
```

Run the container with a model that triggers FlashInfer (e.g., Qwen, Llama) and an FP8 KV cache, using a docker-compose file such as:

version: "3"
services:
    vllm-openai:
        runtime: nvidia
        image: vllm/vllm-openai:v0.20.2
        command: >
            --model=/models/Qwen3.6-27B-AWQ
            --dtype=float16
            --tensor-parallel-size=2
            --kv-cache-dtype=fp8_e5m2
            --enable-chunked-prefill
            --enable-prefix-caching
            --max-model-len=262144
            --gpu-memory-utilization=0.92
        volumes:
            - /path/to/models:/models

Observe the logs. The container crashes during startup.

Error Log:

vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/page.cuh: No such file or directory
vllm-openai_1  | (EngineCore pid=41)     2 | #include <flashinfer/page.cuh>
...
vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/attention/mask.cuh: No such file or directory
vllm-openai_1  | (EngineCore pid=41)    16 | #include <flashinfer/attention/mask.cuh>
...
vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/attention/prefill.cuh: No such file or directory
vllm-openai_1  | (EngineCore pid=41)     1 | #include <flashinfer/attention/prefill.cuh>
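
To list every header the compiler could not find, the log can be filtered mechanically. The following is a minimal sketch, assuming the errors always follow the "fatal error: <path>: No such file or directory" shape shown above; the function name `missing_headers` is hypothetical and not part of vLLM or FlashInfer.

```python
import re

# Matches compiler lines such as:
#   "... fatal error: flashinfer/page.cuh: No such file or directory"
# Assumption: header paths contain no whitespace.
MISSING_HEADER_RE = re.compile(
    r"fatal error: (?P<header>\S+): No such file or directory"
)


def missing_headers(log_text: str) -> list[str]:
    """Return the unique header paths reported missing, in first-seen order."""
    seen: dict[str, None] = {}
    for match in MISSING_HEADER_RE.finditer(log_text):
        seen.setdefault(match.group("header"), None)
    return list(seen)


if __name__ == "__main__":
    import sys

    # Read container logs from stdin and print one missing header per line.
    for header in missing_headers(sys.stdin.read()):
        print(header)
```

Piping the container logs into the script prints one missing header path per line, which makes it easier to see whether only a few files or the whole header tree is absent.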

Analysis:

The compilation command in the log passes include paths such as: -isystem /usr/local/lib/python3.12/dist-packages/flashinfer/data/include
Inside the container, however, the directory /usr/local/lib/python3.12/dist-packages/flashinfer/data/include/flashinfer/ appears to be missing or to lack the required .cuh files.
The same configuration works in v0.20.0, which points to a change between these versions in either the Dockerfile or the flashinfer-python wheel installed in the image.
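
The "missing or empty" observation can be confirmed with a small script run inside the container. This is a sketch only: the include root is copied from the -isystem flag in the log, the header list is limited to the three files named in the errors, and `find_missing` is a hypothetical helper, not part of vLLM or FlashInfer.

```python
from pathlib import Path

# Include root taken from the -isystem flag in the log (assumed unchanged
# inside the container being inspected).
INCLUDE_ROOT = Path("/usr/local/lib/python3.12/dist-packages/flashinfer/data/include")

# Headers named in the three fatal errors; the real set may be larger.
EXPECTED_HEADERS = (
    "flashinfer/page.cuh",
    "flashinfer/attention/mask.cuh",
    "flashinfer/attention/prefill.cuh",
)


def find_missing(include_root: Path, headers=EXPECTED_HEADERS) -> list[str]:
    """Return the headers that are not regular files under include_root."""
    return [h for h in headers if not (include_root / h).is_file()]


if __name__ == "__main__":
    missing = find_missing(INCLUDE_ROOT)
    if missing:
        print(f"Missing under {INCLUDE_ROOT}:")
        for header in missing:
            print(f"  {header}")
    else:
        print(f"All expected headers are present under {INCLUDE_ROOT}.")
```

Running the same script in the v0.20.0 and v0.20.2 images would show whether the headers disappeared between the two releases.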

Expected Behavior:

The official Docker image v0.20.2 should start successfully, just like v0.20.0, without manual intervention or header file fixes.
