vllm: [Bug] FlashInfer JIT compilation fails with "No such file or directory" in v0.20.1/v0.20.2 (Docker)

Title: [Bug] Official Docker Image v0.20.1/v0.20.2 fails to start: FlashInfer headers missing in JIT compilation

Description:

The official Docker image vllm/vllm-openai:v0.20.2 (and likely v0.20.1) fails to start because FlashInfer JIT compilation fails: the compiler cannot find required header files such as flashinfer/page.cuh and flashinfer/attention/prefill.cuh.

This is a regression from vllm/vllm-openai:v0.20.0, which starts and runs successfully with the same configuration. Because this is the official pre-built Docker image, the failure suggests a packaging or build issue in the newer images: the FlashInfer CUDA headers (.cuh) are either missing or not correctly exposed to the compiler.

Environment:

  • vLLM version: 0.20.1 / 0.20.2
  • Hardware: NVIDIA GeForce RTX 2080 Ti (Compute Capability 7.5)
  • CUDA Version: 13.0 (Driver 580.95.05)
  • Docker Image: vllm/vllm-openai:v0.20.2
  • Note: The same configuration works fine with vllm/vllm-openai:v0.20.0.

Reproduction Steps:

  1. Pull the official image:

    docker pull vllm/vllm-openai:v0.20.2
  2. Run the container with a model that triggers FlashInfer (e.g., Qwen, Llama) and specific KV cache settings:

    version: "3"
    services:
        vllm-openai:
            runtime: nvidia
            image: vllm/vllm-openai:v0.20.2
            command: >
                --model=/models/Qwen3.6-27B-AWQ
                --dtype=float16
                --tensor-parallel-size=2
                --kv-cache-dtype=fp8_e5m2
                --enable-chunked-prefill
                --enable-prefix-caching
                --max-model-len=262144
                --gpu-memory-utilization=0.92
            volumes:
                - /path/to/models:/models
  3. Observe the logs. The container crashes during startup.

Error Log:

vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/page.cuh: No such file or directory
vllm-openai_1  | (EngineCore pid=41)     2 | #include <flashinfer/page.cuh>
...
vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/attention/mask.cuh: No such file or directory
vllm-openai_1  | (EngineCore pid=41)    16 | #include <flashinfer/attention/mask.cuh>
...
vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/attention/prefill.cuh: No such file or directory
vllm-openai_1  | (EngineCore pid=41)     1 | #include <flashinfer/attention/prefill.cuh>
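
When the container log is long, it can help to pull out just the header names the compiler could not find. The following is a minimal sketch, not part of the original report: the function name `missing_headers` and the `sample_log` string are illustrative, and the pattern assumes the diagnostics look exactly like the lines quoted above.

```python
import re

# Matches compiler diagnostics of the form seen in the log above:
#   fatal error: <header>: No such file or directory
MISSING_HEADER = re.compile(r"fatal error: (\S+): No such file or directory")


def missing_headers(log_text: str) -> list[str]:
    """Return the unique missing header paths, in first-seen order."""
    seen: dict[str, None] = {}
    for match in MISSING_HEADER.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


# Illustrative excerpt copied from the error log in this report.
sample_log = """\
vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/page.cuh: No such file or directory
vllm-openai_1  | (EngineCore pid=41)     2 | #include <flashinfer/page.cuh>
vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/attention/mask.cuh: No such file or directory
vllm-openai_1  | (EngineCore pid=41) fatal error: flashinfer/attention/prefill.cuh: No such file or directory
"""

for header in missing_headers(sample_log):
    print(header)
```

The full container log can be passed to `missing_headers` in place of `sample_log`.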

Analysis:

  • The compilation command in the log shows include paths like: -isystem /usr/local/lib/python3.12/dist-packages/flashinfer/data/include
  • However, inside the container, the directory /usr/local/lib/python3.12/dist-packages/flashinfer/data/include/flashinfer/ appears to be missing or does not contain the required .cuh files.
  • The same configuration works in v0.20.0, which points to a change between these versions in either the Dockerfile or the flashinfer-python wheel installed in the image.
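
The observation about the include directory can be checked with a short script. This is a minimal sketch under stated assumptions: the include root is copied from the -isystem flag quoted above, the header list comes from the error log, and the helper name `find_missing` is hypothetical.

```python
from pathlib import Path

# Include root taken from the -isystem flag quoted in the analysis above.
INCLUDE_ROOT = Path(
    "/usr/local/lib/python3.12/dist-packages/flashinfer/data/include"
)

# Headers named in the error log.
REQUIRED_HEADERS = [
    "flashinfer/page.cuh",
    "flashinfer/attention/mask.cuh",
    "flashinfer/attention/prefill.cuh",
]


def find_missing(include_root: Path, headers: list[str]) -> list[str]:
    """Return the headers that are not regular files under include_root."""
    return [h for h in headers if not (include_root / h).is_file()]


if __name__ == "__main__":
    missing = find_missing(INCLUDE_ROOT, REQUIRED_HEADERS)
    if missing:
        print(f"Missing under {INCLUDE_ROOT}:")
        for header in missing:
            print(f"  {header}")
    else:
        print("All required headers are present.")
```

Running it inside both the v0.20.0 and v0.20.2 images (for example, by overriding the container's entrypoint with python3) would show whether the headers were dropped between releases or only moved to a different path.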

Expected Behavior:

The official Docker image v0.20.2 should start successfully, just like v0.20.0, without manual intervention or header file fixes.
