vllm: [Bug]: Intel ARC 140v not supported as XE2 cutlass kernel [18 comments, 2 participants]

GitHub stats
vllm-project/vllm#37828 (fetched 2026-04-08 01:17:52)
Comments: 18
Participants: 2
Timeline: 24
Reactions: 0
Timeline (top): commented ×18, labeled ×2, mentioned ×2, subscribed ×2

Error Message

(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108] EngineCore failed to start.
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108] RuntimeError: Only XE2 cutlass kernel is supported currently.

The exception is raised during the memory-profiling run (profile_run in vllm/v1/worker/gpu_model_runner.py, reached from determine_available_memory). The Gemma 3 multimodal model (gemma3_mm.py) runs its SigLIP vision tower, whose encoder attention takes the forward_xpu path in mm_encoder_attention.py and calls flash_attn_varlen_func in vllm_xpu_kernels/flash_attn_interface.py (line 125). That call dispatches to torch.ops._vllm_fa2_C.varlen_fwd, which rejects the device. The full traceback appears under Code Example.
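Every line of the engine log carries the same "(EngineCore pid=...) ERROR ... [core.py:1108]" prefix, which makes the failing frame hard to spot. The following is a minimal sketch, not part of vLLM, that strips that prefix and returns the exception line together with the innermost frames. The names summarize_traceback, PREFIX and FRAME are hypothetical, and the prefix pattern is assumed from the log lines quoted in this issue only.

```python
import re

# Per-line prefix as it appears in this issue's log, e.g.
# "(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108] "
PREFIX = re.compile(
    r"^\(EngineCore pid=\d+\) ERROR \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] ?"
)
# Standard Python traceback frame line: File "<path>", line <n>, in <func>
FRAME = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
# An unindented line that starts with an exception class name.
EXCEPTION = re.compile(r"^[A-Za-z_][\w.]*(Error|Exception)\b")


def summarize_traceback(log_text, last_n=3):
    """Return (exception_line, innermost frames) from a prefixed engine log."""
    frames = []
    exception = None
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw)
        match = FRAME.match(line)
        if match:
            frames.append((match["path"], int(match["line"]), match["func"]))
        elif EXCEPTION.match(line):
            exception = line.strip()
    return exception, frames[-last_n:]
```

Fed the full traceback from this issue, the last returned frame should be the flash_attn_varlen_func call in vllm_xpu_kernels/flash_attn_interface.py, which is the point where the kernel refuses the device.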


Code Example

Collecting environment information...
uv is set
==============================
        System Info
==============================
OS                           : Ubuntu 24.04.4 LTS (x86_64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version                : Could not collect
CMake version                : version 4.2.3
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+xpu
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Mar 10 2026, 18:17:25) [Clang 21.1.4 ] (64-bit runtime)
Python platform              : Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        42 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               8
On-line CPU(s) list:                  0-7
Vendor ID:                            GenuineIntel
Model name:                           Intel(R) Core(TM) Ultra 7 268V
CPU family:                           6
Model:                                189
Thread(s) per core:                   1
Core(s) per socket:                   8
Socket(s):                            1
Stepping:                             1
BogoMIPS:                             6604.80
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni vnmi umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization:                       VT-x
Hypervisor vendor:                    Microsoft
Virtualization type:                  full
L1d cache:                            384 KiB (8 instances)
L1i cache:                            512 KiB (8 instances)
L2 cache:                             20 MiB (8 instances)
L3 cache:                             12 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-7
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.10.0+xpu
[pip3] torchaudio==2.10.0+xpu
[pip3] torchvision==0.25.0+xpu
[pip3] transformers==4.57.6
[pip3] triton-xpu==3.6.0
[conda] Could not collect
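
The library list above does not include the package that provides the vllm_xpu_kernels module, even though the traceback ends inside it. A minimal sketch for checking which versions are installed, using only the standard library, might look like this. The distribution name "vllm-xpu-kernels" is an assumption inferred from the module name in the traceback, and installed_versions is a hypothetical helper, not part of vLLM.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


# "vllm-xpu-kernels" is a guess at the distribution behind the vllm_xpu_kernels
# module; replace it with whatever name pip reports on your system.
CANDIDATES = ["torch", "triton-xpu", "vllm", "vllm-xpu-kernels"]

if __name__ == "__main__":
    for name, version in installed_versions(CANDIDATES).items():
        print(f"{name}: {version or 'not installed'}")
```

Including the kernel package version in a report like this one may help whoever triages the issue tell which kernel build produced the error.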

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.18.1rc1.dev27+g63f49b8bd.d20260322 (git sha: 63f49b8bd, date: 20260322)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_pteros
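
The report above shows "GPU Topology: Could not collect" and only CUDA-oriented GPU fields, so it never states which XPU device PyTorch actually sees. The sketch below, assuming a PyTorch build that ships the torch.xpu module (such as the 2.10.0+xpu build listed above), lists the visible XPU devices by name. The describe_xpu helper is hypothetical. It only shows what PyTorch reports and does not tell you whether the flash-attention kernel will accept the device.

```python
def describe_xpu():
    """Report what PyTorch's XPU backend can see, without raising if it is absent."""
    info = {"torch": None, "available": False, "devices": []}
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or failed to load its native libraries.
        return info
    info["torch"] = torch.__version__
    xpu = getattr(torch, "xpu", None)  # older builds may lack torch.xpu
    if xpu is None or not xpu.is_available():
        return info
    info["available"] = True
    for index in range(xpu.device_count()):
        info["devices"].append(xpu.get_device_name(index))
    return info


if __name__ == "__main__":
    print(describe_xpu())
```

Running it inside the same WSL2 environment and virtualenv as vLLM keeps the result comparable with the environment report.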

---

(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108] EngineCore failed to start.
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     super().__init__(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/engine/core.py", line 124, in __init__
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/engine/core.py", line 247, in _initialize_kv_caches
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/executor/abstract.py", line 136, in determine_available_memory
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self.collective_rpc("determine_available_memory")
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/executor/uniproc_executor.py", line 80, in collective_rpc
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/serial_utils.py", line 510, in run_method
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return func(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/worker/gpu_worker.py", line 370, in determine_available_memory
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     self.model_runner.profile_run()
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/worker/gpu_model_runner.py", line 5520, in profile_run
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     dummy_encoder_outputs = self.model.embed_multimodal(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/models/gemma3_mm.py", line 587, in embed_multimodal
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._process_image_input(image_input)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/models/gemma3_mm.py", line 574, in _process_image_input
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     image_features = self._image_pixels_to_features(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/models/gemma3_mm.py", line 565, in _image_pixels_to_features
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return vision_tower(pixel_values)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._call_impl(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return forward_call(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/models/siglip.py", line 893, in forward
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self.vision_model(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._call_impl(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return forward_call(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/models/siglip.py", line 773, in forward
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     encoder_outputs = self.encoder(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                       ^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._call_impl(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return forward_call(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/models/siglip.py", line 564, in forward
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     hidden_states, _ = encoder_layer(hidden_states)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._call_impl(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return forward_call(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/models/siglip.py", line 513, in forward
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     hidden_states, _ = self.self_attn(hidden_states=hidden_states)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._call_impl(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return forward_call(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/models/siglip.py", line 428, in forward
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     out = self.attn(query_states, key_states, value_states)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._call_impl(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return forward_call(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/custom_op.py", line 136, in forward
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._forward_method(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/layers/attention/mm_encoder_attention.py", line 440, in forward_xpu
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._forward_fa(query, key, value, cu_seqlens, max_seqlen)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/model_executor/layers/attention/mm_encoder_attention.py", line 308, in _forward_fa
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     output = vit_flash_attn_wrapper(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]              ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/attention/ops/vit_attn_wrappers.py", line 100, in vit_flash_attn_wrapper
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return torch.ops.vllm.flash_attn_maxseqlen_wrapper(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1209, in __call__
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._op(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/v1/attention/ops/vit_attn_wrappers.py", line 51, in flash_attn_maxseqlen_wrapper
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     output = flash_attn_varlen_func(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]              ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/src/vllm/vllm/_xpu_ops.py", line 200, in flash_attn_varlen_func
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return flash_attn_varlen_func(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/vllm_xpu_kernels/flash_attn_interface.py", line 125, in flash_attn_varlen_func
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     out, softmax_lse = torch.ops._vllm_fa2_C.varlen_fwd(
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]   File "/home/pteros/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1209, in __call__
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]     return self._op(*args, **kwargs)
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108] RuntimeError: Only XE2 cutlass kernel is supported currently.
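
Every line of the log carries the same logger prefix, which makes the call path hard to follow. As a reading aid (not something from the issue itself), the sketch below strips that prefix and keeps only the frames outside PyTorch's own module plumbing. The helper names and the skip list are illustrative assumptions.

```python
import re

# Logger prefix seen on every line of the log, e.g.
# "(EngineCore pid=6844) ERROR 03-22 22:03:08 [core.py:1108] "
PREFIX = re.compile(
    r"^\(EngineCore pid=\d+\) ERROR \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] ?"
)
# A standard Python traceback frame line: File "<path>", line <n>, in <func>
FRAME = re.compile(r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
# The final exception line, e.g. "RuntimeError: ..."
ERROR = re.compile(r"^\w+(Error|Exception):")


def strip_prefix(line: str) -> str:
    """Remove the EngineCore logger prefix from one log line."""
    return PREFIX.sub("", line.rstrip("\n"))


def summarize(log_text: str, skip=("site-packages/torch/",)):
    """Return (frames, final_error) for a prefixed traceback.

    frames is a list of (file name, line number, function) tuples; frames
    whose path contains any entry of `skip` are dropped (assumed noise).
    """
    frames, error = [], None
    for raw in log_text.splitlines():
        line = strip_prefix(raw)
        match = FRAME.match(line)
        if match:
            if not any(part in match["path"] for part in skip):
                name = match["path"].rsplit("/", 1)[-1]
                frames.append((name, int(match["line"]), match["func"]))
        elif ERROR.match(line):
            error = line
    return frames, error
```

Applied to the log above, the remaining frames trace the path from profile_run through the SigLIP vision encoder and mm_encoder_attention.py into vllm_xpu_kernels/flash_attn_interface.py, where the call to varlen_fwd raises the error.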

---

sudo apt-get update && sudo apt-get upgrade -y

sudo add-apt-repository -y ppa:kobuk-team/intel-graphics
sudo apt update
sudo apt-get install -y libze-intel-gpu1 libze1 intel-metrics-discovery intel-opencl-icd clinfo intel-gsc intel-media-va-driver-non-free libmfx-gen1 libvpl2 libvpl-tools libva-glx2 va-driver-all vainfo libze-dev intel-ocloc

wget -O- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | gpg --dearmor | sudo tee /usr/share/keyrings/oneapi-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install -y intel-deep-learning-essentials cmake pkg-config build-essential

curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env
uv python install 3.12
uv venv --python 3.12
source .venv/bin/activate

mkdir src && cd src && git clone https://github.com/vllm-project/vllm.git && cd vllm
uv pip install --upgrade pip
uv pip install -v -r requirements/xpu.txt --index-strategy unsafe-best-match
uv pip uninstall triton triton-xpu
uv pip install triton-xpu==3.6.0 --extra-index-url https://download.pytorch.org/whl/xpu
VLLM_TARGET_DEVICE=xpu uv pip install --no-build-isolation -e . -v
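
After the install steps, it can help to confirm what PyTorch itself reports for the GPU before starting vLLM. The sketch below is a diagnostic added for illustration, not part of the original issue: describe_xpu is a hypothetical helper name, and it relies only on torch.xpu.is_available, torch.xpu.device_count, and torch.xpu.get_device_name. If torch is not installed it returns an empty result instead of raising.

```python
def describe_xpu() -> dict:
    """Collect what PyTorch reports about XPU devices, without raising."""
    try:
        import torch
    except ImportError:
        # No PyTorch in this environment: report that rather than fail.
        return {"torch": None, "xpu_available": False, "devices": []}

    # Older or non-XPU builds may lack torch.xpu or report it unavailable.
    xpu = getattr(torch, "xpu", None)
    available = bool(xpu is not None and xpu.is_available())

    devices = []
    if available:
        for index in range(xpu.device_count()):
            devices.append(xpu.get_device_name(index))

    return {
        "torch": torch.__version__,
        "xpu_available": available,
        "devices": devices,
    }


if __name__ == "__main__":
    for key, value in describe_xpu().items():
        print(f"{key}: {value}")
```

If xpu_available is False, the problem lies in the driver or PyTorch setup rather than in the attention kernel, so the XE2 error would not be the first thing to chase.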

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Collecting environment information...
uv is set
==============================
        System Info
==============================
OS                           : Ubuntu 24.04.4 LTS (x86_64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version                : Could not collect
CMake version                : version 4.2.3
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+xpu
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Mar 10 2026, 18:17:25) [Clang 21.1.4 ] (64-bit runtime)
Python platform              : Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        42 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               8
On-line CPU(s) list:                  0-7
Vendor ID:                            GenuineIntel
Model name:                           Intel(R) Core(TM) Ultra 7 268V
CPU family:                           6
Model:                                189
Thread(s) per core:                   1
Core(s) per socket:                   8
Socket(s):                            1
Stepping:                             1
BogoMIPS:                             6604.80
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni vnmi umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization:                       VT-x
Hypervisor vendor:                    Microsoft
Virtualization type:                  full
L1d cache:                            384 KiB (8 instances)
L1i cache:                            512 KiB (8 instances)
L2 cache:                             20 MiB (8 instances)
L3 cache:                             12 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-7
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.10.0+xpu
[pip3] torchaudio==2.10.0+xpu
[pip3] torchvision==0.25.0+xpu
[pip3] transformers==4.57.6
[pip3] triton-xpu==3.6.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.18.1rc1.dev27+g63f49b8bd.d20260322 (git sha: 63f49b8bd, date: 20260322)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_pteros
</details>

🐛 Describe the bug

I followed the official documentation to install and run vLLM on Ubuntu 24.04 under WSL2, using the Intel Arc 140V GPU integrated into the Intel Core Ultra 7 268V. Startup fails with an error because the Arc 140V is not recognized as a device supported by the XE2 CUTLASS kernel.

Before the change "Deprecate ipex and switch to vllm-xpu-kernels for xpu platform" landed in Feb 2026, I was able to install and run vLLM by following the tutorial at https://www.rogerngo.com/blog/accessible-ai-vllm-on-intel-arc. Today, however, I cannot find an older commit on which the same steps still work.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

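To address the "Only XE2 cutlass kernel is supported currently" error on the Intel Arc 140V, try the following steps. They are suggestions derived from the traceback, not a confirmed fix:

  • Update the vLLM repository: Run git pull in the vLLM repository directory to ensure you have the latest code.
  • Check the XE2 CUTLASS kernel package: The error is raised by torch.ops._vllm_fa2_C.varlen_fwd, which is called from the installed vllm_xpu_kernels package (flash_attn_interface.py, line 125). Kernel support for the Arc 140V therefore has to come from that package rather than from vLLM's own Python code.
  • Modify vllm_xpu_kernels: You may need to modify the vllm_xpu_kernels package so that its XE2 CUTLASS kernel accepts the Intel Arc 140V. Because the exception surfaces below the last Python frame, most likely in native code, a Python-only change may not be enough.

Here's an illustrative snippet for vllm_xpu_kernels. The op name varlen_fwd_xe2_cutlass is hypothetical and does not appear in the traceback:

# in vllm_xpu_kernels/flash_attn_interface.py
import torch

def flash_attn_varlen_func(*args, **kwargs):
    ops = torch.ops._vllm_fa2_C
    # Hypothetical: prefer a dedicated XE2 CUTLASS entry point if the extension exposes one.
    if hasattr(ops, "varlen_fwd_xe2_cutlass"):
        return ops.varlen_fwd_xe2_cutlass(*args, **kwargs)
    # Otherwise call the op that the traceback shows.
    return ops.varlen_fwd(*args, **kwargs)
  • Reinstall vLLM: After modifying vllm_xpu_kernels, reinstall vLLM using VLLM_TARGET_DEVICE=xpu uv pip install --no-build-isolation -e . -v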

Verification

To verify the fix, start vLLM with the same model on the Intel Arc 140V and confirm that engine startup no longer fails with "RuntimeError: Only XE2 cutlass kernel is supported currently."
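
One way to make that check repeatable is to capture the startup output to a file and scan it for the two messages seen in the traceback. This is a minimal sketch under stated assumptions: the function name and the returned labels are hypothetical, and only the two quoted strings come from the log in the issue.

```python
# Both strings are copied from the log in the issue.
XE2_ERROR = "Only XE2 cutlass kernel is supported currently."
START_FAILURE = "EngineCore failed to start."


def check_startup_log(log_text: str) -> str:
    """Classify captured vLLM startup output.

    Returns one of three hypothetical labels:
      "xe2-kernel-error"      the XE2 CUTLASS error is still present
      "other-startup-failure" the engine failed for a different reason
      "no-known-failure"      neither message was found
    """
    if XE2_ERROR in log_text:
        return "xe2-kernel-error"
    if START_FAILURE in log_text:
        return "other-startup-failure"
    return "no-known-failure"
```

A result of "no-known-failure" only means these two messages are absent. It does not prove that the vision encoder ran on the flash attention path, so a real multimodal request is still the better final check.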

Extra Tips

  • Make sure to update your Intel graphics drivers to the latest version.
  • If you encounter any issues during the installation process, refer to the vLLM documentation and GitHub issues for troubleshooting guides.
  • Consider reaching out to the vLLM community or Intel support for further assistance if the issue persists.
