vllm - [Bug]: AttributeError: '_C' object has no attribute 'awq_dequantize' on Intel Arc B580 XPU (AWQ inference fails)

GitHub stats
vllm-project/vllm#41469 · Fetched 2026-05-02 05:27:59
Comments: 0 · Participants: 1 · Timeline events: 1 (labeled ×1) · Reactions: 0

Error Message

ERROR [core.py:1136]   File "/home/_/vllm/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1379, in __getattr__
ERROR [core.py:1136]     raise AttributeError
ERROR [core.py:1136] AttributeError: '_OpNamespace' '_C' object has no attribute 'awq_dequantize'


Code Example

Collecting environment information...
==============================
        System Info
==============================
OS                           : Gentoo Linux (x86_64)
GCC version                  : (Gentoo 15.2.1_p20260214 p5) 15.2.1 20260214
Clang version                : 21.1.8
CMake version                : version 4.3.2
Libc version                 : glibc-2.42

==============================
       PyTorch Info
==============================
PyTorch version              : 2.11.0+xpu
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A
XPU used to build PyTorch    : 20250302

==============================
      Python Environment
==============================
Python version               : 3.12.13 (main, Mar 20 2026, 00:33:26) [Clang 22.1.1 ] (64-bit runtime)
Python platform              : Linux-6.18.25-gentoo-dist-x86_64-Intel-R-_Core-TM-_i5-14490F-with-glibc2.42
    
==============================
      Intel XPU / GPU Info
==============================
Is XPU available             : True
XPU runtime version          : 20250302
Intel GPU models             : GPU 0: Intel(R) Arc(TM) B580 Graphics

--Compile time--
oneAPI compiler version      : Could not collect
SYCL compiler build          : Could not collect
oneCCL version               : Could not collect

--Runtime--
Intel Graphics Compiler (IGC): Could not collect
Intel GMM (libigdgmm)        : Could not collect
Level Zero loader version    : Could not collect
Level Zero driver version    : Could not collect
vLLM XPU kernels version     : 0.1.7

==============================
          CPU Info
==============================
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           39 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               GenuineIntel
Model name:                              Intel(R) Core(TM) i5-14490F
CPU family:                              6
Model:                                   191
Thread(s) per core:                      2
Core(s) per socket:                      10
Socket(s):                               1
Stepping:                                2
CPU(s) scaling MHz:                      42%
CPU max MHz:                             4900,0000
CPU min MHz:                             800,0000
BogoMIPS:                                5608,00
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr ibt flush_l1d arch_capabilities
Virtualization:                          VT-x
L1d cache:                               416 KiB (10 instances)
L1i cache:                               448 KiB (10 instances)
L2 cache:                                9,5 MiB (7 instances)
L3 cache:                                24 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Old microcode:             Not affected
Vulnerability Reg file data sampling:    Mitigation; Clear Register File
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace

==============================
Versions of relevant libraries
==============================
[pip3] numpy==2.3.5
[pip3] pyzmq==27.1.0
[pip3] torch==2.11.0+xpu
[pip3] torchaudio==2.11.0+xpu
[pip3] torchvision==0.26.0+xpu
[pip3] transformers==5.7.0
[pip3] triton-xpu==3.7.0
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.20.1rc1.dev131+g7075df79b (git sha: 7075df79b)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; XPU: Enabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_gagarinten


🐛 Describe the bug

I am trying to run an AWQ-quantized model (Chunity/gemma-4-E4B-it-AWQ-4bit) on an Intel Arc B580 GPU using the vLLM XPU backend.

While standard unquantized models (like opt-125m) run fine, attempting to serve the AWQ model crashes with an AttributeError, indicating that the awq_dequantize C++ kernel is missing or not compiled for the Intel XPU backend.

Reproduction:

vllm serve Chunity/gemma-4-E4B-it-AWQ-4bit \
    --max-model-len 4k \
    --enable-auto-tool-choice \
    --tool-call-parser gemma4 \
    --port 8000 \
    --enforce-eager \
    --attention-backend TRITON_ATTN

Error log:

ERROR [core.py:1136]   File "/home/_/vllm/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1379, in __getattr__
ERROR [core.py:1136]     raise AttributeError
ERROR [core.py:1136] AttributeError: '_OpNamespace' '_C' object has no attribute 'awq_dequantize'


extent analysis

TL;DR

Serving the AWQ-quantized model fails with an AttributeError because torch.ops._C in this vLLM XPU build exposes no awq_dequantize op: the C++ kernel is either missing from the Intel XPU backend or was not compiled into it.

Guidance

  • Confirm that the installed vLLM is an XPU build (the environment report above shows "XPU: Enabled") and check whether that build claims AWQ quantization support.
  • If vLLM was built from source, review the build configuration to see whether the AWQ kernels are compiled for the XPU target.
  • Check whether awq_dequantize is actually registered under torch.ops._C in this environment, and whether the XPU kernels package (reported here as version 0.1.7) provides an equivalent that works on the Intel Arc B580.
  • Try the latest vLLM XPU build, and search the vLLM issue tracker for known limitations of AWQ on XPU.

Example

The issue includes no code snippet; the failure appears to stem from how the vLLM XPU backend was configured or compiled, not from an error in user code.
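Although no snippet accompanies the issue, a quick diagnostic can help confirm whether the op is really absent before digging into build flags. The sketch below is illustrative only: `has_op` and `check_awq_dequantize` are hypothetical helper names, and it assumes that importing `vllm._custom_ops` is enough to load vLLM's compiled extension when one exists for the platform.

```python
def has_op(namespace, op_name):
    """Return True if `namespace` exposes an attribute named `op_name`."""
    try:
        getattr(namespace, op_name)
    except AttributeError:
        # torch.ops namespaces raise AttributeError for unregistered ops,
        # which is exactly what the traceback in this issue shows.
        return False
    return True


def check_awq_dequantize():
    """Return True/False for torch.ops._C.awq_dequantize, or None without torch."""
    try:
        import torch
    except ImportError:
        return None
    try:
        # Assumption: this import triggers loading of vLLM's compiled ops.
        import vllm._custom_ops  # noqa: F401
    except Exception:
        # vLLM missing or its extension failed to load; still inspect torch.ops.
        pass
    return has_op(torch.ops._C, "awq_dequantize")
```

Running `print(check_awq_dequantize())` inside the same virtual environment that runs `vllm serve` should print `False` if the diagnosis in this report is right, and `True` if the op is registered and the failure lies elsewhere.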

Notes

The error log indicates a missing awq_dequantize kernel, which suggests a potential issue with the vLLM XPU backend configuration or compilation. Further investigation is needed to determine the root cause of the problem.
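When triaging longer logs, it can help to pull the missing op name out mechanically instead of scanning by eye. The following is a minimal sketch, not part of vLLM: `find_missing_ops` is a hypothetical helper, and the regular expression assumes the message wording matches the traceback quoted in this issue.

```python
import re

# Matches lines such as:
# AttributeError: '_OpNamespace' '_C' object has no attribute 'awq_dequantize'
MISSING_OP = re.compile(
    r"AttributeError: '_OpNamespace' '(?P<namespace>[^']+)' object "
    r"has no attribute '(?P<op>[^']+)'"
)


def find_missing_ops(log_text):
    """Return unique (namespace, op) pairs in order of first appearance."""
    seen = []
    for match in MISSING_OP.finditer(log_text):
        pair = (match.group("namespace"), match.group("op"))
        if pair not in seen:
            seen.append(pair)
    return seen


SAMPLE_LOG = """\
ERROR [core.py:1136]   File "/home/_/vllm/.venv/lib/python3.12/site-packages/torch/_ops.py", line 1379, in __getattr__
ERROR [core.py:1136]     raise AttributeError
ERROR [core.py:1136] AttributeError: '_OpNamespace' '_C' object has no attribute 'awq_dequantize'
"""

print(find_missing_ops(SAMPLE_LOG))  # [('_C', 'awq_dequantize')]
```

Collecting pairs rather than only the first match makes it easy to see whether other quantization kernels are missing from the same build.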

Recommendation

The issue confirms no workaround. Check the vLLM XPU backend documentation for requirements or limitations that apply to AWQ quantization on Intel Arc B580 GPUs, and update the backend to the latest version if the installed one is out of date.
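Before updating, it is worth recording which versions are installed so they can be compared with the documentation and quoted in a follow-up comment. This is a small sketch using only the standard library. The distribution names in `PACKAGES` are assumptions: `torch` and `triton-xpu` appear in the pip list above, while `vllm-xpu-kernels` is a guess at the package behind the "vLLM XPU kernels version" line.

```python
from importlib import metadata

# Assumption: these distribution names match what pip installed; adjust as needed.
PACKAGES = ["vllm", "torch", "triton-xpu", "vllm-xpu-kernels"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Run it inside the same virtual environment that launches `vllm serve`, otherwise the reported versions may belong to a different installation.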
