vllm: Fix MLA kv_b_proj.weight.dtype AttributeError on quantized ColumnParallelLinear in chunked prefill

When using a quantized (AWQ/GPTQ/compressed-tensors) model with MLA attention, vLLM crashes with an AttributeError during chunked prefill because kv_b_proj is a ColumnParallelLinear that lacks a .weight attribute after quantization.

Error Message

AttributeError: 'ColumnParallelLinear' object has no attribute 'weight'

at vllm/model_executor/layers/attention/mla_attention.py:2094 in _compute_prefill_context:

kv_c_normed = kv_c_normed.to(self.kv_b_proj.weight.dtype)

Root Cause

Lines 2084-2087 already handle quantized layers correctly, falling back to params_dtype when the layer has no weight attribute:

_kv_b_proj_w_dtype = (
    self.kv_b_proj.weight.dtype
    if hasattr(self.kv_b_proj, "weight")
    else self.kv_b_proj.params_dtype
)

But line 2094 ignores _kv_b_proj_w_dtype and directly accesses self.kv_b_proj.weight.dtype without the hasattr guard.
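The failure mode can be reproduced outside vLLM with plain Python objects. The sketch below is illustrative only: `DenseLinearStub` and `QuantizedLinearStub` are hypothetical stand-ins, not vLLM classes, and dtypes are represented as strings so the example runs without PyTorch. It assumes, as the issue describes, that a quantized layer exposes `params_dtype` but no `weight` attribute.

```python
from types import SimpleNamespace


class DenseLinearStub:
    """Hypothetical stand-in for an unquantized layer with a dense `.weight`."""

    def __init__(self, dtype):
        self.weight = SimpleNamespace(dtype=dtype)
        self.params_dtype = dtype


class QuantizedLinearStub:
    """Hypothetical stand-in for a quantized layer: no `.weight` attribute."""

    def __init__(self, params_dtype):
        self.params_dtype = params_dtype


def unguarded_dtype(layer):
    # Mirrors the access pattern on line 2094: fails if `.weight` is missing.
    return layer.weight.dtype


def guarded_dtype(layer):
    # Mirrors the guard on lines 2084-2087: fall back to params_dtype.
    return layer.weight.dtype if hasattr(layer, "weight") else layer.params_dtype
```

Calling `unguarded_dtype(QuantizedLinearStub("float16"))` raises `AttributeError`, while `guarded_dtype` returns the fallback dtype for the same object and the weight dtype for a `DenseLinearStub`.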

Fix

On line 2094, replace:

kv_c_normed = kv_c_normed.to(self.kv_b_proj.weight.dtype)

with:

kv_c_normed = kv_c_normed.to(_kv_b_proj_w_dtype)

The variable _kv_b_proj_w_dtype is already computed with the correct hasattr guard immediately above.
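The shape of the fix, computing the dtype once under the guard and reusing it for the cast, can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual `_compute_prefill_context` body: `FakeTensor`, `QuantizedProjStub`, `DenseProjStub`, and `cast_to_proj_dtype` are hypothetical names, and dtypes are plain strings so the snippet runs without PyTorch.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class FakeTensor:
    """Hypothetical tensor stand-in that only tracks its dtype."""

    dtype: str

    def to(self, dtype):
        # Like a tensor cast, return a new object carrying the target dtype.
        return FakeTensor(dtype=dtype)


class QuantizedProjStub:
    """Hypothetical quantized projection: has params_dtype, no `.weight`."""

    params_dtype = "float16"


class DenseProjStub:
    """Hypothetical unquantized projection with a dense `.weight`."""

    def __init__(self):
        self.weight = SimpleNamespace(dtype="bfloat16")
        self.params_dtype = "bfloat16"


def cast_to_proj_dtype(kv_b_proj, kv_c_normed):
    # Guarded lookup, as already done on lines 2084-2087.
    _kv_b_proj_w_dtype = (
        kv_b_proj.weight.dtype
        if hasattr(kv_b_proj, "weight")
        else kv_b_proj.params_dtype
    )
    # Fixed cast: reuse the guarded value instead of touching `.weight` again.
    return kv_c_normed.to(_kv_b_proj_w_dtype)
```

With this pattern the cast succeeds for both stubs: the quantized one resolves to its `params_dtype`, the dense one to its weight dtype.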

Impact

The vLLM EngineCore crashes with a fatal error and requires a full restart; under load, the service enters a crash loop. All in-flight requests fail with HTTP 500.

Environment

  • vLLM version: v0.21.1rc1.dev384 (nightly, also present in current main)
  • Model: GLM-4.7-Flash-AWQ-4bit (quantized with compressed-tensors)
  • GPU: NVIDIA RTX 3090 (CUDA 12.9)
  • Quantization: compressed-tensors (AWQ group_size=32, num_bits=4)
  • CUDA graphs: enabled
  • MLA: enabled (model uses Multi-head Latent Attention)

Stack Trace

File "vllm/model_executor/layers/attention/mla_attention.py", line 2094, in _compute_prefill_context
    kv_c_normed = kv_c_normed.to(self.kv_b_proj.weight.dtype)
                                  ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'ColumnParallelLinear' object has no attribute 'weight'
