vllm - ✅(Solved) Fix Regression in nightly: AttributeError 'MergedColumnParallelLinear' has no attribute 'weight' with Qwen3.5-9B [1 pull request, 4 comments, 4 participants]

vllm · 2026-03-18 14:34:34

vllm-project/vllm#37444•Fetched 2026-04-08 00:58:36

The latest cu130-nightly build (v0.17.2rc1.dev49, image built 2026-03-18) crashes during model loading for cyankiwi/Qwen3.5-9B-AWQ-4bit (architecture: Qwen3_5ForConditionalGeneration). The previous nightly (v0.17.1rc1.dev177, ~2026-03-16) works correctly.

Error Message

File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 185, in forward
    self.in_proj_qkvz.weight.shape[0],

AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
torch._dynamo.exc.ObservedAttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'

Root Cause

qwen3_5.py line 185 reads self.in_proj_qkvz.weight.shape[0], but in_proj_qkvz is a MergedColumnParallelLinear, which does not expose a .weight attribute when the model is loaded with a quantization method such as compressed-tensors/AWQ. According to the fix PR, the access was introduced in #36795 together with the new gdn_in_proj custom op, which is consistent with the previous nightly working and the latest one failing.

Fix Action

Workaround

Pin to v0.17.1rc1.dev177 or use the stable v0.17.1-cu130 release (though the stable release has worse Mamba memory management, causing OOM at ~48k input tokens that the nightly handles fine up to 160k+).

PR fix notes

PR #37448: Fix AttributeError in Qwen3.5 GDN layers with quantized models

Repository: vllm-project/vllm
Author: jhsmith409
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/37448

Description (problem / solution / changelog)

Summary

Replace self.in_proj_qkvz.weight.shape[0] and self.in_proj_ba.weight.shape[0] with sum(self.in_proj_qkvz.output_sizes) and sum(self.in_proj_ba.output_sizes) in both qwen3_5.py and qwen3_next.py
MergedColumnParallelLinear does not expose a .weight attribute when using quantization methods like compressed-tensors/AWQ, causing an AttributeError during the forward pass
The output_sizes attribute is always available on MergedColumnParallelLinear and provides the same total output dimension needed by the gdn_in_proj custom op for shape tracing
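
The substitution described in the summary can be illustrated without installing vLLM. The sketch below is only a model of the situation: FakeTensor, UnquantizedMergedLinear, QuantizedMergedLinear and the qweight attribute name are hypothetical stand-ins, not vLLM APIs. It assumes, as the PR states, that output_sizes is present on both variants, and it ignores tensor-parallel sharding.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Hypothetical tensor placeholder; only the shape matters here."""
    shape: tuple


class UnquantizedMergedLinear:
    """Hypothetical stand-in for an unquantized merged layer: exposes .weight."""

    def __init__(self, input_size, output_sizes):
        self.output_sizes = list(output_sizes)
        self.weight = FakeTensor((sum(output_sizes), input_size))


class QuantizedMergedLinear:
    """Hypothetical stand-in for a quantized merged layer: no .weight."""

    def __init__(self, input_size, output_sizes):
        self.output_sizes = list(output_sizes)
        # Packed weights stored under another name (the name is illustrative).
        self.qweight = FakeTensor((input_size // 8, sum(output_sizes)))


def total_output_dim(layer):
    """Total output dimension, computed as in the merged fix."""
    return sum(layer.output_sizes)


sizes = [8, 8, 4, 4]  # arbitrary illustrative sizes
plain = UnquantizedMergedLinear(16, sizes)
quant = QuantizedMergedLinear(16, sizes)

# Old access pattern: works only when .weight exists.
print(plain.weight.shape[0] == total_output_dim(plain))
print(hasattr(quant, "weight"))
# New access pattern: works for both stand-ins.
print(total_output_dim(quant))
```

Reading quant.weight in this sketch raises the same kind of AttributeError as the traceback in the issue, while total_output_dim works for both stand-ins.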

Motivation

This fixes a regression introduced in #36795 where the new gdn_in_proj custom op accesses self.in_proj_qkvz.weight.shape[0] and self.in_proj_ba.weight.shape[0]. With quantized models (e.g., cyankiwi/Qwen3.5-9B-AWQ-4bit using compressed-tensors), the MergedColumnParallelLinear layer does not have a .weight attribute — the weight is managed by the quantization kernel. This causes:

AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'

Fixes #37444

Test plan

Verify cyankiwi/Qwen3.5-9B-AWQ-4bit loads and runs inference without error
Verify non-quantized Qwen3.5 models still work (no regression from this change)

🤖 Generated with Claude Code

Changed files

vllm/model_executor/models/qwen3_5.py (modified, +2/-2)
vllm/model_executor/models/qwen3_next.py (modified, +2/-2)

Bug Report

Steps to Reproduce

docker run --gpus all vllm/vllm-openai:cu130-nightly \
  cyankiwi/Qwen3.5-9B-AWQ-4bit \
  --quantization compressed-tensors \
  --kv-cache-dtype fp8 \
  --trust-remote-code

Environment

GPU: NVIDIA RTX 5070 Ti (16 GiB, SM 120 / Blackwell)
Working image: vllm/vllm-openai:cu130-nightly built ~2026-03-16 (v0.17.1rc1.dev177+gd4c57863f)
Broken image: vllm/vllm-openai:cu130-nightly built 2026-03-18 (v0.17.2rc1.dev49+g8b6325758)
Model: cyankiwi/Qwen3.5-9B-AWQ-4bit (Qwen3_5ForConditionalGeneration, hybrid Mamba+Transformer)
Quantization: compressed-tensors (AWQ 4-bit, MarlinLinearKernel)

Fix Plan

To resolve the issue, qwen3_5.py must stop reading .weight from the MergedColumnParallelLinear layers, because the attribute is not exposed when the model is quantized. The merged fix (PR #37448) reads the total output dimension from output_sizes instead, and applies the same change in qwen3_next.py.

Replace line 185 in qwen3_5.py

self.in_proj_qkvz.weight.shape[0]

with

```python
# output_sizes is available on MergedColumnParallelLinear with or without quantization
sum(self.in_proj_qkvz.output_sizes)
```

and make the matching change for in_proj_ba, replacing self.in_proj_ba.weight.shape[0] with sum(self.in_proj_ba.output_sizes).

Looking the weight up under a different attribute (for example .module.weight or .weights), or iterating over self.in_proj_qkvz.modules() in search of torch.nn.Linear instances, is not a reliable substitute: per the PR description, the weight of a quantized layer is managed by the quantization kernel.

Verification
To verify the fix, run the reproduction command against an image that includes PR #37448:
```bash
docker run --gpus all vllm/vllm-openai:cu130-nightly \
  cyankiwi/Qwen3.5-9B-AWQ-4bit \
  --quantization compressed-tensors \
  --kv-cache-dtype fp8 \
  --trust-remote-code
```

If the model loads and serves inference without the AttributeError, the fix is working. Per the PR test plan, also confirm that a non-quantized Qwen3.5 model still loads.

Extra Tips

Make sure to test the updated code with different input sizes to ensure it works as expected.
If you're using a version control system, create a new branch for the fix and merge it into the main branch after verification.
Consider adding a test case to prevent similar regressions in the future.
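
One inexpensive form such a test could take is a source-level guard that flags direct .weight.shape[0] reads on in_proj_* layers. The following is a minimal sketch, not a test from the vLLM repository: the function name find_direct_weight_access and the regular expression are assumptions made for illustration, and a real regression test would more likely load a quantized model, as the PR test plan does.

```python
import re

# Matches the access pattern that broke under quantization, e.g.
# self.in_proj_qkvz.weight.shape[0] or self.in_proj_ba.weight.shape[0].
DIRECT_WEIGHT_ACCESS = re.compile(r"self\.in_proj_\w+\.weight\.shape\[0\]")


def find_direct_weight_access(source: str) -> list:
    """Return 1-based line numbers that read .weight.shape[0] on in_proj_* layers."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if DIRECT_WEIGHT_ACCESS.search(line)
    ]


before_fix = (
    "a = self.in_proj_qkvz.weight.shape[0]\n"
    "b = self.in_proj_ba.weight.shape[0]\n"
)
after_fix = (
    "a = sum(self.in_proj_qkvz.output_sizes)\n"
    "b = sum(self.in_proj_ba.output_sizes)\n"
)

print(find_direct_weight_access(before_fix))
print(find_direct_weight_access(after_fix))
```

In practice the function would be given the text of qwen3_5.py and qwen3_next.py, and the test would fail if it returned any line numbers. A textual check like this is crude and catches only this exact spelling of the access.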
