vllm - ✅(Solved) Fix Regression in nightly: AttributeError 'MergedColumnParallelLinear' has no attribute 'weight' with Qwen3.5-9B [1 pull request, 4 comments, 4 participants]

GitHub stats
vllm-project/vllm#37444 · Fetched 2026-04-08 00:58:36
Comments: 4 · Participants: 4 · Timeline: 21 · Reactions: 0
Timeline (top): subscribed ×7, mentioned ×6, commented ×4, referenced ×2

The latest cu130-nightly build (v0.17.2rc1.dev49, image built 2026-03-18) crashes during model loading for cyankiwi/Qwen3.5-9B-AWQ-4bit (architecture: Qwen3_5ForConditionalGeneration). The previous nightly (v0.17.1rc1.dev177, ~2026-03-16) works correctly.

Error Message

File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen3_5.py", line 185, in forward
    self.in_proj_qkvz.weight.shape[0],

AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'
torch._dynamo.exc.ObservedAttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'

Root Cause

qwen3_5.py line 185 accesses self.in_proj_qkvz.weight.shape[0], but in_proj_qkvz is a MergedColumnParallelLinear, which does not expose a .weight attribute when a quantization method such as compressed-tensors/AWQ is in use. According to PR #37448, this access was introduced in #36795 together with the new gdn_in_proj custom op, which is why the previous nightly still worked.

Fix Action

Workaround

Pin to v0.17.1rc1.dev177 or use the stable v0.17.1-cu130 release (though the stable release has worse Mamba memory management, causing OOM at ~48k input tokens that the nightly handles fine up to 160k+).

PR fix notes

PR #37448: Fix AttributeError in Qwen3.5 GDN layers with quantized models

Description (problem / solution / changelog)

Summary

  • Replace self.in_proj_qkvz.weight.shape[0] and self.in_proj_ba.weight.shape[0] with sum(self.in_proj_qkvz.output_sizes) and sum(self.in_proj_ba.output_sizes) in both qwen3_5.py and qwen3_next.py
  • MergedColumnParallelLinear does not expose a .weight attribute when using quantization methods like compressed-tensors/AWQ, causing an AttributeError during the forward pass
  • The output_sizes attribute is always available on MergedColumnParallelLinear and provides the same total output dimension needed by the gdn_in_proj custom op for shape tracing
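
The difference between the two expressions can be illustrated with a small sketch. ToyMergedLinear below is a hypothetical stand-in, not the vLLM class: it only mimics the two properties described above (output_sizes is always set, .weight exists only in the unquantized case), and the sizes are made-up values.

```python
from types import SimpleNamespace


class ToyMergedLinear:
    """Hypothetical stand-in for a merged column-parallel layer (not vLLM code)."""

    def __init__(self, output_sizes, quantized, input_size=8):
        # Recorded at construction time, independent of how weights are stored.
        self.output_sizes = list(output_sizes)
        if not quantized:
            # Assumption: the dense layer keeps a weight of shape [out, in].
            # Only the shape matters here, so a plain namespace is enough.
            self.weight = SimpleNamespace(shape=(sum(output_sizes), input_size))
        # Quantized case: no .weight attribute is created at all.


def out_dim_from_weight(layer):
    # Old expression: depends on .weight being present.
    return layer.weight.shape[0]


def out_dim_from_sizes(layer):
    # Expression used in the PR: depends only on output_sizes.
    return sum(layer.output_sizes)


dense = ToyMergedLinear([4, 4, 2, 2], quantized=False)
quant = ToyMergedLinear([4, 4, 2, 2], quantized=True)

# Both expressions agree on the dense layer.
assert out_dim_from_weight(dense) == out_dim_from_sizes(dense)

try:
    out_dim_from_weight(quant)
except AttributeError as exc:
    print("old access fails:", exc)

print("new access:", out_dim_from_sizes(quant))
```

The sketch ignores tensor parallelism and everything else the real layer does; it only shows why an expression based on output_sizes keeps working when no .weight attribute exists.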

Motivation

This fixes a regression introduced in #36795 where the new gdn_in_proj custom op accesses self.in_proj_qkvz.weight.shape[0] and self.in_proj_ba.weight.shape[0]. With quantized models (e.g., cyankiwi/Qwen3.5-9B-AWQ-4bit using compressed-tensors), the MergedColumnParallelLinear layer does not have a .weight attribute — the weight is managed by the quantization kernel. This causes:

AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight'

Fixes #37444

Test plan

  • Verify cyankiwi/Qwen3.5-9B-AWQ-4bit loads and runs inference without error
  • Verify non-quantized Qwen3.5 models still work (no regression from this change)

🤖 Generated with Claude Code

Changed files

  • vllm/model_executor/models/qwen3_5.py (modified, +2/-2)
  • vllm/model_executor/models/qwen3_next.py (modified, +2/-2)


Bug Report

Steps to Reproduce

docker run --gpus all vllm/vllm-openai:cu130-nightly \
  cyankiwi/Qwen3.5-9B-AWQ-4bit \
  --quantization compressed-tensors \
  --kv-cache-dtype fp8 \
  --trust-remote-code

Environment

  • GPU: NVIDIA RTX 5070 Ti (16 GiB, SM 120 / Blackwell)
  • Working image: vllm/vllm-openai:cu130-nightly built ~2026-03-16 (v0.17.1rc1.dev177+gd4c57863f)
  • Broken image: vllm/vllm-openai:cu130-nightly built 2026-03-18 (v0.17.2rc1.dev49+g8b6325758)
  • Model: cyankiwi/Qwen3.5-9B-AWQ-4bit (Qwen3_5ForConditionalGeneration, hybrid Mamba+Transformer)
  • Quantization: compressed-tensors (AWQ 4-bit, MarlinLinearKernel)

extent analysis

Fix Plan

The failing expression reads the layer's output dimension from .weight, which a quantized MergedColumnParallelLinear does not expose. Rather than looking for the weight under another attribute name, follow PR #37448 and read the dimension from output_sizes, in both qwen3_5.py and qwen3_next.py:

  • Replace self.in_proj_qkvz.weight.shape[0] with sum(self.in_proj_qkvz.output_sizes).
  • Replace self.in_proj_ba.weight.shape[0] with sum(self.in_proj_ba.output_sizes).

```python
# Before: raises AttributeError when the layer is quantized
self.in_proj_qkvz.weight.shape[0]

# After: output_sizes is always available on MergedColumnParallelLinear
sum(self.in_proj_qkvz.output_sizes)
```

Verification

To verify the fix, rerun the reproduction command against a build that includes the change:

```bash
docker run --gpus all vllm/vllm-openai:cu130-nightly \
  cyankiwi/Qwen3.5-9B-AWQ-4bit \
  --quantization compressed-tensors \
  --kv-cache-dtype fp8 \
  --trust-remote-code
```

If the model loads and serves requests without the AttributeError, the fix is working.

Extra Tips

  • Make sure to test the updated code with different input sizes to ensure it works as expected.
  • If you're using a version control system, create a new branch for the fix and merge it into the main branch after verification.
  • Consider adding a test case to prevent similar regressions in the future.
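
A regression test along the lines of the last tip might look like the sketch below. It is an illustration only: NoWeightLayer and total_output_dim are hypothetical names, not part of vLLM, and a real test would exercise the actual model code rather than a stand-in.

```python
import unittest


class NoWeightLayer:
    """Hypothetical stand-in for a quantized merged layer: output_sizes only."""

    def __init__(self, output_sizes):
        self.output_sizes = list(output_sizes)


def total_output_dim(layer):
    # Hypothetical helper mirroring the expression used in the PR.
    return sum(layer.output_sizes)


class TotalOutputDimTest(unittest.TestCase):
    def test_works_without_weight_attribute(self):
        layer = NoWeightLayer([6, 2])
        # Guard the premise: the stand-in really has no .weight.
        self.assertFalse(hasattr(layer, "weight"))
        self.assertEqual(total_output_dim(layer), 8)


# Run the suite programmatically so the snippet also works when imported.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TotalOutputDimTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The point of the test is the hasattr check: it fails loudly if a future change starts relying on .weight again for a layer that does not carry one.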
