vllm - ✅(Solved) Fix [Bug]: LoRA on Qwen-3.5-2B fails to run [3 pull requests, 15 comments, 9 participants]

GitHub stats
vllm-project/vllm#36478 (fetched 2026-04-08 00:36:42)
Comments: 15
Participants: 9
Timeline: 36
Reactions: 4
Timeline (top): commented ×15, subscribed ×9, cross-referenced ×5, mentioned ×5

Error Message

(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 166, in set_active_adapters
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     self._apply_adapters(requests)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 253, in _apply_adapters
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     self.add_adapter(lora)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 288, in add_adapter
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     self._adapter_manager.activate_adapter(lora_request.lora_int_id)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 848, in activate_adapter
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     result = super().activate_adapter(lora_id)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 258, in activate_adapter
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     module.set_lora(
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/layers/column_parallel_linear.py", line 268, in set_lora
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     if (lora_a_i := lora_a[i]) is not None:
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]                     ~~~~~~^^^
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100] IndexError: list index out of range

Fix Action

Fixed

PR fix notes

PR #36603: fix(lora): fix IndexError and GQA tensor size mismatch in QKV LoRA la…

Description (problem / solution / changelog)

Summary

Fixes two related LoRA bugs that together prevent serving LoRA adapters on GQA models like Qwen3.5-2B (any model where num_q_heads != num_kv_heads).

Bug 1: IndexError in set_lora / slice_lora_b

MergedColumnParallelLinearWithLoRA.set_lora used range(self.n_slices) but lora_a/lora_b can have fewer elements in certain TP configurations, causing IndexError: list index out of range.

slice_lora_b had the same issue with direct lora_b[i] indexing.

Fix: Replace range(n_slices) loop with enumerate(zip(..., strict=False)) and add i < len(lora_b) guard in slice_lora_b.

Bug 2: Tensor size mismatch for GQA models

MergedQKVParallelLinearWithLoRA.create_lora_weights delegated entirely to super(), which allocates lora_b_stacked using a uniform size for all 3 slices (Q, K, V). For GQA models the Q output dim ≠ KV output dim:

  • Qwen3.5-2B: Q = 32 × 128 = 4096, K = V = 8 × 128 = 1024

This caused set_lora to fail with:

ValueError: The size of tensor a (2048) must match the size of tensor b (6144)
  at non-singleton dimension 0

Fix: Explicitly allocate lora_b_stacked[i] using self.output_slices (q_proj_shard_size, kv_proj_shard_size, kv_proj_shard_size) which is already correctly computed in __init__ and accounts for GQA asymmetry.
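
To make the asymmetry concrete, here is a small shape-only sketch using the head counts quoted above (32 Q heads, 8 KV heads, head dim 128). The helper names, the LoRA rank of 16, and the "uniform" strategy of reusing the first slice's size are assumptions for illustration; the comparison treats any shape difference as a mismatch, which is simpler than what vLLM does with real tensors.

```python
def qkv_output_slices(num_q_heads: int, num_kv_heads: int, head_dim: int,
                      tp_size: int = 1) -> tuple[int, int, int]:
    """Per-rank output sizes for Q, K, V (assumes heads divide evenly by tp_size)."""
    q_size = num_q_heads * head_dim // tp_size
    kv_size = num_kv_heads * head_dim // tp_size
    return (q_size, kv_size, kv_size)


def allocate_uniform(output_slices: tuple[int, ...], rank: int) -> list[tuple[int, int]]:
    """Hypothetical uniform allocation: every slice gets the first slice's size."""
    size = output_slices[0]
    return [(size, rank) for _ in output_slices]


def allocate_per_slice(output_slices: tuple[int, ...], rank: int) -> list[tuple[int, int]]:
    """Per-slice allocation: each buffer matches its own output size."""
    return [(size, rank) for size in output_slices]


def mismatched_slices(buffers: list[tuple[int, int]],
                      weights: list[tuple[int, int]]) -> list[int]:
    """Indices where the buffer shape differs from the incoming lora_b shape."""
    return [i for i, (b, w) in enumerate(zip(buffers, weights)) if b != w]


RANK = 16  # assumed LoRA rank, not taken from the issue
slices = qkv_output_slices(num_q_heads=32, num_kv_heads=8, head_dim=128)
incoming_lora_b = [(size, RANK) for size in slices]  # (out_dim, rank) per slice

uniform_bad = mismatched_slices(allocate_uniform(slices, RANK), incoming_lora_b)
per_slice_bad = mismatched_slices(allocate_per_slice(slices, RANK), incoming_lora_b)
```

For an MHA model, where the Q and KV head counts are equal, both strategies produce identical shapes, which is presumably why the problem only surfaces on GQA models.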

Testing

New test file tests/lora/test_layers_issue_36478.py:

  • TestBugA_BoundsChecking — covers the IndexError scenarios
  • TestBugB_GQAShapeMismatch — parametrized for Qwen3.5-2B, Llama3.1-8B, Mistral-7B, and MHA, plus a partial-LoRA (Q-only) test combining both bugs

Related issues

Fixes #36478. Related to #36372 (Bug 1 was also partially addressed by #36395).

Changed files

  • tests/lora/test_layers_issue_36478.py (added, +297/-0)
  • vllm/lora/layers/column_parallel_linear.py (modified, +59/-5)

PR #36652: Fix Qwen3.5 LoRA packed module mapping

Description (problem / solution / changelog)

Purpose

Fix Qwen3.5 LoRA initialization for the fused GatedDeltaNet in_proj_qkvz projection.

Qwen3_5GatedDeltaNet.create_qkvz_proj() builds a fused MergedColumnParallelLinear with 4 slices, but packed_modules_mapping["in_proj_qkvz"] only declared 2 packed entries (["in_proj_qkv", "in_proj_z"]). During LoRA warmup / adapter activation, vLLM uses this mapping to build packed LoRA tensors, and the shorter mapping causes an IndexError: list index out of range in vllm/lora/layers/column_parallel_linear.py.

This PR updates the packed mapping for Qwen3.5 so it matches the actual fused layout:

  • slices 0, 1, 2 map to in_proj_qkv
  • slice 3 maps to in_proj_z
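
One way to picture that layout is a per-slice ownership list with one entry per fused slice. The sketch below is only an illustration of the idea; the helper name and the `(name, count)` input format are hypothetical, and the PR may encode the mapping differently.

```python
def expand_packed_mapping(groups: list[tuple[str, int]]) -> list[str]:
    """Expand (module_name, slice_count) pairs into one module name per slice."""
    expanded: list[str] = []
    for name, count in groups:
        expanded.extend([name] * count)
    return expanded


# Slices 0, 1, 2 belong to in_proj_qkv; slice 3 belongs to in_proj_z.
slice_owner = expand_packed_mapping([("in_proj_qkv", 3), ("in_proj_z", 1)])

# The expanded list has one entry per slice of the 4-slice fused projection,
# so indexing it by slice number can no longer run past the end.
n_slices = 4
owners_by_slice = {i: slice_owner[i] for i in range(n_slices)}
```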

This is a targeted fix for the crash reported in:

  • #36478
  • #36372

Test Plan

  1. Reproduce the issue with a Qwen3.5 model + LoRA adapter. Before this change, engine initialization fails during LoRA warmup / CUDA graph capture with:

    IndexError: list index out of range
    ...
    vllm/lora/layers/column_parallel_linear.py", line 268, in set_lora
    if (lora_a_i := lora_a[i]) is not None:
  2. Apply this patch and rerun the same Qwen3.5 + LoRA initialization path.

  3. Sanity check the edited module:

    python -m compileall vllm/model_executor/models/qwen3_5.py

Test Result

  • python -m compileall vllm/model_executor/models/qwen3_5.py passes.

  • Static validation:

    • create_qkvz_proj() creates a 4-slice fused projection.
    • the Qwen3.5 weight loader already treats slices 0,1,2 as in_proj_qkv and slice 3 as in_proj_z.
    • after this change, packed_modules_mapping["in_proj_qkvz"] matches that 4-slice layout.
  • Runtime behavior before this change:

    • Qwen3.5 + LoRA could fail during engine startup with the IndexError above.
  • Runtime behavior after this change:

    • [replace this line with your rerun result, e.g. “engine initializes successfully and no longer crashes during LoRA warmup”]

Notes

  • No documentation update needed.
  • No release note update planned; this is a targeted bug fix.

Changed files

  • vllm/model_executor/models/qwen3_5.py (modified, +14/-2)

PR #36825: [Bugfix] Fix Qwen3.5 LoRA IndexError in packed_modules_mapping

Description (problem / solution / changelog)

Summary

Fixes IndexError: list index out of range when enabling LoRA with Qwen3.5 models (Qwen3_5ForCausalLMBase and Qwen3_5ForConditionalGeneration).

Root cause: Qwen3.5's create_qkvz_proj overrides the parent (Qwen3Next) to use 4 output_sizes [key_dim, key_dim, value_dim, value_dim] for correct per-slice TP sharding. However, packed_modules_mapping only lists 2 entries ["in_proj_qkv", "in_proj_z"]. During LoRA initialization, MergedColumnParallelLinearWithLoRA sets n_slices = len(output_sizes) (4) but only creates len(packed_modules) (2) adapters, so accessing lora_a[2]/lora_a[3] crashes.

Fix:

  1. Expand packed_modules_mapping for in_proj_qkvz from 2 to 4 entries: ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"] — matching the 4 output_sizes
  2. Generalize MergedColumnParallelLinearWithLoRA.can_replace_layer from len(packed_modules_list) == 2 to len(packed_modules_list) == len(source_layer.output_sizes) — so it works for any N-way merged column parallel linear, not just 2-way

This works for any TP size because each of the 4 packed modules maps to one output_size, preserving correct per-slice sharding.
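
The generalized check in step 2 can be sketched as a pair of predicates. This is a simplified stand-in: the real `can_replace_layer` also inspects the layer type and other arguments, and the `key_dim`/`value_dim` values below are placeholders, not Qwen3.5's actual dimensions.

```python
def can_replace_old(packed_modules_list: list[str], output_sizes: list[int]) -> bool:
    """Original rule: only 2-way merged layers are accepted."""
    return len(packed_modules_list) == 2


def can_replace_new(packed_modules_list: list[str], output_sizes: list[int]) -> bool:
    """Generalized rule: one packed module per output size, for any N."""
    return len(packed_modules_list) == len(output_sizes)


key_dim, value_dim = 2048, 4096  # placeholder sizes for illustration only
output_sizes = [key_dim, key_dim, value_dim, value_dim]

two_entry = ["in_proj_qkv", "in_proj_z"]
four_entry = ["in_proj_q", "in_proj_k", "in_proj_v", "in_proj_z"]

# The old rule accepts the 2-entry mapping even though the layer has 4 slices,
# which is the combination that later indexes past the end of lora_a.
old_accepts_mismatch = can_replace_old(two_entry, output_sizes)
new_accepts_mismatch = can_replace_new(two_entry, output_sizes)
new_accepts_match = can_replace_new(four_entry, output_sizes)
```

Under the generalized rule the mismatched 2-entry mapping is rejected up front, and the 4-entry mapping is accepted because it lines up one-to-one with `output_sizes`.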

Note: The parent class Qwen3Next doesn't have this issue because it uses output_sizes=[sum(key_dim, key_dim, value_dim, value_dim)] (1 entry) with packed_modules=["in_proj_qkvz"] (1 entry) — they match.

Note: This may not be the globally optimal solution. The 4 packed module names (in_proj_q, in_proj_k, in_proj_v, in_proj_z) are synthetic — the actual HF weight names are in_proj_qkv (fused Q+K+V) and in_proj_z. This means LoRA adapter weights targeting the GDN projections by their real HF names wouldn't be found during loading. In practice this isn't an issue today because nobody LoRAs the GDN layers — only standard attention and MLP layers are targeted. A more complete fix would be to support M packed modules mapping to N output sizes (2 weights → 4 sharding slices) in MergedColumnParallelLinearWithLoRA, but that's a larger refactor.

Related: #36372, #36478

Test plan

  • Verified LoRA training (TP=1) completes successfully with Qwen3.5-9B on 2x RTX PRO 6000 Blackwell GPUs using prime-rl
  • Test with TP=2 and TP=4

Changed files

  • vllm/lora/layers/column_parallel_linear.py (modified, +76/-9)

Your current environment

<details> <summary>Env</summary>

CUDA 12.3
PYTHON 3.12.0 

Package                                  Version
---------------------------------------- -------------
aiohappyeyeballs                         2.6.1
aiohttp                                  3.13.3
aiosignal                                1.4.0
annotated-doc                            0.0.4
annotated-types                          0.7.0
anthropic                                0.84.0
anyio                                    4.12.1
apache-tvm-ffi                           0.1.9
astor                                    0.8.1
attrs                                    25.4.0
blake3                                   1.0.8
cachetools                               7.0.4
cbor2                                    5.8.0
certifi                                  2026.2.25
cffi                                     2.0.0
charset-normalizer                       3.4.5
click                                    8.3.1
cloudpickle                              3.1.2
compressed-tensors                       0.13.0
cryptography                             46.0.5
cuda-bindings                            12.9.4
cuda-pathfinder                          1.4.1
cuda-python                              12.9.4
cupy-cuda12x                             14.0.1
depyf                                    0.20.0
dill                                     0.4.1
diskcache                                5.6.3
distro                                   1.9.0
dnspython                                2.8.0
docstring_parser                         0.17.0
einops                                   0.8.2
email-validator                          2.3.0
fastapi                                  0.135.1
fastapi-cli                              0.0.24
fastapi-cloud-cli                        0.14.1
fastar                                   0.8.0
filelock                                 3.25.0
flashinfer-python                        0.6.4
frozenlist                               1.8.0
fsspec                                   2026.2.0
gguf                                     0.18.0
googleapis-common-protos                 1.73.0
grpcio                                   1.78.0
grpcio-reflection                        1.78.0
h11                                      0.16.0
hf-xet                                   1.3.2
httpcore                                 1.0.9
httptools                                0.7.1
httpx                                    0.28.1
httpx-sse                                0.4.3
huggingface_hub                          0.36.2
idna                                     3.11
ijson                                    3.5.0
importlib_metadata                       8.7.1
interegular                              0.3.3
Jinja2                                   3.1.6
jiter                                    0.13.0
jmespath                                 1.1.0
json_repair                              0.58.5
jsonschema                               4.26.0
jsonschema-specifications                2025.9.1
kaldi-native-fbank                       1.22.3
lark                                     1.2.2
llguidance                               1.3.0
llvmlite                                 0.44.0
lm-format-enforcer                       0.11.3
loguru                                   0.7.3
markdown-it-py                           4.0.0
MarkupSafe                               3.0.3
mcp                                      1.26.0
mdurl                                    0.1.2
mistral_common                           1.9.1
model-hosting-container-standards        0.1.13
mpmath                                   1.3.0
msgpack                                  1.1.2
msgspec                                  0.20.0
multidict                                6.7.1
networkx                                 3.6.1
ninja                                    1.13.0
numba                                    0.61.2
numpy                                    2.2.6
nvidia-cublas-cu12                       12.8.4.1
nvidia-cuda-cupti-cu12                   12.8.90
nvidia-cuda-nvrtc-cu12                   12.8.93
nvidia-cuda-runtime-cu12                 12.8.90
nvidia-cudnn-cu12                        9.10.2.21
nvidia-cudnn-frontend                    1.18.0
nvidia-cufft-cu12                        11.3.3.83
nvidia-cufile-cu12                       1.13.1.3
nvidia-curand-cu12                       10.3.9.90
nvidia-cusolver-cu12                     11.7.3.90
nvidia-cusparse-cu12                     12.5.8.93
nvidia-cusparselt-cu12                   0.7.1
nvidia-cutlass-dsl                       4.4.1
nvidia-cutlass-dsl-libs-base             4.4.1
nvidia-ml-py                             13.590.48
nvidia-nccl-cu12                         2.27.5
nvidia-nvjitlink-cu12                    12.8.93
nvidia-nvshmem-cu12                      3.4.5
nvidia-nvtx-cu12                         12.8.90
openai                                   2.24.0
openai-harmony                           0.0.8
opencv-python-headless                   4.13.0.92
opentelemetry-api                        1.40.0
opentelemetry-exporter-otlp              1.40.0
opentelemetry-exporter-otlp-proto-common 1.40.0
opentelemetry-exporter-otlp-proto-grpc   1.40.0
opentelemetry-exporter-otlp-proto-http   1.40.0
opentelemetry-proto                      1.40.0
opentelemetry-sdk                        1.40.0
opentelemetry-semantic-conventions       0.61b0
opentelemetry-semantic-conventions-ai    0.4.15
outlines_core                            0.2.11
packaging                                25.0
partial-json-parser                      0.2.1.1.post7
pillow                                   12.1.1
pip                                      26.0.1
prometheus_client                        0.24.1
prometheus-fastapi-instrumentator        7.1.0
propcache                                0.4.1
protobuf                                 6.33.5
psutil                                   7.2.2
py-cpuinfo                               9.0.0
pybase64                                 1.4.3
pycountry                                26.2.16
pycparser                                3.0
pydantic                                 2.12.5
pydantic_core                            2.41.5
pydantic-extra-types                     2.11.0
pydantic-settings                        2.13.1
Pygments                                 2.19.2
PyJWT                                    2.11.0
python-dotenv                            1.2.2
python-json-logger                       4.0.0
python-multipart                         0.0.22
PyYAML                                   6.0.3
pyzmq                                    27.1.0
quack-kernels                            0.2.10
ray                                      2.54.0
referencing                              0.37.0
regex                                    2026.2.28
requests                                 2.32.5
rich                                     14.3.3
rich-toolkit                             0.19.7
rignore                                  0.7.6
rpds-py                                  0.30.0
safetensors                              0.7.0
sentencepiece                            0.2.1
sentry-sdk                               2.54.0
setproctitle                             1.3.7
setuptools                               80.10.2
shellingham                              1.5.4
six                                      1.17.0
sniffio                                  1.3.1
sse-starlette                            3.3.2
starlette                                0.52.1
supervisor                               4.3.0
sympy                                    1.14.0
tabulate                                 0.10.0
tiktoken                                 0.12.0
tokenizers                               0.22.2
torch                                    2.10.0
torch_c_dlpack_ext                       0.1.5
torchaudio                               2.10.0
torchvision                              0.25.0
tqdm                                     4.67.3
transformers                             4.57.6
triton                                   3.6.0
typer                                    0.24.1
typing_extensions                        4.15.0
typing-inspection                        0.4.2
urllib3                                  2.6.3
uvicorn                                  0.41.0
uvloop                                   0.22.1
vllm                                     0.17.0
watchfiles                               1.1.1
websockets                               16.0
wheel                                    0.46.3
xgrammar                                 0.1.29
yarl                                     1.23.0
zipp                                     3.23.0
</details>

🐛 Describe the bug

shell script:

export CUDA_VISIBLE_DEVICES="4"
vllm serve /opt/nas/n/model/Qwen3.5-2B \
--gpu-memory-utilization 0.6 \
--host 0.0.0.0  \
--port 6688 \
--tensor-parallel-size 1 \
--max_model_len 10240 \
--allowed_local_media_path /opt/nas/n  \
--enable-log-requests \
--enable-lora \
--lora-modules M1=/opt/nas/n/ms-swift/output/lora/2B/checkpoint-1640

Logs:

(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     self.lora_manager.set_active_adapters(lora_requests, lora_mapping)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 166, in set_active_adapters
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     self._apply_adapters(requests)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 253, in _apply_adapters
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     self.add_adapter(lora)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/worker_manager.py", line 288, in add_adapter
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     self._adapter_manager.activate_adapter(lora_request.lora_int_id)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 848, in activate_adapter
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     result = super().activate_adapter(lora_id)
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/model_manager.py", line 258, in activate_adapter
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     module.set_lora(
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/lora/layers/column_parallel_linear.py", line 268, in set_lora
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]     if (lora_a_i := lora_a[i]) is not None:
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100]                     ~~~~~~^^^
(EngineCore_DP0 pid=2550193) ERROR 03-09 16:48:25 [core.py:1100] IndexError: list index out of range

I tried to update the source code manually based on
https://github.com/vllm-project/vllm/pull/36395/files (from issue https://github.com/vllm-project/vllm/issues/36372), but then I got another error:

(APIServer pid=2571253)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2571253)   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=2571253)     return await main
(APIServer pid=2571253)            ^^^^^^^^^^
(APIServer pid=2571253)   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=2571253)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2571253)   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 498, in run_server_worker
(APIServer pid=2571253)     await init_app_state(engine_client, app.state, args, supported_tasks)
(APIServer pid=2571253)   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 341, in init_app_state
(APIServer pid=2571253)     await state.openai_serving_models.init_static_loras()
(APIServer pid=2571253)   File "/opt/nas/p/conda/envs/xzx_vllm_qwen3_5/lib/python3.12/site-packages/vllm/entrypoints/openai/models/serving.py", line 80, in init_static_loras
(APIServer pid=2571253)     raise ValueError(load_result.error.message)
(APIServer pid=2571253) ValueError: Call to add_lora method failed: The size of tensor a (2048) must match the size of tensor b (6144) at non-singleton dimension 0


extent analysis

Fix Plan

The logs show two separate errors:

  1. IndexError: list index out of range: raised in set_lora in column_parallel_linear.py at lora_a[i]. The loop index i runs past the end of lora_a, so the adapter supplied fewer lora_a entries than the layer iterates over. A real fix has to make the number of entries match the layer's slices; guarding the index only hides the mismatch.
  2. ValueError: The size of tensor a (2048) must match the size of tensor b (6144) at non-singleton dimension 0: reported by add_lora after the manual patch from pull request 36395. Since 6144 = 3 × 2048, the adapter tensor may cover a single slice of a layer that is stored as three fused slices, but the logs alone do not confirm this.
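
Both failure modes can be reproduced outside vLLM. The sketch below is plain Python, not vLLM code: apply_slices is a hypothetical stand-in for the per-slice loop seen in the traceback, and the slice counts are assumptions chosen only to trigger the same exception. The last helper shows the arithmetic behind the second error.

```python
# Hypothetical stand-in for the per-slice loop in set_lora (not vLLM code).
def apply_slices(n_slices, lora_a):
    """Return the indices of slices that have a LoRA A entry."""
    applied = []
    for i in range(n_slices):
        # Same pattern as the traceback: indexing fails when the adapter
        # supplies fewer entries than the layer has slices.
        if (lora_a_i := lora_a[i]) is not None:
            applied.append(i)
    return applied


def reproduce_index_error():
    """Layer expects 3 slices, adapter supplies 2 (assumed counts)."""
    try:
        apply_slices(3, ["a0", "a1"])
    except IndexError as exc:
        return str(exc)
    return None


def size_ratio(layer_rows, adapter_rows):
    """How many adapter-sized slices fit exactly in the layer's dimension 0."""
    quotient, remainder = divmod(layer_rows, adapter_rows)
    return quotient if remainder == 0 else None
```

Calling size_ratio(6144, 2048) gives a whole number of slices, which is consistent with a fused projection but is not proof of one.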

Here are the steps to fix the issues:

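  • Update the column_parallel_linear.py file to handle the IndexError (an illustrative guard, not the upstream patch):
# In column_parallel_linear.py
def set_lora(self, lora_a, lora_b):
    # ...
    # Iterate over the entries that exist instead of indexing past the end.
    for i, lora_a_i in enumerate(lora_a or []):
        if lora_a_i is None:
            continue  # no adapter weights for this slice
        # ... copy lora_a_i and lora_b[i] into slice i
    # ...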
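  • Verify that each adapter tensor matches the layer it targets. serving.py only re-raises the failure from add_lora, so check the shapes instead of editing init_static_loras:
# Illustrative check, not vLLM code; both arguments are torch tensors.
# Assumes lora_b uses the (out_features, rank) layout.
def check_lora_b_shape(lora_b, base_weight):
    # Dimension 0 of lora_b must equal the output size of the base layer,
    # e.g. 2048 vs 6144 in the error above.
    if lora_b.shape[0] != base_weight.shape[0]:
        raise ValueError(
            f"lora_b has {lora_b.shape[0]} rows, layer expects {base_weight.shape[0]}"
        )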
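
The shape comparison in this step can also be sketched without any framework, so it runs anywhere. find_shape_mismatches and the layer names are hypothetical; in practice the sizes would come from the adapter checkpoint and the base model, however you choose to load them.

```python
# Hypothetical helper: compare adapter sizes with base-layer sizes.
def find_shape_mismatches(adapter_rows, base_rows):
    """Both arguments map a layer name to the size of dimension 0.

    Returns (name, adapter_size, base_size) for every layer whose sizes
    differ; layers missing from base_rows get base_size None.
    """
    mismatches = []
    for name, rows in sorted(adapter_rows.items()):
        expected = base_rows.get(name)
        if expected != rows:
            mismatches.append((name, rows, expected))
    return mismatches


# Example with made-up layer names and the sizes from the error message:
#   find_shape_mismatches({"layer0.proj": 2048}, {"layer0.proj": 6144})
```

An empty result means every adapter tensor lines up with its base layer; anything else points at the layers to inspect first.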

Verification

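To verify the fix, restart the vllm service with the same --enable-lora and --lora-modules flags and check that startup completes without the IndexError or ValueError shown above. Then send a request that selects the adapter (M1 in the launch command) to confirm it is actually applied.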
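
One way to confirm the adapter loaded is to ask the running server which models it serves. The sketch below assumes the OpenAI-compatible /v1/models endpoint, port 6688 from the launch command, and the adapter name M1 from --lora-modules; the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_models(base_url):
    """GET <base_url>/v1/models and return the decoded JSON payload."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return json.load(resp)


def adapter_is_served(payload, adapter_name):
    """True if adapter_name appears among the ids in a model-list payload."""
    return any(m.get("id") == adapter_name for m in payload.get("data", []))


# Example against a live server (assumed address; not executed here):
#   payload = fetch_models("http://localhost:6688")
#   print(adapter_is_served(payload, "M1"))
```

If the server fails during startup, as in the second traceback, the request will simply not connect, which is itself a useful signal.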

Extra Tips

  • Update the vllm package to a version that includes the fixes from the pull requests linked to this issue, rather than patching files in site-packages by hand.
  • If the error persists, check the vllm documentation and the linked GitHub issues for workarounds.
