vllm - ✅(Solved) Fix [Bug]: Qwen3.5 LoRA module is not in model's supported LoRA target modules [1 pull requests, 5 comments, 4 participants]

GitHub stats
vllm-project/vllm#38085 · Fetched 2026-04-08 01:26:43
Comments: 5 · Participants: 4 · Timeline events: 7 · Reactions: 3
Timeline (top): commented ×5, closed ×1, labeled ×1

PR fix notes

PR #39369: fix: add LoRA prefix mapping for text-only adapters in Qwen3.5-VL

Description (problem / solution / changelog)

Summary

When a LoRA adapter is trained on a Qwen3.5 model using AutoModelForCausalLM (standard PEFT practice for text-only fine-tuning), vLLM's dynamic LoRA loading silently fails — all adapter weights are loaded into memory but zero modules are activated. The model runs on pure base weights with no error or warning at default log levels.

This also applies to Qwen3VLForConditionalGeneration models.

Note: --language-model-only does NOT work around this bug — it still loads Qwen3_5ForConditionalGeneration with the same language_model.model.layers.* prefix, same broken mapper.

Root Cause

Prefix mismatch between LoRA adapter weight keys and vLLM's internal module paths:

  • Training side: AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B") loads Qwen3_5ForCausalLM, producing adapter keys like base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight
  • vLLM side: loads Qwen3_5ForConditionalGeneration (inherits from Qwen3VLForConditionalGeneration), so internal module paths are language_model.model.layers.0.self_attn.q_proj

After parse_fine_tuned_lora_name strips the base_model.model. prefix, the LoRA module name becomes model.layers.0.self_attn.q_proj. The hf_to_vllm_mapper only has:

hf_to_vllm_mapper = WeightsMapper(
    orig_to_new_prefix={
        "model.visual.": "visual.",
        "lm_head.": "language_model.lm_head.",
        "model.language_model.": "language_model.model.",
    }
)

model.layers.* does NOT match model.language_model.*, so the mapper passes it through unchanged. All lookups in _create_merged_loras_inplace fail silently.
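The mismatch can be traced by hand with a few lines of Python. This is a minimal sketch, not vLLM's actual code: `adapter_key_to_module` and `map_prefix` are hypothetical stand-ins for `parse_fine_tuned_lora_name` and `WeightsMapper`, and the first-match prefix rewrite is an assumption made for illustration.

```python
# Simplified stand-ins for the name handling described above; not vLLM's code.
PEFT_PREFIX = "base_model.model."

# Prefix rules copied from the unpatched hf_to_vllm_mapper shown above.
UNPATCHED_RULES = {
    "model.visual.": "visual.",
    "lm_head.": "language_model.lm_head.",
    "model.language_model.": "language_model.model.",
}


def adapter_key_to_module(key: str) -> str:
    """Strip the PEFT wrapper prefix and the trailing LoRA weight suffix."""
    if key.startswith(PEFT_PREFIX):
        key = key[len(PEFT_PREFIX):]
    for suffix in (".lora_A.weight", ".lora_B.weight"):
        if key.endswith(suffix):
            key = key[: -len(suffix)]
    return key


def map_prefix(name: str, rules: dict[str, str]) -> str:
    """Rewrite the first matching prefix; return the name unchanged otherwise."""
    for old, new in rules.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


# Module path that vLLM registers for this layer, per the description above.
vllm_modules = {"language_model.model.layers.0.self_attn.q_proj"}

key = "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight"
module = map_prefix(adapter_key_to_module(key), UNPATCHED_RULES)
print(module)                  # model.layers.0.self_attn.q_proj
print(module in vllm_modules)  # False
```

None of the three rules matches a name that starts with `model.layers.`, so the lookup misses and the adapter weights are never attached to a module.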

End-to-End Reproduction

Tested on Qwen3.5-4B with vLLM v0.19.0:

# 1. Create LoRA adapter (standard PEFT workflow)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
import torch

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-4B", dtype=torch.bfloat16, device_map="cpu")
peft_model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))
# Perturb lora_B so output would differ from base if LoRA is active
for name, param in peft_model.named_parameters():
    if "lora_B" in name:
        param.data = torch.randn_like(param.data) * 0.1
peft_model.save_pretrained("./test_lora")
AutoTokenizer.from_pretrained("Qwen/Qwen3.5-4B").save_pretrained("./test_lora")
# 2. Test with vLLM (run as a separate script; the __main__ guard is required for spawn)
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

if __name__ == "__main__":
    llm = LLM(model="Qwen/Qwen3.5-4B", enable_lora=True, max_lora_rank=8,
              max_model_len=256, dtype="bfloat16", enforce_eager=True)
    params = SamplingParams(temperature=0, max_tokens=30)

    # Greedy decoding, so any difference in output comes from the adapter.
    base = llm.generate(["The meaning of life is"], params)[0].outputs[0].text
    lora = llm.generate(["The meaning of life is"], params,
                        lora_request=LoRARequest("test", 1, "./test_lora"))[0].outputs[0].text
    print(f"Identical: {base == lora}")  # True on unpatched vLLM: LoRA is silently ignored

Results

| Config | Base output | LoRA output | Identical? |
|---|---|---|---|
| Unpatched | "...puzzled philosophers..." | "...puzzled philosophers..." | True (BUG) |
| Unpatched + --language-model-only | "...puzzled philosophers..." | "...puzzled philosophers..." | True (BUG) |
| Patched | "...puzzled philosophers..." | "...pondered by philosophers..." | False (FIX) |

Fix

Add "model.": "language_model.model." to hf_to_vllm_mapper in Qwen3VLForConditionalGeneration. Existing more-specific rules (model.visual., model.language_model.) take priority due to prefix-first matching, so no regression.

This pattern is already used in vLLM's own test suite for Baichuan LoRA weight mapping (tests/lora/test_lora_checkpoints.py:108-111).

hf_to_vllm_mapper = WeightsMapper(
    orig_to_new_prefix={
        "model.visual.": "visual.",
        "lm_head.": "language_model.lm_head.",
        "model.language_model.": "language_model.model.",
        "model.": "language_model.model.",  # handle text-only LoRA adapters
    }
)
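The no-regression claim depends on the catch-all rule coming after the more specific ones. The sketch below checks that ordering with a simplified first-match rewrite. It is an assumption-laden illustration: `map_prefix` is a hypothetical stand-in for `WeightsMapper`, and `visual.blocks.0.attn.qkv` is an illustrative module name rather than one taken from the PR.

```python
# Rules from the patched mapper above; the catch-all "model." rule is last.
PATCHED_RULES = {
    "model.visual.": "visual.",
    "lm_head.": "language_model.lm_head.",
    "model.language_model.": "language_model.model.",
    "model.": "language_model.model.",
}

# Same rules with the catch-all moved first, to show why the order matters.
CATCH_ALL_FIRST = {
    "model.": "language_model.model.",
    "model.visual.": "visual.",
    "lm_head.": "language_model.lm_head.",
    "model.language_model.": "language_model.model.",
}


def map_prefix(name: str, rules: dict[str, str]) -> str:
    """Hypothetical stand-in: rewrite the first matching prefix, in dict order."""
    for old, new in rules.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


expected = {
    # Vision tower key (illustrative name): still handled by its own rule.
    "model.visual.blocks.0.attn.qkv": "visual.blocks.0.attn.qkv",
    # Adapter trained on the conditional-generation class: unchanged behavior.
    "model.language_model.layers.0.self_attn.q_proj":
        "language_model.model.layers.0.self_attn.q_proj",
    # Text-only adapter: now caught by the new rule.
    "model.layers.0.self_attn.q_proj":
        "language_model.model.layers.0.self_attn.q_proj",
}

for name, target in expected.items():
    assert map_prefix(name, PATCHED_RULES) == target

# With the catch-all first, the vision key is rewritten incorrectly.
print(map_prefix("model.visual.blocks.0.attn.qkv", CATCH_ALL_FIRST))
# language_model.model.visual.blocks.0.attn.qkv
```

Under this simplified model, both adapter styles resolve to the same `language_model.model.layers.*` path, and keys covered by the existing rules keep their previous mapping.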

Related issues

  • #38085 — LoRA module validation warning (fixed, different issue)
  • #36395 / #36603 — TP tensor mismatch (different root cause)

Changed files

  • vllm/model_executor/models/qwen3_vl.py (modified, +1/-0)


Your current environment

Env:

  • Driver Version: 590.48.01
  • CUDA Version: 13.1
  • vllm: 0.18.1rc0+cu131
  • transformers: 4.57.6
  • torch: 2.10.0

🐛 Describe the bug

When I use vLLM with multi-LoRA, I get the warning "LoRA module xxx is not in the model's supported LoRA target modules". Why doesn't vLLM support in_proj_a, in_proj_b, gate_proj, up_proj, k_proj, q_proj, and v_proj?

My vLLM launch command is:

vllm serve /home/xxx/Qwen3.5-27B \
--tensor-parallel-size 1 \
--reasoning-parser qwen3 \
--max-model-len 262144 \
--enable-prefix-caching \
--gpu-memory-utilization 0.9 \
--block-size 16 \
--port 8002 \
--enable-log-requests \
--enable-prompt-tokens-details \
--max-num-seqs 256 \
--max_log_len 128 \
--default-chat-template-kwargs '{"enable_thinking": false}' \
--served-model-name Qwen3.5-27B \
--enable-lora \
--max-lora-rank 64 \
--lora-modules /home/xxx/Q35-27B_lora

The vLLM warnings are:

(EngineCore pid=731496) WARNING 03-25 10:45:34 [worker_manager.py:153] LoRA module 'language_model.model.layers.0.linear_attn.in_proj_a' in adapter '/home/xxx/Qwen3.5-27B-lora/Q35-27B_lora_xxx' is not in the model's supported LoRA target modules [conv1d, down_proj, gate_up_proj, in_proj_ba, in_proj_qkv, in_proj_z, linear_fc1, linear_fc2, o_proj, out_proj, proj, qkv, qkv_proj]. These parameters will be ignored, which may cause abnormal model behavior.
(EngineCore pid=731496) WARNING 03-25 10:45:34 [worker_manager.py:153] LoRA module 'language_model.model.layers.0.linear_attn.in_proj_b' in adapter '/home/xxx/Qwen3.5-27B-lora/Q35-27B_lora_xxx' is not in the model's supported LoRA target modules [conv1d, down_proj, gate_up_proj, in_proj_ba, in_proj_qkv, in_proj_z, linear_fc1, linear_fc2, o_proj, out_proj, proj, qkv, qkv_proj]. These parameters will be ignored, which may cause abnormal model behavior.
(EngineCore pid=731496) WARNING 03-25 10:45:34 [worker_manager.py:153] LoRA module 'language_model.model.layers.0.mlp.gate_proj' in adapter '/home/xxx/Qwen3.5-27B-lora/Q35-27B_lora_xxx' is not in the model's supported LoRA target modules [conv1d, down_proj, gate_up_proj, in_proj_ba, in_proj_qkv, in_proj_z, linear_fc1, linear_fc2, o_proj, out_proj, proj, qkv, qkv_proj]. These parameters will be ignored, which may cause abnormal model behavior.
(EngineCore pid=731496) WARNING 03-25 10:45:34 [worker_manager.py:153] LoRA module 'language_model.model.layers.0.mlp.up_proj' in adapter '/home/xxx/Qwen3.5-27B-lora/Q35-27B_lora_xxx' is not in the model's supported LoRA target modules [conv1d, down_proj, gate_up_proj, in_proj_ba, in_proj_qkv, in_proj_z, linear_fc1, linear_fc2, o_proj, out_proj, proj, qkv, qkv_proj]. These parameters will be ignored, which may cause abnormal model behavior.
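With many layers these warnings repeat for every module, so it can help to collapse them into the distinct module names being rejected. The following is a minimal sketch that assumes the log format is exactly the one shown above; the regular expression and variable names are illustrative and not part of vLLM.

```python
import re
from collections import defaultdict

# Matches the warning format shown above (assumed stable; not a vLLM API).
WARNING_RE = re.compile(
    r"LoRA module '(?P<module>[^']+)' in adapter '(?P<adapter>[^']+)' "
    r"is not in the model's supported LoRA target modules \[(?P<supported>[^\]]*)\]"
)

SUPPORTED = ("conv1d, down_proj, gate_up_proj, in_proj_ba, in_proj_qkv, in_proj_z, "
             "linear_fc1, linear_fc2, o_proj, out_proj, proj, qkv, qkv_proj")
TEMPLATE = (
    "(EngineCore pid=731496) WARNING 03-25 10:45:34 [worker_manager.py:153] "
    "LoRA module '{module}' in adapter '/home/xxx/Qwen3.5-27B-lora/Q35-27B_lora_xxx' "
    "is not in the model's supported LoRA target modules [{supported}]. "
    "These parameters will be ignored, which may cause abnormal model behavior."
)

# Rebuild the four warning lines from the issue.
log_text = "\n".join(
    TEMPLATE.format(module=m, supported=SUPPORTED)
    for m in (
        "language_model.model.layers.0.linear_attn.in_proj_a",
        "language_model.model.layers.0.linear_attn.in_proj_b",
        "language_model.model.layers.0.mlp.gate_proj",
        "language_model.model.layers.0.mlp.up_proj",
    )
)

unsupported = defaultdict(set)  # last path component -> full module paths
supported = set()               # names vLLM reported as supported
for match in WARNING_RE.finditer(log_text):
    leaf = match.group("module").rsplit(".", 1)[-1]
    unsupported[leaf].add(match.group("module"))
    supported.update(s.strip() for s in match.group("supported").split(","))

print(sorted(unsupported))                     # ['gate_proj', 'in_proj_a', 'in_proj_b', 'up_proj']
print("gate_up_proj" in supported)             # True
print(any(n in supported for n in unsupported))  # False
```

For the log above, the rejected names are `gate_proj`, `up_proj`, `in_proj_a`, and `in_proj_b`, while the supported list contains `gate_up_proj` and `in_proj_ba`.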


extent analysis

Fix Plan

The warning cannot be resolved by editing the vLLM configuration. The list of supported LoRA target modules printed in the warning is derived from the loaded model, not from a user setting, and --lora-modules registers adapters as name=path pairs; it does not accept module names.

Here are the steps:

  • Upgrade to a vLLM build that includes the upstream fix. The PR notes above list the #38085 validation warning as fixed.
  • Keep the launch command unchanged, passing each adapter to --lora-modules as name=path (q35_lora below is a placeholder adapter name):
vllm serve /home/xxx/Qwen3.5-27B \
--tensor-parallel-size 1 \
--reasoning-parser qwen3 \
--max-model-len 262144 \
--enable-prefix-caching \
--gpu-memory-utilization 0.9 \
--block-size 16 \
--port 8002 \
--enable-log-requests \
--enable-prompt-tokens-details \
--max-num-seqs 256 \
--max_log_len 128 \
--default-chat-template-kwargs '{"enable_thinking": false}' \
--served-model-name Qwen3.5-27B \
--enable-lora \
--max-lora-rank 64 \
--lora-modules q35_lora=/home/xxx/Q35-27B_lora

The rejected names (in_proj_a, in_proj_b, gate_proj, up_proj) look like the unfused components of modules that do appear in the supported list (in_proj_ba, gate_up_proj), which suggests the adapter itself is not at fault. This is an inference from the log, not something the issue confirms.

Verification

To verify that the fix worked, check the vLLM logs for warnings about unsupported LoRA target modules. If the fix was successful, those warnings should no longer appear. Because a LoRA adapter can also be ignored without any warning (see the PR notes above), compare outputs with and without the adapter at temperature 0 to confirm that the adapter changes the result.
