vllm - ✅(Solved) Fix [Bug]: Gemma4 LoRA can be zeroed out by duplicate alias-path registration in `LoRAModelManager` [1 pull requests, 1 participants]

GitHub stats
vllm-project/vllm#39815 · Fetched 2026-04-16 06:36:28
Comments: 0 · Participants: 1 · Timeline events: 2 · Reactions: 0
Timeline (top): cross-referenced ×1, labeled ×1

Root Cause

LoRAModelManager._create_lora_modules() was traversing the model with:

self.model.named_modules(remove_duplicate=False)

That is unsafe for LoRA registration because this manager mutates live module state during adapter activation.

With aliased paths, the same physical module can be discovered under both a canonical path and an alias path. A Gemma4-style example is:

  • layers.0.proj
  • self_decoder.decoder_layers.0.proj

If the adapter only contains weights for the canonical path, activation can do:

  1. set_lora(...) for layers.0.proj
  2. later reset_lora(...) for self_decoder.decoder_layers.0.proj

Because both names refer to the same live wrapper, the final LoRA buffers end up zero.
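The set-then-reset sequence above can be sketched in plain Python. This is a toy model, not vLLM code: `ToyLoRAWrapper` and `activate` are hypothetical names, and the rule "reset any registered name the adapter has no weights for" is an assumption about how activation behaves, inferred from the description in this issue.

```python
class ToyLoRAWrapper:
    """Hypothetical stand-in for a LoRA-wrapped layer (not vLLM's class)."""

    def __init__(self):
        self.lora_a = [0.0, 0.0]

    def set_lora(self, weights):
        self.lora_a = list(weights)

    def reset_lora(self):
        self.lora_a = [0.0] * len(self.lora_a)


def activate(named_wrappers, adapter_weights):
    """Assumed activation rule: set names found in the adapter, reset the rest."""
    for name, wrapper in named_wrappers:
        if name in adapter_weights:
            wrapper.set_lora(adapter_weights[name])
        else:
            wrapper.reset_lora()


shared = ToyLoRAWrapper()
adapter = {"layers.0.proj": [0.5, -0.25]}  # canonical path only

# Duplicate registration: one live object reachable under two names.
activate(
    [
        ("layers.0.proj", shared),
        ("self_decoder.decoder_layers.0.proj", shared),
    ],
    adapter,
)
buffers_with_alias = list(shared.lora_a)

# Single registration: the alias path is never visited.
activate([("layers.0.proj", shared)], adapter)
buffers_deduplicated = list(shared.lora_a)
```

In this sketch `buffers_with_alias` ends up all zeros because the alias entry resets the wrapper that the canonical entry just populated, while `buffers_deduplicated` keeps the adapter weights.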

Fix Action

Fixed

PR fix notes

PR #39816: [Bugfix]: deduplicate aliased LoRA modules

Description (problem / solution / changelog)

Fixes: #39815

Purpose

This PR deduplicates module traversal in _create_lora_modules() so LoRA registration happens once per physical module instead of once per alias path.

The previous implementation used:

self.model.named_modules(remove_duplicate=False)

That is not safe for a side-effectful manager like LoRAModelManager, because multiple names can refer to the same live module object.

This change switches registration to the default deduplicated traversal and adds a regression test covering aliased child-module paths.
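As a rough illustration of what deduplicated traversal means, the sketch below filters `(name, module)` pairs by object identity so each physical object is kept once. `dedupe_by_identity` is a hypothetical helper written for this example; the actual fix relies on the default behavior of `named_modules()` rather than on a helper like this.

```python
def dedupe_by_identity(named_modules):
    """Keep the first name seen for each distinct object; skip later aliases."""
    seen_ids = set()
    unique = []
    for name, module in named_modules:
        if id(module) in seen_ids:
            continue  # same physical object reached through another path
        seen_ids.add(id(module))
        unique.append((name, module))
    return unique


proj = object()  # placeholder for a shared child module
traversal = [
    ("layers.0.proj", proj),
    ("self_decoder.decoder_layers.0.proj", proj),
]
unique = dedupe_by_identity(traversal)
unique_names = [name for name, _ in unique]
```

Which name survives depends on traversal order; here the canonical path comes first, so it is the one that is kept.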

For more background and the full root-cause analysis, see the linked issue.

Test Plan

Run the focused regression test:

python -m pytest tests/lora/test_model_manager_aliasing.py -q

Test Result

Regression test result:

PASSED

The regression test verifies:

  • aliased paths can refer to the same child module
  • the fixed manager registers only one LoRA wrapper for that shared module
  • under simulated duplicate traversal, the same live wrapper can be set and then reset, leaving final LoRA buffers at zero

Note: this bug surfaced after aliasing was introduced in the Gemma4 implementation in PR #38879.


Changed files

  • tests/lora/test_model_manager_aliasing.py (added, +204/-0)
  • vllm/lora/model_manager.py (modified, +1/-1)

Code Example

self.model.named_modules(remove_duplicate=False)

---

for module_name, module in self.model.named_modules():

🐛 Describe the bug

Gemma4-style aliased module paths can cause LoRA activation to zero itself out in LoRAModelManager.

The issue is not that aliasing exists. The issue is that LoRA module creation was registering the same physical module multiple times under different names.

Why This Shows Up With Gemma4

The issue is exposed by Gemma4-style aliasing, where the same decoder child module is reachable through more than one registered path.

This is a valid model structure. The bug is that LoRAModelManager treated those alias paths as separate LoRA-managed modules even though they point to the same live object.

Proposed Fix

Deduplicate traversal in _create_lora_modules():

for module_name, module in self.model.named_modules():

This preserves one LoRA registration per physical module while leaving aliasing itself untouched.

extent analysis

TL;DR

To fix the issue where LoRA activation zeros itself out due to aliased module paths, deduplicate the module traversal in LoRAModelManager._create_lora_modules().

Guidance

  • The issue arises from LoRAModelManager treating aliased paths as separate LoRA-managed modules, even though they point to the same live object.
  • To verify the issue, check if the same physical module is being registered multiple times under different names, causing LoRA buffers to end up zero.
  • The proposed fix involves deduplicating the traversal in _create_lora_modules() by using self.model.named_modules() without remove_duplicate=False.
  • Restricting the adapter to canonical-path weights does not avoid the problem: that is exactly the case in which the alias path triggers reset_lora(...) on the shared wrapper. The fix belongs in the manager's traversal, not in the adapter.
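One way to check whether a model has aliased paths at all is to group traversal names by object identity and report any object that appears under more than one name. The sketch below is only an illustration: `find_alias_groups` is a hypothetical helper, and the placeholder objects stand in for the pairs that `model.named_modules(remove_duplicate=False)` would yield on a real PyTorch model.

```python
from collections import defaultdict


def find_alias_groups(named_modules):
    """Return lists of names that all refer to the same object."""
    names_by_id = defaultdict(list)
    for name, module in named_modules:
        names_by_id[id(module)].append(name)
    return [names for names in names_by_id.values() if len(names) > 1]


# Placeholder objects; with a real model, pass
# model.named_modules(remove_duplicate=False) instead.
shared_proj = object()
other_proj = object()
pairs = [
    ("layers.0.proj", shared_proj),
    ("layers.1.proj", other_proj),
    ("self_decoder.decoder_layers.0.proj", shared_proj),
]
alias_groups = find_alias_groups(pairs)
```

A non-empty result means some module is reachable through more than one path, which is the precondition for the duplicate registration described here.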

Example

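# Before: also yields alias paths, so one module object can be visited twice
for module_name, module in self.model.named_modules(remove_duplicate=False):
    ...  # register LoRA module

# After: default traversal yields each module object once
for module_name, module in self.model.named_modules():
    ...  # register LoRA module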

Notes

This fix assumes that the issue is solely caused by the duplicate registration of aliased modules. If other factors contribute to the problem, additional debugging may be necessary.

Recommendation

Apply the proposed fix by deduplicating the module traversal in LoRAModelManager._create_lora_modules(), as it directly addresses the root cause of the issue and preserves the validity of aliasing itself.
