transformers - ✅(Solved) Fix [Bug] Model collapse after merging LoRA with extended vocabulary on models with tie_word_embeddings=True (e.g., Qwen2.5 0.5B) [4 pull requests, 1 comment, 2 participants]

transformers • 2026-03-30 19:03:20

GitHub stats

huggingface/transformers#45127•Fetched 2026-04-08 01:52:41

Author

YangNobody12

Participants

Cursx

YangNobody12

Timeline (top)

cross-referenced ×4, commented ×1, labeled ×1

Root Cause

Description: The problem appears when you extend the vocabulary (e.g., add audio or special tokens) on a base model that uses tied embeddings (config.tie_word_embeddings = True, such as Qwen), train a LoRA adapter, and then merge it with peft_model.merge_and_unload(). After saving and reloading, the merged model produces completely degraded output, repeating the same token indefinitely.

The issue occurs because peft correctly merges the now-separate embed_tokens and lm_head weights in memory, but the saved config still has tie_word_embeddings set to True. When the merged model is reloaded via AutoModelForCausalLM.from_pretrained(), the tied config causes the lm_head weights to be overwritten by the embed_tokens weights (or vice versa), destroying the newly trained weights for the extended vocabulary.

Fix Action

Fix / Workaround

Workaround / Proposed Fix: Manually setting tie_word_embeddings = False before saving the merged model completely resolves the issue, as it forces transformers to save and load both layers independently:

# WORKAROUND: Untie embeddings before saving
merged_model.config.tie_word_embeddings = False
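For a merged checkpoint that was already written with the stale flag, the same workaround can in principle be applied to the saved config.json rather than to the in-memory config. The following is a minimal sketch, assuming the checkpoint directory contains a standard config.json and that both embedding tensors were actually written to the weights file. The helper name untie_saved_config is hypothetical and not part of transformers or peft.

```python
import json
from pathlib import Path


def untie_saved_config(checkpoint_dir: str) -> bool:
    """Explicitly set tie_word_embeddings to false in an existing config.json.

    Returns True if the file was rewritten, False if the flag was already false.
    Hypothetical helper for illustration only.
    """
    config_path = Path(checkpoint_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if config.get("tie_word_embeddings") is False:
        return False  # nothing to do
    # Write the flag explicitly, even if the key was missing from the file.
    config["tie_word_embeddings"] = False
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

This only changes what the loader is told. It cannot recover weights if one of the two tensors was never saved.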

PR fix notes

PR #45135: Fix model saving corruption for dynamically untied embeddings

Repository: huggingface/transformers
Author: Cursx
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/45135

Description (problem / solution / changelog)

What does this PR do?

Fixes an issue where PEFT adapters applied independently to tied embeddings (embed_tokens and lm_head) cause silent model corruption upon reloading via AutoModelForCausalLM.from_pretrained().

Root Cause: When embeddings are untied dynamically during runtime (e.g., vocabulary resizing and independent PEFT merging), their tensor memory storage diverges. PreTrainedModel.save_pretrained() correctly saves both parameter tensors because remove_tied_weights_from_state_dict() sees they don't share identical storage. However, the model configuration saves tie_word_embeddings = True. Upon reloading, from_pretrained() sees tie_word_embeddings=True and aggressively re-ties the two embeddings by overwriting one parameter with the other, effectively destroying the independent delta weights.

Fix: Included a check in save_pretrained(): if config.tie_word_embeddings is True but input_embeddings.weight.data_ptr() != output_embeddings.weight.data_ptr(), it automatically flips model.config.tie_word_embeddings to False before saving the configuration mapping, preventing silent destruction on loading.
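The check described in this PR can be sketched as a standalone helper that is called before saving. This is an illustration of the idea under stated assumptions, not the PR's actual diff: the function names embeddings_diverged and fix_tie_flag_before_save are hypothetical, and the model is only assumed to expose get_input_embeddings(), get_output_embeddings(), and a config object.

```python
def embeddings_diverged(model) -> bool:
    """Return True if input and output embeddings no longer share storage."""
    input_emb = model.get_input_embeddings()
    output_emb = model.get_output_embeddings()
    if input_emb is None or output_emb is None:
        return False  # nothing to compare (e.g. the model has no LM head)
    # Tied weights point at the same memory; untied copies do not.
    return input_emb.weight.data_ptr() != output_emb.weight.data_ptr()


def fix_tie_flag_before_save(model) -> bool:
    """Flip config.tie_word_embeddings to False when the weights diverged.

    Returns True if the flag was changed. Hypothetical helper.
    """
    tied_in_config = getattr(model.config, "tie_word_embeddings", False)
    if tied_in_config and embeddings_diverged(model):
        model.config.tie_word_embeddings = False
        return True
    return False
```

A caller would run fix_tie_flag_before_save(merged_model) immediately before merged_model.save_pretrained(...).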

Fixes #45127

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by code agents. We are currently bottlenecked by our ability to review and respond to them. As a result, we ask that new users do not submit pure code agent PRs at this time. You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents not to open any PRs or issues for the moment.

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this repeatedly or maliciously.

This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result, this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests? (Note: A minimal reproduction script is provided below to easily test and validate the fix, as it requires mocking PEFT adaptation)

Reproduction Script

<details><summary>Click to view minimal repro (repro.py)</summary>

import torch
from transformers import AutoConfig, AutoModelForCausalLM


def weights_tied(model):
    """Return True if input and output embeddings are the same Parameter object."""
    return model.get_input_embeddings().weight is model.get_output_embeddings().weight


def main():
    # Build a tiny, randomly initialised model with tied embeddings
    config = AutoConfig.from_pretrained("Qwen/Qwen1.5-0.5B", trust_remote_code=True)
    config.hidden_size = 32
    config.intermediate_size = 64
    config.num_hidden_layers = 2
    config.num_attention_heads = 4
    config.num_key_value_heads = 4
    config.vocab_size = 1000
    config.tie_word_embeddings = True

    # Kept small so the model fits in memory
    model = AutoModelForCausalLM.from_config(config)
    print("Initial tie config:", model.config.tie_word_embeddings)
    print("Are weights tied initially?", weights_tied(model))

    # Simulate PEFT: give the output embedding its own, modified copy of the weight
    output_embeddings = model.get_output_embeddings()
    output_embeddings.weight = torch.nn.Parameter(output_embeddings.weight.clone())
    output_embeddings.weight.data += 1.0  # make the two tensors differ

    print("Are weights tied after fake PEFT?", weights_tied(model))

    model.save_pretrained("./test_tied_model")

    # Reload and inspect both the config flag and the actual weights
    model_reloaded = AutoModelForCausalLM.from_pretrained("./test_tied_model")
    print("Reloaded tie config:", model_reloaded.config.tie_word_embeddings)
    print("Are weights tied after reload?", weights_tied(model_reloaded))

    # The modification survived only if the two weights still differ in value
    print("Is the modified output weight preserved?", not torch.allclose(model_reloaded.get_output_embeddings().weight, model_reloaded.get_input_embeddings().weight))


if __name__ == "__main__":
    main()

</details>

Who can review?

@BenjaminBossan @githubnemo

Changed files

src/transformers/modeling_utils.py (modified, +44/-0)

PR #45136: Fix #45127: Auto-fix diverged tie_word_embeddings config on save to prevent silent weight corruption

Repository: huggingface/transformers
Author: Cursx
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/45136

Description (problem / solution / changelog)

What does this PR do?

This PR fixes a bug in PreTrainedModel.save_pretrained() where config.tie_word_embeddings can be inconsistent with the actual weight state, leading to silent model corruption for downstream consumers.

Problem

After PEFT merge_and_unload() (typical scenario: Qwen, Llama, Mistral, etc.), embed_tokens and lm_head weights are separated in memory with different values, but config.tie_word_embeddings remains True. Currently, save_pretrained() performs no validation of tie_word_embeddings against the actual weight state — the incorrect config is written to config.json as-is.

This causes two issues:

- Reloading via from_pretrained: The load-side safety check (modeling_utils.py:2535-2547) detects the inconsistency and refuses to tie, emitting a warning each time, but the config is still semantically wrong.
- Downstream tool consumption (GGUF converters, quantization scripts, etc.): These tools trust tie_word_embeddings: true in config.json directly, potentially causing silent weight corruption: one tensor overwrites the other, producing completely degraded outputs.
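A downstream tool could cross-check the flag against the stored tensors instead of trusting config.json alone. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names), so no weights are loaded. The key names model.embed_tokens.weight and lm_head.weight are assumptions that fit Qwen/Llama-style checkpoints, and the function names are hypothetical.

```python
import json
import struct
from pathlib import Path


def read_safetensors_keys(path: str) -> set:
    """Read tensor names from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return set(header)


def tie_flag_looks_stale(checkpoint_dir: str,
                         input_key: str = "model.embed_tokens.weight",
                         output_key: str = "lm_head.weight") -> bool:
    """True if config.json explicitly says tied but both tensors are stored."""
    root = Path(checkpoint_dir)
    config = json.loads((root / "config.json").read_text(encoding="utf-8"))
    if config.get("tie_word_embeddings") is not True:
        return False
    keys = set()
    for shard in root.glob("*.safetensors"):
        keys |= read_safetensors_keys(str(shard))
    # A separately stored output embedding is a warning sign, not proof:
    # the values would still need to be compared to know they differ.
    return input_key in keys and output_key in keys
```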

Fix

In save_pretrained(), before writing the config to disk, we detect whether the input/output embeddings have diverged. If so, we automatically set config.tie_word_embeddings = False and emit a warning.

Key safety considerations:

- Only triggers when the output embedding key (e.g., lm_head.weight) is explicitly declared in the model's _tied_weights_keys mapping as tied to the input embedding. This prevents false positives on models like Pop2Piano, which uses tie_word_embeddings=True for decoder output scaling but does not declare lm_head.weight in its _tied_weights_keys (it only ties encoder.embed_tokens and decoder.embed_tokens to shared).
- Cross-device scenarios (model parallelism / offloading) are skipped entirely to avoid false positives and potential OOM from implicit device copies.
- T5 family safety analysis:
  - T5: Scaling is decoupled to an independent scale_decoder_outputs field (configuration_t5.py:82-83) and tie_word_embeddings is forced to True. Not affected.
  - UMT5: Config init forcibly overrides tie_word_embeddings = True; even if saved as False, it's restored on load. Not affected.
  - LongT5, Pop2Piano, SwitchTransformers: Still read tie_word_embeddings for scaling in forward, but these are guarded by the _tied_weights_keys check described above.
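The two guards described above can be sketched as a pure function. This is an illustration only: a plain dict stands in for the model's _tied_weights_keys mapping, devices are passed as strings, and the function name and the example key names are assumptions rather than the PR's code.

```python
def should_auto_untie(tied_weights_keys, output_key: str, input_key: str,
                      input_device: str, output_device: str) -> bool:
    """Decide whether the save-time auto-fix is allowed to run.

    tied_weights_keys is assumed to map a tied parameter name to the name
    of the parameter it is tied to. Hypothetical helper.
    """
    # Guard 1: the output embedding must be declared as tied to the input embedding.
    if not isinstance(tied_weights_keys, dict):
        return False
    if tied_weights_keys.get(output_key) != input_key:
        return False
    # Guard 2: skip cross-device setups (model parallelism / offloading).
    if input_device != output_device:
        return False
    return True


# A decoder-only style mapping (key names assumed for illustration).
causal_lm_keys = {"lm_head.weight": "model.embed_tokens.weight"}
# A mapping that ties only encoder/decoder embeddings to a shared table.
shared_only_keys = {
    "encoder.embed_tokens.weight": "shared.weight",
    "decoder.embed_tokens.weight": "shared.weight",
}
```

With these inputs, the first mapping passes the guard on a single device, while the second never does because lm_head.weight is not declared in it.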

Changes

modeling_utils.py:

Added weight divergence detection + auto-fix logic before config saving in save_pretrained()
_tied_weights_keys guard to only auto-fix when the output embedding is declared as tied to the input embedding
Cross-device: skip check (avoid false positives and potential OOM)
NotImplementedError: silently ignored (expected for vision/speech backbones)
Other exceptions: logged via logger.debug

test_modeling_utils.py:

Added test_save_pretrained_auto_fixes_diverged_tied_embeddings: constructs a tied Llama model → simulates weight divergence (PEFT merge) → verifies saved config is corrected + warning is emitted + reloaded weights are preserved correctly

Fixes #45127

Code Agent Policy

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this repeatedly or maliciously.

I confirm that this is not a pure code agent PR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Changed files

src/transformers/modeling_utils.py (modified, +48/-0)
tests/utils/test_modeling_utils.py (modified, +35/-0)

PR #45156: Fix save_pretrained writing incorrect tie_word_embeddings=True config after PEFT merge

Repository: huggingface/transformers
Author: Cursx
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/45156

Description (problem / solution / changelog)

What does this PR do?

After the merge_and_unload() operation in PEFT, embed_tokens and lm_head become independent tensors with different values, but config.tie_word_embeddings remains True. The load side already detects this using torch.equal in tie_weights() and skips tying, but save_pretrained() writes the incorrect config as-is. (tie_word_embeddings=True is already semantically wrong in memory; changing it to False is closer to the actual state.)

Issue #45127: PEFT merge_and_unload() creates an inconsistent state (the weights have been separated, but the configuration has not been updated). Impact: downstream tools (GGUF converters, quantization scripts) trust this config directly, leading to silent weight corruption.

Test

I wrote a simple script to reproduce the problem and tested it locally.

I ran make fix-repo and ran the following related tests:

- test_save_pretrained_auto_fixes_diverged_tied_embeddings (new test)
- test_tied_weights_are_not_tied_if_both_present_but_different (load side)
- test_tied_weights_are_tied_if_both_present_and_similar
- test_tied_weights_are_always_tied_from_config

The CI tests also pass.

Fixes https://github.com/huggingface/transformers/issues/45127

Code Agent Policy

PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this repeatedly or maliciously.

I confirm that this is not a pure code agent PR.

I used multiple AI models (Gemini, Claude, Kimi) to cross-validate edge cases and boundary conditions — different models behave differently around tied embeddings, which made CI failures harder to predict than expected. AI helped me locate these edge cases and I verified they weren't hallucinations.

I have read CONTRIBUTING.md, and tried my best to follow the instructions therein.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. https://github.com/huggingface/transformers/issues/45127
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@CyrilVallez @BenjaminBossan

Changed files

src/transformers/modeling_utils.py (modified, +28/-0)
tests/utils/test_modeling_utils.py (modified, +22/-0)

PR #3126: Fix save_pretrained writing incorrect tie_word_embeddings=True config after PEFT merge

Repository: huggingface/peft
Author: Cursx
State: closed | merged: True
Link: https://github.com/huggingface/peft/pull/3126

Description (problem / solution / changelog)

After the merge_and_unload() operation in PEFT, embed_tokens and lm_head become independent tensors with different values, but config.tie_word_embeddings remains True. The load side already detects this using torch.equal in tie_weights() and skips tying, but save_pretrained() writes the incorrect config as-is. (tie_word_embeddings=True is already semantically wrong in memory; changing it to False is closer to the actual state.)

Fixes https://github.com/huggingface/transformers/issues/45127

Although this issue was raised in the Transformers community, perhaps it would be better resolved in peft? It should have no negative impact on downstream transformers.

In transformers, the load-side check (tie_weights) uses torch.equal to compare values. Here, I'm using an identity check (is not) to confirm whether the two weights are structurally separate. (I'm not sure if this is the best approach.) An additional test has been added; it can be removed if unnecessary.
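The difference between the two kinds of check can be shown without any tensors. In this sketch plain Python lists stand in for weight tensors: `is not` plays the role of the identity check, and `!=` plays the role that a negated torch.equal would play for values. The function names are illustrative only.

```python
base = [0.1, 0.2, 0.3]

tied = base                  # same object: analogous to tied weights
cloned = list(base)          # separate object, equal values
changed = [0.1, 0.2, 0.9]    # separate object, different values


def separated_by_identity(a, b) -> bool:
    """Structural check: are these two distinct objects?"""
    return a is not b


def separated_by_value(a, b) -> bool:
    """Value check: do the contents differ?"""
    return a != b


for name, other in [("tied", tied), ("cloned", cloned), ("changed", changed)]:
    print(name, separated_by_identity(base, other), separated_by_value(base, other))
```

The two checks disagree only for the cloned case: an identity check reports a separation even though the values are still equal, whereas a value check reports nothing until the copies actually differ.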

Changed files

src/peft/tuners/tuners_utils.py (modified, +19/-0)
tests/test_decoder_models.py (modified, +16/-0)

Code Example

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen1.5-0.5B" # Or any model with tied embeddings
new_vocab_size = 180500 # Extended for audio tokens

# 1. Load base model and resize embeddings
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="cpu")
base_model.resize_token_embeddings(new_vocab_size)

# 2. Load trained LoRA adapter
# (Assuming a LoRA was trained on the extended vocab targeting embed_tokens and lm_head)
peft_model = PeftModel.from_pretrained(base_model, "./my_lora_adapter")

# 3. Merge and Save
merged_model = peft_model.merge_and_unload()
merged_model.save_pretrained("./merged_model", safe_serialization=True)

# 4. Reload and Generate -> BUG OCCURS HERE
# The model will now output repeating garbage tokens (e.g., [135528, 135528, 135528...])
reloaded_model = AutoModelForCausalLM.from_pretrained("./merged_model", device_map="cuda")

---

merged_model = peft_model.merge_and_unload()

# WORKAROUND: Untie embeddings before saving
merged_model.config.tie_word_embeddings = False 

merged_model.save_pretrained("./merged_model_fixed", safe_serialization=True)


System Info

transformers: 4.56.2
Python: 3.11.15
torch: 2.11.0+cu126

Expected behavior

Expected behavior: The merged model should retain the learned weights for both embed_tokens and lm_head independently for the newly added tokens, and generate text/tokens correctly as it did before saving.

It would be great if peft's merge_and_unload() or transformers' save_pretrained() could automatically detect whether the vocabulary has been resized or the embeddings untied during PEFT training, and handle the tie_word_embeddings flag accordingly to prevent silent model corruption upon saving.

extent analysis

Fix Plan

To resolve the issue, follow these steps:

Before saving the merged model, manually set tie_word_embeddings to False in the model's configuration:

merged_model = peft_model.merge_and_unload()
merged_model.config.tie_word_embeddings = False
merged_model.save_pretrained("./merged_model_fixed", safe_serialization=True)

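Alternatively, the merge step can flip the flag automatically when the vocabulary has been resized. The original vocabulary size is not tracked on the model, so this sketch wraps merge_and_unload() in a hypothetical helper and takes the size as an argument instead of editing the PeftModel class:

from peft import PeftModel

def merge_and_untie_if_resized(peft_model: PeftModel, original_vocab_size: int):
    """Merge the adapter, then clear the tie flag if the vocabulary was resized."""
    merged_model = peft_model.merge_and_unload()
    if merged_model.config.vocab_size != original_vocab_size:
        merged_model.config.tie_word_embeddings = False
    return merged_model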

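To make this more robust, you can also check at save time whether the embedding weights are still shared and clear the flag if they are not. save_pretrained() is defined on PreTrainedModel rather than on AutoModelForCausalLM, so this sketch uses a hypothetical standalone helper instead of editing a class:

def save_with_consistent_tie_flag(model, save_directory, **kwargs):
    """Clear tie_word_embeddings if the embedding weights are separate objects, then save."""
    input_emb = model.get_input_embeddings()
    output_emb = model.get_output_embeddings()
    if (model.config.tie_word_embeddings and output_emb is not None
            and input_emb.weight is not output_emb.weight):
        model.config.tie_word_embeddings = False
    model.save_pretrained(save_directory, **kwargs)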

Verification

To verify that the fix worked, reload the saved model and generate text using the generate method:

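from transformers import AutoModelForCausalLM

reloaded_model = AutoModelForCausalLM.from_pretrained("./merged_model_fixed", device_map="cuda")
# input_ids: prompt token ids from your extended tokenizer, shape [batch, seq_len]
output = reloaded_model.generate(input_ids.to(reloaded_model.device), max_length=100)
print(output)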

The output should be coherent and not repeat the same token infinitely.
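Eyeballing the printed ids works, but the "same token repeated" symptom can also be checked mechanically. The sketch below flags a sequence when a single token run covers a large share of it. The helper names and the 0.5 threshold are arbitrary assumptions for illustration, not a recommended cutoff.

```python
from itertools import groupby


def longest_run(token_ids) -> int:
    """Length of the longest run of identical consecutive token ids."""
    return max((sum(1 for _ in group) for _, group in groupby(token_ids)), default=0)


def looks_collapsed(token_ids, max_run_fraction: float = 0.5) -> bool:
    """True if one repeated token dominates the sequence.

    max_run_fraction is an assumed threshold; tune it for your task.
    """
    ids = list(token_ids)
    if not ids:
        return False
    return longest_run(ids) / len(ids) >= max_run_fraction
```

With a generate() result, this could be applied as looks_collapsed(output[0].tolist()).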

Extra Tips

When working with models that have tied embeddings, be cautious when resizing the vocabulary or training adapters, as this can lead to silent model corruption.
Always verify the model's behavior after saving and reloading to ensure that the changes have been applied correctly.
Consider adding automated tests to detect issues like this in the future.
