transformers - ✅(Solved) Fix [Regression] Qwen3.5 `save_pretrained` still saves incorrect visual encoder keys in 5.5.3 [2 pull requests, 1 comments, 2 participants]

transformers • 2026-04-10 10:01:58

GitHub stats

huggingface/transformers#45357 • Fetched 2026-04-11 06:12:16

Author

johnking0099

Participants

Cyrilvallez

johnking0099

Timeline (top)

closed ×1, commented ×1, cross-referenced ×1, labeled ×1

save_pretrained on Qwen3_5ForConditionalGeneration produces incorrect weight keys for the visual encoder. This issue was partially addressed in v5.5.2 (#45340), which fixed the text model key regression. However, the visual encoder keys are still saved with a wrong prefix in v5.5.3.

This makes it impossible to reload the saved checkpoint without manual key remapping.

Root Cause

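The weight conversion mappings for vision-language models (src/transformers/conversion_mapping.py) were not copied correctly in PR #44627. The affected models still load correctly through from_pretrained, but save_pretrained serializes the visual encoder keys under the wrong prefix (model.language_model.visual.* instead of model.visual.*).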

Fix Action

Fixed

Fixed by PR: Fix vlm weight mappings (https://github.com/huggingface/transformers/pull/45358)

PR fix notes

PR #45358: Fix vlm weight mappings

Repository: huggingface/transformers
Author: Cyrilvallez
State: closed | merged: True
Link: https://github.com/huggingface/transformers/pull/45358

Description (problem / solution / changelog)

What does this PR do?

Fix https://github.com/huggingface/transformers/issues/45357 finally. This was not caught in the previous fix, as the model can be reloaded correctly by from_pretrained, but keys are still wrongly serialized!

After a deeper look, I noticed @zucchini-nlp did not correctly copy the mappings in https://github.com/huggingface/transformers/pull/44627... It is EXTREMELY IMPORTANT and can very easily silently break loading and/or saving @zucchini-nlp - we cannot touch the mappings without being 100% sure of the change. You missed a lot before. In this case, most would load correctly, but would not resave in the same format.

For ref, I'm using this small snippet to check formats:

import transformers
from transformers import LlavaForConditionalGeneration, LlavaNextForConditionalGeneration
from safetensors.torch import load_file
from transformers.utils.hub import cached_file, cached_files
import json

# model_id = "llava-hf/llava-1.5-7b-hf"
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
# model_id = "adept/fuyu-8b"
model_class = transformers.LlavaNextForConditionalGeneration

target_folder = "/raid/cyril/test_model"

with open(cached_file(model_id, "model.safetensors.index.json")) as f:
    index = json.load(f)
model_files = set(index["weight_map"].values())
model_files = cached_files(model_id, model_files)

original_state_dict = {}
for file in model_files:
    original_state_dict.update(load_file(file))

model = model_class.from_pretrained(model_id)
model.save_pretrained(target_folder)

saved_weights = load_file(f"{target_folder}/model.safetensors")

not_in_saved = []
for k, v in original_state_dict.items():
    if k not in saved_weights:
        not_in_saved.append(k)
    else:
        assert (v == saved_weights[k]).all()

not_in_original = []
for k, v in saved_weights.items():
    if k not in original_state_dict:
        not_in_original.append(k)
    else:
        assert (v == original_state_dict[k]).all()

print(f"The following are in original but not in saved: {not_in_saved}")
print(f"The following are saved but not in original: {not_in_original}")

model = model_class.from_pretrained(target_folder)
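The key comparison in the snippet above can also be factored into a small helper that works on key names only. This is a sketch and not part of the PR: `diff_keys` is a hypothetical name, the example key names are made up for illustration, and unlike the original snippet it does not compare tensor values.

```python
def diff_keys(original_keys, saved_keys):
    """Return (missing, unexpected) key lists between two checkpoints.

    missing:    keys present in the original checkpoint but not in the saved one
    unexpected: keys present in the saved checkpoint but not in the original
    """
    original, saved = set(original_keys), set(saved_keys)
    missing = sorted(original - saved)
    unexpected = sorted(saved - original)
    return missing, unexpected


# Illustrative key names only (not taken from a real checkpoint).
original = [
    "model.visual.blocks.0.attn.qkv.weight",
    "model.language_model.layers.0.mlp.up_proj.weight",
]
saved = [
    "model.language_model.visual.blocks.0.attn.qkv.weight",
    "model.language_model.layers.0.mlp.up_proj.weight",
]

missing, unexpected = diff_keys(original, saved)
print(f"In original but not in saved: {missing}")
print(f"Saved but not in original: {unexpected}")
```

Both lists being empty is the condition the original snippet checks for; a non-empty list points at a prefix that was rewritten on save.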

Changed files

src/transformers/conversion_mapping.py (modified, +42/-12)
tests/models/aya_vision/test_modeling_aya_vision.py (modified, +3/-0)
tests/models/cohere_asr/test_modeling_cohere_asr.py (modified, +2/-8)
tests/models/colpali/test_modeling_colpali.py (modified, +4/-0)
tests/models/colqwen2/test_modeling_colqwen2.py (modified, +4/-0)
tests/models/emu3/test_modeling_emu3.py (modified, +3/-0)
tests/models/fuyu/test_modeling_fuyu.py (modified, +3/-0)
tests/models/gemma3/test_modeling_gemma3.py (modified, +3/-0)
tests/models/got_ocr2/test_modeling_got_ocr2.py (modified, +3/-0)
tests/models/gpt_oss/test_modeling_gpt_oss.py (modified, +1/-1)
tests/models/internvl/test_modeling_internvl.py (modified, +3/-0)
tests/models/llava/test_modeling_llava.py (modified, +3/-0)
tests/models/llava_next/test_modeling_llava_next.py (modified, +3/-0)
tests/models/llava_next_video/test_modeling_llava_next_video.py (modified, +3/-0)
tests/models/llava_onevision/test_modeling_llava_onevision.py (modified, +3/-0)
tests/models/mistral3/test_modeling_mistral3.py (modified, +3/-0)
tests/models/mllama/test_modeling_mllama.py (modified, +3/-0)
tests/models/nemotron_h/test_modeling_nemotron_h.py (modified, +1/-4)
tests/models/paligemma/test_modeling_paligemma.py (modified, +3/-0)
tests/models/qwen2_5_vl/test_modeling_qwen2_5_vl.py (modified, +2/-9)
tests/models/qwen2_vl/test_modeling_qwen2_vl.py (modified, +2/-7)
tests/models/video_llava/test_modeling_video_llava.py (modified, +3/-0)
tests/models/vipllava/test_modeling_vipllava.py (modified, +3/-0)
tests/test_modeling_common.py (modified, +8/-1)

System Info

transformers version: 5.5.0, 5.5.3
Platform: Linux (NVIDIA A100 80GB × 8)
Python version: 3.12
PyTorch version: 2.9.1+cu128
CUDA version: 12.8

Reproduction

import transformers
from safetensors.torch import load_file

model = transformers.Qwen3_5ForConditionalGeneration.from_pretrained("Qwen/Qwen3.5-0.8B")
model.save_pretrained("./qwen35-saved")

tensors = load_file("./qwen35-saved/model.safetensors")
keys = list(tensors.keys())

# Visual encoder keys should start with "model.visual.*"
bad_visual = [k for k in keys if k.startswith("model.language_model.visual.")]
assert len(bad_visual) == 0, f"Incorrect visual keys found:\n" + "\n".join(bad_visual[:3])

Expected behavior

Expected vs Actual Behavior

Comparing keys from the original model weights (reference) vs. checkpoint saved by save_pretrained:

| Module | Original model (reference) | Saved by 5.5.0 | Saved by 5.5.3 |
| --- | --- | --- | --- |
| Text layers | `model.language_model.layers.*` | `model.language_model.language_model.language_model.layers.*` ❌ | `model.language_model.layers.*` ✅ |
| Visual encoder | `model.visual.*` | `model.language_model.visual.*` ❌ | `model.language_model.visual.*` ❌ |

v5.5.0: Both text and visual keys are broken.
v5.5.3: Text keys are fixed (by #45340), but visual encoder keys remain incorrect — model.language_model.visual.* instead of model.visual.*.
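One quick way to spot this kind of prefix drift is to count keys by their leading components. The sketch below assumes nothing beyond a list of key strings; `prefix_counts` is a hypothetical helper and the sample keys are illustrative, not copied from a real checkpoint.

```python
from collections import Counter


def prefix_counts(keys, depth=3):
    """Count keys grouped by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# Illustrative keys resembling what the table describes for 5.5.3.
saved_keys = [
    "model.language_model.layers.0.self_attn.q_proj.weight",
    "model.language_model.layers.1.self_attn.q_proj.weight",
    "model.language_model.visual.blocks.0.attn.qkv.weight",
    "model.language_model.visual.merger.mlp.0.weight",
]

counts = prefix_counts(saved_keys)
for prefix, n in sorted(counts.items()):
    print(f"{prefix}: {n}")
```

A `model.language_model.visual` group in the output, or a repeated `language_model.language_model` group, indicates the incorrect nesting shown in the table.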

Impact

Checkpoints saved after fine-tuning cannot be loaded by from_pretrained without manual key conversion.
Inference frameworks (vLLM, lmdeploy, etc.) fail to load fine-tuned Qwen3.5 checkpoints.
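Before handing a fine-tuned checkpoint to an inference framework, the key names can be checked without loading any tensors. The sketch below reads only the JSON header of a safetensors file, relying on the file layout (an 8-byte little-endian header length followed by a JSON header). `read_safetensors_keys` and `find_bad_visual_keys` are hypothetical helper names, and a single unsharded file is assumed.

```python
import json
import struct


def read_safetensors_keys(path):
    """Return the tensor names stored in a safetensors file, reading only its header."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header size as an unsigned little-endian integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def find_bad_visual_keys(keys, bad_prefix="model.language_model.visual."):
    """Return the keys that carry the incorrect visual encoder prefix."""
    return [key for key in keys if key.startswith(bad_prefix)]
```

For example, `find_bad_visual_keys(read_safetensors_keys("./qwen35-saved/model.safetensors"))` should return an empty list for a correctly saved checkpoint.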

Related

#45216 — Original report of the save_pretrained regression in 5.4.0
#45340 — PR that fixed the text model key regression (but not the visual encoder)

extent analysis

TL;DR

The incorrect visual encoder keys written by save_pretrained on Qwen3_5ForConditionalGeneration are fixed by PR #45358 (Fix vlm weight mappings). On a transformers version without that fix, such as 5.5.3, the workaround is to remap the saved keys manually from model.language_model.visual.* to model.visual.*.

Guidance

Verify that the incorrect visual encoder prefix is the cause by comparing the saved checkpoint's keys against the keys of the original model weights.
Until you can upgrade, remap the affected keys to the correct prefix (model.visual.*) before loading the saved checkpoint.
Upgrade to a transformers release that includes PR #45358 once one is available to you.
Switching between 5.5.0 and 5.5.3 does not help: both versions save the visual encoder keys with the incorrect prefix.

Example

from safetensors.torch import load_file

# Manually remap the visual encoder keys to the correct prefix
old_prefix = "model.language_model.visual."
new_prefix = "model.visual."

tensors = load_file("./qwen35-saved/model.safetensors")
remapped_tensors = {}
for key, tensor in tensors.items():
    if key.startswith(old_prefix):
        # Replace only the leading prefix, not any later occurrence in the key
        key = new_prefix + key[len(old_prefix):]
    remapped_tensors[key] = tensor
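If the checkpoint is saved in several shards, the `weight_map` in model.safetensors.index.json also lists the key names and would need the same renaming. The sketch below shows one possible way to do that on the parsed index; `remap_key` and `remap_weight_map` are hypothetical helpers and the shard file names are illustrative. Renaming the index alone is not enough, since the tensors inside each shard file must be renamed as well.

```python
OLD_PREFIX = "model.language_model.visual."
NEW_PREFIX = "model.visual."


def remap_key(key, old_prefix=OLD_PREFIX, new_prefix=NEW_PREFIX):
    """Replace a leading old_prefix with new_prefix; leave other keys unchanged."""
    if key.startswith(old_prefix):
        return new_prefix + key[len(old_prefix):]
    return key


def remap_weight_map(index):
    """Return a copy of a parsed index with remapped weight_map keys."""
    fixed = dict(index)
    fixed["weight_map"] = {
        remap_key(key): shard for key, shard in index["weight_map"].items()
    }
    return fixed


# Illustrative index contents (shard names are made up).
index = {
    "metadata": {"total_size": 0},
    "weight_map": {
        "model.language_model.visual.blocks.0.attn.qkv.weight": "model-00001-of-00002.safetensors",
        "model.language_model.layers.0.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    },
}

fixed_index = remap_weight_map(index)
for key, shard in fixed_index["weight_map"].items():
    print(key, "->", shard)
```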

Notes

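The issue was reported against Qwen3_5ForConditionalGeneration and save_pretrained on transformers 5.5.0 and 5.5.3. The fixing PR changes src/transformers/conversion_mapping.py and touches tests for many other vision-language models (LLaVA, Gemma 3, PaliGemma, Qwen2-VL, and others), which suggests the mapping problem was not limited to Qwen3.5. The manual remapping shown here applies only to the Qwen3.5 prefixes described in this issue.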

Recommendation

Upgrade to a transformers release that includes PR #45358 when one is available to you. Until then, apply manual key remapping to checkpoints saved with 5.5.0 or 5.5.3, since both versions write the visual encoder keys with the incorrect prefix.
