transformers - ✅(Solved) Fix [Regression] Qwen3.5 `save_pretrained` still saves incorrect visual encoder keys in 5.5.3 [2 pull requests, 1 comments, 2 participants]

GitHub stats: huggingface/transformers#45357 (fetched 2026-04-11 06:12:16)
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1, labeled ×1

save_pretrained on Qwen3_5ForConditionalGeneration produces incorrect weight keys for the visual encoder. This issue was partially addressed in v5.5.2 (#45340), which fixed the text model key regression. However, the visual encoder keys are still saved with a wrong prefix in v5.5.3.

This makes it impossible to reload the saved checkpoint without manual key remapping.

Root Cause

The VLM weight-conversion mappings in src/transformers/conversion_mapping.py were not copied correctly in https://github.com/huggingface/transformers/pull/44627. As a result, from_pretrained still loads most checkpoints correctly, but save_pretrained does not write them back in the original key format. The previous fix (#45340) corrected only the text-model keys; the visual-encoder prefix was missed because the saved model could still be reloaded.
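The failure mode described in PR #45358 (mappings that load correctly but do not re-save the same format) can be illustrated with plain prefix rules. This is a minimal sketch, not the transformers implementation: the real rules live in conversion_mapping.py and work differently, and the in-memory prefixes `model.vision_tower.` and `model.text_model.` are hypothetical names chosen for illustration. The point is that save-time rules must be the exact inverse of load-time rules; a mis-copied save rule still lets the model load, yet re-saving changes the key prefix.

```python
# Hypothetical load-time prefix rules (checkpoint key -> in-memory key).
# These are NOT transformers' real mappings; the names are invented for illustration.
LOAD_RULES = [
    ("model.visual.", "model.vision_tower."),
    ("model.language_model.", "model.text_model."),
]


def rename(key: str, rules: list[tuple[str, str]]) -> str:
    """Rewrite the first matching prefix in key; leave other keys unchanged."""
    for old, new in rules:
        if key.startswith(old):
            return new + key[len(old):]
    return key


def invert(rules: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Derive save-time rules by swapping each load-time (old, new) pair."""
    return [(new, old) for old, new in rules]


def roundtrip_mismatches(checkpoint_keys, load_rules, save_rules):
    """Return {original_key: resaved_key} for keys that change after load + save."""
    bad = {}
    for key in checkpoint_keys:
        resaved = rename(rename(key, load_rules), save_rules)
        if resaved != key:
            bad[key] = resaved
    return bad


checkpoint_keys = [
    "model.visual.blocks.0.attn.qkv.weight",
    "model.language_model.layers.0.mlp.up_proj.weight",
]

# Save rules derived as the exact inverse: every key survives the round trip.
print(roundtrip_mismatches(checkpoint_keys, LOAD_RULES, invert(LOAD_RULES)))

# A hypothetical mis-copied save rule: loading is unaffected, saving changes the prefix.
BUGGY_SAVE_RULES = [
    ("model.vision_tower.", "model.language_model.visual."),
    ("model.text_model.", "model.language_model."),
]
print(roundtrip_mismatches(checkpoint_keys, LOAD_RULES, BUGGY_SAVE_RULES))
```

The buggy rules reproduce the shape of the report: only the visual key comes back as `model.language_model.visual.*`, while the text key round-trips unchanged.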

Fix Action

Fixed

PR fix notes

PR #45358: Fix vlm weight mappings

Description (problem / solution / changelog)

What does this PR do?

Finally fixes https://github.com/huggingface/transformers/issues/45357. This was not caught by the previous fix because the model can be reloaded correctly by from_pretrained, but the keys are still serialized incorrectly!

After a deeper look, I noticed @zucchini-nlp did not correctly copy the mappings in https://github.com/huggingface/transformers/pull/44627... This is EXTREMELY IMPORTANT and can very easily break loading and/or saving silently, @zucchini-nlp: we cannot touch the mappings without being 100% sure of the change. You missed a lot before. In this case, most models would load correctly but would not be re-saved in the same format.

For reference, I'm using this small snippet to check formats:

import glob
import json

import transformers
from safetensors.torch import load_file
from transformers.utils.hub import cached_file, cached_files

# model_id = "llava-hf/llava-1.5-7b-hf"
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
# model_id = "adept/fuyu-8b"
model_class = transformers.LlavaNextForConditionalGeneration

target_folder = "/raid/cyril/test_model"

# Reference: the raw tensors exactly as stored in the Hub checkpoint shards
with open(cached_file(model_id, "model.safetensors.index.json")) as f:
    index = json.load(f)
model_files = sorted(set(index["weight_map"].values()))
model_files = cached_files(model_id, model_files)

original_state_dict = {}
for file in model_files:
    original_state_dict.update(load_file(file))

# Round trip: load through transformers (weight mappings applied), then save again
model = model_class.from_pretrained(model_id)
model.save_pretrained(target_folder)

# Read back every saved file, in case save_pretrained sharded the checkpoint
saved_weights = {}
for file in glob.glob(f"{target_folder}/*.safetensors"):
    saved_weights.update(load_file(file))

# Keys lost or renamed on save; keys present in both must hold identical values
not_in_saved = []
for k, v in original_state_dict.items():
    if k not in saved_weights:
        not_in_saved.append(k)
    else:
        assert (v == saved_weights[k]).all()

# Keys that exist only in the saved checkpoint, e.g. a wrongly prefixed module
not_in_original = []
for k, v in saved_weights.items():
    if k not in original_state_dict:
        not_in_original.append(k)
    else:
        assert (v == original_state_dict[k]).all()

print(f"The following are in original but not in saved: {not_in_saved}")
print(f"The following are saved but not in original: {not_in_original}")

# The saved folder must also load back cleanly
model = model_class.from_pretrained(target_folder)
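When the snippet's not_in_saved and not_in_original lists run to hundreds of entries, grouping them by their leading key components makes a prefix regression such as model.visual.* becoming model.language_model.visual.* stand out. A minimal sketch; the helper name summarize_prefixes and the default depth=3 are illustrative choices, not part of the PR, and the example keys are made up rather than real output.

```python
from collections import Counter


def summarize_prefixes(keys, depth=3):
    """Count keys by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# Illustrative key lists shaped like the Qwen3.5 report (not real output)
not_in_saved = [
    "model.visual.blocks.0.attn.qkv.weight",
    "model.visual.blocks.1.attn.qkv.weight",
]
not_in_original = [
    "model.language_model.visual.blocks.0.attn.qkv.weight",
    "model.language_model.visual.blocks.1.attn.qkv.weight",
]

print(summarize_prefixes(not_in_saved))
print(summarize_prefixes(not_in_original))
```

With depth=3 the two lists collapse to a single prefix each, which points directly at the mis-mapped module; raise depth if the mismatch sits deeper in the key hierarchy.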

Changed files

  • src/transformers/conversion_mapping.py (modified, +42/-12)
  • tests/models/aya_vision/test_modeling_aya_vision.py (modified, +3/-0)
  • tests/models/cohere_asr/test_modeling_cohere_asr.py (modified, +2/-8)
  • tests/models/colpali/test_modeling_colpali.py (modified, +4/-0)
  • tests/models/colqwen2/test_modeling_colqwen2.py (modified, +4/-0)
  • tests/models/emu3/test_modeling_emu3.py (modified, +3/-0)
  • tests/models/fuyu/test_modeling_fuyu.py (modified, +3/-0)
  • tests/models/gemma3/test_modeling_gemma3.py (modified, +3/-0)
  • tests/models/got_ocr2/test_modeling_got_ocr2.py (modified, +3/-0)
  • tests/models/gpt_oss/test_modeling_gpt_oss.py (modified, +1/-1)
  • tests/models/internvl/test_modeling_internvl.py (modified, +3/-0)
  • tests/models/llava/test_modeling_llava.py (modified, +3/-0)
  • tests/models/llava_next/test_modeling_llava_next.py (modified, +3/-0)
  • tests/models/llava_next_video/test_modeling_llava_next_video.py (modified, +3/-0)
  • tests/models/llava_onevision/test_modeling_llava_onevision.py (modified, +3/-0)
  • tests/models/mistral3/test_modeling_mistral3.py (modified, +3/-0)
  • tests/models/mllama/test_modeling_mllama.py (modified, +3/-0)
  • tests/models/nemotron_h/test_modeling_nemotron_h.py (modified, +1/-4)
  • tests/models/paligemma/test_modeling_paligemma.py (modified, +3/-0)
  • tests/models/qwen2_5_vl/test_modeling_qwen2_5_vl.py (modified, +2/-9)
  • tests/models/qwen2_vl/test_modeling_qwen2_vl.py (modified, +2/-7)
  • tests/models/video_llava/test_modeling_video_llava.py (modified, +3/-0)
  • tests/models/vipllava/test_modeling_vipllava.py (modified, +3/-0)
  • tests/test_modeling_common.py (modified, +8/-1)


System Info

  • transformers version: 5.5.0, 5.5.3
  • Platform: Linux (NVIDIA A100 80GB × 8)
  • Python version: 3.12
  • PyTorch version: 2.9.1+cu128
  • CUDA version: 12.8


Reproduction


import transformers
from safetensors.torch import load_file

model = transformers.Qwen3_5ForConditionalGeneration.from_pretrained("Qwen/Qwen3.5-0.8B")
model.save_pretrained("./qwen35-saved")

tensors = load_file("./qwen35-saved/model.safetensors")
keys = list(tensors.keys())

# Visual encoder keys should start with "model.visual.*"
bad_visual = [k for k in keys if k.startswith("model.language_model.visual.")]
assert len(bad_visual) == 0, f"Incorrect visual keys found:\n" + "\n".join(bad_visual[:3])

Expected vs Actual Behavior

Comparing keys from the original model weights (reference) vs. the checkpoint saved by save_pretrained:

| Module | Original model (reference) | Saved by 5.5.0 | Saved by 5.5.3 |
|---|---|---|---|
| Text layers | model.language_model.layers.* | model.language_model.language_model.language_model.layers.* | model.language_model.layers.* |
| Visual encoder | model.visual.* | model.language_model.visual.* | model.language_model.visual.* |

  • v5.5.0: Both text and visual keys are broken.
  • v5.5.3: Text keys are fixed (by #45340), but visual encoder keys remain incorrect: model.language_model.visual.* instead of model.visual.*.
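The two mismatches in the table can be written as prefix rewrites that map a key saved by an affected release back to the reference naming. A minimal sketch, assuming only the prefixes listed in the table are affected (other modules were not checked in the report); the helper name to_reference_key and the example key suffixes are hypothetical.

```python
# (prefix as saved by an affected release, reference prefix), taken from the table
KNOWN_BAD_PREFIXES = [
    ("model.language_model.language_model.language_model.layers.", "model.language_model.layers."),
    ("model.language_model.visual.", "model.visual."),
]


def to_reference_key(key: str) -> str:
    """Map a wrongly prefixed saved key back to the reference checkpoint naming."""
    for bad, good in KNOWN_BAD_PREFIXES:
        if key.startswith(bad):
            return good + key[len(bad):]
    # Keys that already match the reference (e.g. 5.5.3 text layers) pass through
    return key


saved_by_550 = [
    "model.language_model.language_model.language_model.layers.0.mlp.up_proj.weight",
    "model.language_model.visual.blocks.0.attn.qkv.weight",
]
print([to_reference_key(k) for k in saved_by_550])
```

Because the 5.5.3 text prefix is already correct, the same function handles checkpoints from either release without needing to know which one produced them.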

Impact

  • Checkpoints saved after fine-tuning cannot be loaded by from_pretrained without manual key conversion.
  • Inference frameworks (vLLM, lmdeploy, etc.) fail to load fine-tuned Qwen3.5 checkpoints.

Related

  • #45216 — Original report of the save_pretrained regression in 5.4.0
  • #45340 — PR that fixed the text model key regression (but not the visual encoder)

Extent analysis

TL;DR

The issue is fixed by PR #45358 (Fix vlm weight mappings). On a transformers version without that fix, the workaround for the incorrect visual-encoder keys written by save_pretrained on Qwen3_5ForConditionalGeneration is to remap them manually from model.language_model.visual.* back to model.visual.*.

Guidance

  • Verify that the issue is indeed caused by the incorrect prefix in the visual encoder keys by checking the saved model's keys against the expected keys.
  • Consider manually remapping the keys to the correct prefix (model.visual.*) when loading the saved checkpoint.
  • Upgrade to a transformers version that includes PR #45358, which corrects the VLM weight mappings in src/transformers/conversion_mapping.py.
  • If possible, test the model with a different version of the transformers library to see if the issue is specific to version 5.5.3.
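The first check above can be done without torch and without loading any weights: per the safetensors format, a .safetensors file starts with an 8-byte little-endian header length followed by a JSON header that lists every tensor name (plus an optional "__metadata__" entry). A minimal standard-library sketch under that format assumption; the helper names safetensors_keys and bad_visual_keys are hypothetical. The safetensors package's safe_open(path, framework="pt").keys() returns the same names.

```python
import json
import struct


def safetensors_keys(path: str) -> list[str]:
    """List tensor names in a .safetensors file by reading only its JSON header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian unsigned 64-bit
        header = json.loads(f.read(header_len))
    return sorted(k for k in header if k != "__metadata__")


def bad_visual_keys(path: str, bad_prefix: str = "model.language_model.visual.") -> list[str]:
    """Return the keys that carry the wrong visual-encoder prefix from the report."""
    return [k for k in safetensors_keys(path) if k.startswith(bad_prefix)]


# Usage: bad_visual_keys("./qwen35-saved/model.safetensors") should be empty once fixed
```

For a sharded checkpoint, run it over each shard, or read the keys of "weight_map" in model.safetensors.index.json directly.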

Example

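# Manually remap the keys to the correct prefix and write the file back
import os

from safetensors.torch import load_file, save_file

path = "./qwen35-saved/model.safetensors"
bad_prefix, good_prefix = "model.language_model.visual.", "model.visual."

tensors = load_file(path)
remapped_tensors = {}
for key, tensor in tensors.items():
    # Rewrite only the leading prefix, not every occurrence inside the key
    if key.startswith(bad_prefix):
        key = good_prefix + key[len(bad_prefix):]
    remapped_tensors[key] = tensor

# Write to a temporary file first, then swap it in; "format": "pt" matches what save_pretrained writes
save_file(remapped_tensors, path + ".tmp", metadata={"format": "pt"})
os.replace(path + ".tmp", path)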

Notes

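The report concerns Qwen3_5ForConditionalGeneration and save_pretrained in transformers 5.5.3. However, PR #45358 changes the shared VLM weight mappings, and its test updates cover many other multimodal models (Llava, LlavaNext, PaliGemma, Qwen2-VL, Qwen2.5-VL, Mllama, and others), so similar save-format mismatches may affect those models as well. The manual remapping above applies only to the Qwen3.5 visual-encoder prefix.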

Recommendation

Upgrade to a transformers version that includes PR #45358. Until then, apply a workaround such as manual key remapping, because 5.5.3 still saves the visual encoder keys with the incorrect model.language_model.visual.* prefix.
