transformers - ✅(Solved) Fix [BUG] transformers>=5.4.0, Qwen3.5 Moe from_pretrained error [2 pull requests, 1 comment, 2 participants]

Jintao-Huang · 2026-04-08T09:29:48Z



GitHub stats

huggingface/transformers#45310 • Fetched 2026-04-09 07:50:49

Author: Jintao-Huang
Participants: Jintao-Huang, zucchini-nlp
Timeline (top): cross-referenced ×3, commented ×1, labeled ×1

PR fix notes

PR #45314: Conversion for LLM class loading with VLM ckpt

Repository: huggingface/transformers
Author: zucchini-nlp
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45314

Description (problem / solution / changelog)

What does this PR do?

fixes https://github.com/huggingface/transformers/issues/45216 and https://github.com/huggingface/transformers/issues/45310 and https://github.com/huggingface/transformers/issues/45313

To be honest, load-save-load works for the model on the main branch, which is why the tests are not failing; the problem is only that the saved state dict (sd) is completely weird and incorrect. There is also something wrong when loading with DeepSpeed, but I didn't really check.

This works in from_pretrained only because we replace all matches with original_key.replace, so the whole language_model.language_model.language_model part is replaced.
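The remark about original_key.replace rests on a general Python behavior: `str.replace` rewrites every non-overlapping match unless a count is passed. The sketch below only illustrates that behavior. The key and the pattern are made up for the example and are not the actual entries in conversion_mapping.py.

```python
# Hypothetical state-dict key with a repeated segment; not taken from a real checkpoint.
nested_key = "model.language_model.language_model.language_model.layers.0.mlp.weight"

# Hypothetical pattern; the real conversion patterns are not shown in the source.
pattern = "language_model."

# Default behavior: every non-overlapping occurrence is replaced.
all_replaced = nested_key.replace(pattern, "")

# With an explicit count of 1, only the first occurrence is replaced.
first_only = nested_key.replace(pattern, "", 1)

print(all_replaced)
print(first_only)
```

Under these assumptions, a key that was saved with a repeated prefix can still load cleanly, because the replace-all pass removes every copy of the prefix at once. This would be consistent with the PR's observation that the round trip succeeds even though the saved keys are wrong.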

Changed files

src/transformers/conversion_mapping.py (modified, +12/-8)
src/transformers/models/gemma3n/modeling_gemma3n.py (modified, +1/-0)
src/transformers/models/gemma3n/modular_gemma3n.py (modified, +1/-1)
src/transformers/models/qwen3_5/modeling_qwen3_5.py (modified, +1/-0)
src/transformers/models/qwen3_5/modular_qwen3_5.py (modified, +1/-0)
src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py (modified, +1/-0)
src/transformers/models/qwen3_5_moe/modular_qwen3_5_moe.py (modified, +1/-0)
tests/models/gemma3n/test_modeling_gemma3n.py (modified, +0/-6)
tests/models/qwen3_5/test_modeling_qwen3_5.py (modified, +0/-6)
tests/models/qwen3_5_moe/test_modeling_qwen3_5_moe.py (modified, +0/-6)

PR #20: Revert qwen3.5 save weight

Repository: modelscope/mcore-bridge
Author: Jintao-Huang
State: closed | merged: True
Link: https://github.com/modelscope/mcore-bridge/pull/20

Description (problem / solution / changelog)

https://github.com/huggingface/transformers/issues/45310

Changed files

src/mcore_bridge/bridge/gpt_bridge.py (modified, +0/-2)

Code Example

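import os
# Note: the variable name is misspelled in the original report; the correct
# name is 'CUDA_VISIBLE_DEVICES', so this line does not restrict GPU visibility.
os.environ['CUDA_VISIBLE_DEVICS'] = '0'

from transformers import Qwen3_5ForConditionalGeneration, AutoTokenizer

# 1) Load the checkpoint from the Hub.
model = Qwen3_5ForConditionalGeneration.from_pretrained('Qwen/Qwen3.5-35B-A3B')
# 2) Save it locally in shards of at most 10GB.
model.save_pretrained('/root/Qwen3.5-35B-A3B', max_shard_size='10GB')
# 3) Reload from the local copy, completing the load-save-load round trip.
model = Qwen3_5ForConditionalGeneration.from_pretrained('/root/Qwen3.5-35B-A3B')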
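PR #45314 describes the saved state dict as incorrect, so one way to inspect the output of the save step is to look at the weight names it wrote. The sketch below assumes the output directory holds a sharded safetensors checkpoint with an index file named model.safetensors.index.json containing a weight_map entry. The helper names and the "repeated segment" heuristic are hypothetical and are not part of transformers.

```python
import json
from pathlib import Path


def has_repeated_segment(key: str) -> bool:
    """Return True if two consecutive non-numeric dot-separated segments are identical."""
    parts = key.split(".")
    return any(a == b and not a.isdigit() for a, b in zip(parts, parts[1:]))


def suspicious_keys(checkpoint_dir: str) -> list[str]:
    """List weight names in the shard index that contain a repeated segment."""
    # Assumption: sharded safetensors checkpoint with the usual index file.
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with index_path.open() as f:
        weight_map = json.load(f)["weight_map"]
    return sorted(k for k in weight_map if has_repeated_segment(k))
```

Calling `suspicious_keys('/root/Qwen3.5-35B-A3B')` after the save step would return any names worth a closer look. The check is only a heuristic: an empty list does not prove the checkpoint is correct.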


System Info

https://github.com/huggingface/transformers/issues/45310

This issue has not been fixed in the main branch.



extent analysis

TL;DR

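The reproduction script misspells the environment variable name ('CUDA_VISIBLE_DEVICS' instead of 'CUDA_VISIBLE_DEVICES'), so the intended GPU restriction is silently ignored. The typo is probably not the root cause, however: linked PR #45314 states that it fixes this issue and attributes the problem to weight-key conversion that leaves the saved state dict incorrect.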

Guidance

Check the spelling of environment variable names; a misspelled name such as 'CUDA_VISIBLE_DEVICS' is accepted without error and has no effect.
Set 'CUDA_VISIBLE_DEVICES' before CUDA is initialized, and verify it by printing os.environ['CUDA_VISIBLE_DEVICES'] after setting it.
Ensure that the GPU device named in the variable is available and properly configured on the system.
Follow PR #45314 (open and not merged when this page was fetched), which targets the save and load conversion for Qwen3.5, Qwen3.5 MoE, and Gemma3n.

Example

import os
print(os.environ.get('CUDA_VISIBLE_DEVICES'))  # Check the current value
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Set the correct value
print(os.environ.get('CUDA_VISIBLE_DEVICES'))  # Verify the new value

Notes

The issue does not include the actual error message or a description of the expected behavior, so this guidance is limited to the environment variable name and GPU configuration. It does not diagnose the save and load problem itself.

Recommendation

Apply workaround: Correct the typo in the environment variable name to 'CUDA_VISIBLE_DEVICES' so that the intended GPU is selected. This alone is unlikely to resolve the reported from_pretrained error, which PR #45314 addresses separately.


