transformers - ✅(Solved) Fix [BUG] transformers>=5.4.0, Qwen3.5 Moe from_pretrained error [2 pull requests, 1 comment, 2 participants]

GitHub stats
huggingface/transformers#45310 (fetched 2026-04-09 07:50:49)
Comments: 1
Participants: 2
Timeline events: 5
Reactions: 0
Timeline (top): cross-referenced ×3, commented ×1, labeled ×1

PR fix notes

PR #45314: Conversion for LLM class loading with VLM ckpt

Description (problem / solution / changelog)

What does this PR do?

fixes https://github.com/huggingface/transformers/issues/45216 and https://github.com/huggingface/transformers/issues/45310 and https://github.com/huggingface/transformers/issues/45313

To be honest, load-save-load works for the model on the main branch, which is why the tests are not failing; it is only that the saved state dict (sd) is completely weird and incorrect. Something also seems off when loading with DeepSpeed, but I didn't really check.

This works with from_pretrained only because we replace all matches with original_key.replace, so the whole language_model.language_model.language_model part is replaced.
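The point about original_key.replace comes down to how Python's str.replace behaves: without a count argument it rewrites every non-overlapping occurrence, so a prefix nested three times collapses in a single call. The toy sketch below uses a hypothetical key and a hypothetical rename rule (not the actual rules in conversion_mapping.py) to contrast a replace-all rename with one applied only once.

```python
def rename_all(key: str, old: str, new: str) -> str:
    # Without a count, str.replace rewrites every non-overlapping occurrence.
    return key.replace(old, new)


def rename_once(key: str, old: str, new: str) -> str:
    # With count=1, only the leftmost occurrence is rewritten.
    return key.replace(old, new, 1)


# Hypothetical key with a nested prefix, similar in shape to the one named in the PR.
saved_key = "model.language_model.language_model.language_model.layers.0.weight"

print(rename_all(saved_key, "language_model.", ""))
# model.layers.0.weight
print(rename_once(saved_key, "language_model.", ""))
# model.language_model.language_model.layers.0.weight
```

This illustrates how a replace-all rename on load can mask malformed keys in a saved checkpoint instead of exposing them, which matches the PR's observation that load-save-load still works while the saved state dict is wrong.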

Changed files

  • src/transformers/conversion_mapping.py (modified, +12/-8)
  • src/transformers/models/gemma3n/modeling_gemma3n.py (modified, +1/-0)
  • src/transformers/models/gemma3n/modular_gemma3n.py (modified, +1/-1)
  • src/transformers/models/qwen3_5/modeling_qwen3_5.py (modified, +1/-0)
  • src/transformers/models/qwen3_5/modular_qwen3_5.py (modified, +1/-0)
  • src/transformers/models/qwen3_5_moe/modeling_qwen3_5_moe.py (modified, +1/-0)
  • src/transformers/models/qwen3_5_moe/modular_qwen3_5_moe.py (modified, +1/-0)
  • tests/models/gemma3n/test_modeling_gemma3n.py (modified, +0/-6)
  • tests/models/qwen3_5/test_modeling_qwen3_5.py (modified, +0/-6)
  • tests/models/qwen3_5_moe/test_modeling_qwen3_5_moe.py (modified, +0/-6)

PR #20: Revert qwen3.5 save weight

Description (problem / solution / changelog)

https://github.com/huggingface/transformers/issues/45310

Changed files

  • src/mcore_bridge/bridge/gpt_bridge.py (modified, +0/-2)

Code Example

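import os

# Restrict the process to GPU 0; this must happen before CUDA is initialized.
# The correct name is CUDA_VISIBLE_DEVICES; a misspelling such as CUDA_VISIBLE_DEVICS is silently ignored.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from transformers import Qwen3_5ForConditionalGeneration

# Load the Hub checkpoint, save it locally in 10GB shards, then reload the saved copy.
# The issue reports a from_pretrained error in this load-save-load round trip (transformers>=5.4.0).
model = Qwen3_5ForConditionalGeneration.from_pretrained('Qwen/Qwen3.5-35B-A3B')
model.save_pretrained('/root/Qwen3.5-35B-A3B', max_shard_size='10GB')
model = Qwen3_5ForConditionalGeneration.from_pretrained('/root/Qwen3.5-35B-A3B')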
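Because PR #45314 notes that the saved state dict can be wrong even when reloading succeeds, a reload alone says little about the checkpoint. As a rough check after the reproduction, one can scan the tensor names in the sharded checkpoint's model.safetensors.index.json (the index that save_pretrained writes alongside sharded safetensors files) for a module name repeated back to back, such as language_model.language_model. This is a heuristic sketch: the helper names are hypothetical, and purely numeric segments are skipped to avoid flagging index pairs like 0.0.

```python
import json
from pathlib import Path
from typing import Iterable, List


def repeated_segment_keys(keys: Iterable[str]) -> List[str]:
    """Return keys in which a non-numeric dotted segment repeats back to back."""
    flagged = []
    for key in keys:
        parts = key.split(".")
        # Compare each segment with its immediate successor; ignore numeric
        # indices such as "0.0", which can be legitimate.
        if any(a == b and not a.isdigit() for a, b in zip(parts, parts[1:])):
            flagged.append(key)
    return flagged


def check_sharded_checkpoint(checkpoint_dir: str) -> List[str]:
    """Scan the weight_map of a sharded safetensors checkpoint for suspicious keys."""
    index_path = Path(checkpoint_dir) / "model.safetensors.index.json"
    with index_path.open() as f:
        weight_map = json.load(f)["weight_map"]
    return repeated_segment_keys(weight_map.keys())
```

For example, check_sharded_checkpoint('/root/Qwen3.5-35B-A3B') would return any suspicious tensor names, or an empty list if this particular symptom is absent.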

System Info

https://github.com/huggingface/transformers/issues/45310

This issue has not been fixed in the main branch.



Extended analysis

TL;DR

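One clear bug in the snippet is a typo in the environment variable name: it should be 'CUDA_VISIBLE_DEVICES', not 'CUDA_VISIBLE_DEVICS'. The misspelled variable is simply ignored, so the GPU restriction never takes effect, but that alone is unlikely to cause the from_pretrained error. PR #45314, which lists this issue among those it fixes, points instead to incorrectly saved state-dict keys.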

Guidance

  • Check the spelling of environment variable names to ensure they match the expected names, such as 'CUDA_VISIBLE_DEVICES'.
  • Confirm the variable is set by printing os.environ['CUDA_VISIBLE_DEVICES'] after setting it, and set it before CUDA is first initialized (setting it before importing torch or transformers is the safest choice).
  • Ensure that the GPU device specified in the environment variable is available and properly configured on the system.
  • For the save/load error itself, review the save_pretrained and from_pretrained documentation and the changes in PR #45314, which targets this round trip.

Example

import os
print(os.environ.get('CUDA_VISIBLE_DEVICES'))  # Check the current value
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # Set the correct value
print(os.environ.get('CUDA_VISIBLE_DEVICES'))  # Verify the new value
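Printing the value catches a missing variable, but not a misspelled one that was set in its place. A minimal sketch using only the standard library's difflib can flag environment-variable names that are near misses of expected ones; the function name find_misspelled and the 0.85 similarity cutoff are illustrative choices, not part of any library.

```python
import difflib
import os
from typing import Iterable, List, Mapping, Tuple


def find_misspelled(
    env: Mapping[str, str],
    expected: Iterable[str] = ("CUDA_VISIBLE_DEVICES",),
    cutoff: float = 0.85,
) -> List[Tuple[str, str]]:
    """Return (found_name, likely_intended_name) pairs for near-miss variable names."""
    expected = list(expected)
    suspects = []
    for name in env:
        if name in expected:
            continue  # Exact match: spelled correctly.
        # get_close_matches ranks candidates by difflib.SequenceMatcher similarity.
        match = difflib.get_close_matches(name, expected, n=1, cutoff=cutoff)
        if match:
            suspects.append((name, match[0]))
    return suspects


if __name__ == "__main__":
    for found, intended in find_misspelled(os.environ):
        print(f"{found!r} looks like a misspelling of {intended!r}")
```

With the snippet from the issue, 'CUDA_VISIBLE_DEVICS' would be reported as a likely misspelling of 'CUDA_VISIBLE_DEVICES'. A higher cutoff reduces false positives at the cost of missing looser typos.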

Notes

The provided code snippet seems to be incomplete, and there is no clear information about the expected behavior or the actual error message. Therefore, the guidance is limited to verifying the environment variable name and ensuring the GPU device is properly configured.

Recommendation

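Apply workaround: Correct the typo in the environment variable name to 'CUDA_VISIBLE_DEVICES' so that the GPU restriction actually takes effect. For the from_pretrained error itself, use a transformers version that includes the fix from PR #45314.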
