transformers - ✅ (Solved) Fix [BUG] gemma-4 zero3 from_pretrained [1 pull request, 4 comments, 2 participants]

GitHub stats

huggingface/transformers#45397 (fetched 2026-04-15 06:19:47)

Comments: 4 · Participants: 2 · Timeline events: 11 · Reactions: 0

Timeline (top): commented ×4, cross-referenced ×2, subscribed ×2, labeled ×1

Fix Action

Fixed

PR fix notes

PR #45402: Fix ZeRO-3 from_pretrained: load registered buffers in _load_state_dict_into_zero3_model

Description (problem / solution / changelog)

Fixes #45397

What does this PR do?

Root cause: _load_state_dict_into_zero3_model in src/transformers/integrations/deepspeed.py only iterates over named_parameters, never over named_buffers. Buffers registered via register_buffer() are therefore skipped entirely during ZeRO-3 loading, so they are always reported as MISSING.
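
A minimal illustration of the gap described above, using plain PyTorch (no DeepSpeed): a module that stores clip values via register_buffer() exposes them through named_buffers() and state_dict(), but not through named_parameters(). ClippableLinearSketch is a hypothetical stand-in, not the real Gemma4ClippableLinear; only the buffer names are taken from the PR text.

```python
import torch
import torch.nn as nn


class ClippableLinearSketch(nn.Module):
    """Hypothetical layer that keeps its clip bounds as registered buffers.

    Not the real Gemma4ClippableLinear; the buffer names mirror the PR description.
    forward() is omitted because only the parameter/buffer split matters here.
    """

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Buffers are saved in state_dict() but are not nn.Parameters.
        self.register_buffer("input_max", torch.tensor(1.0))
        self.register_buffer("output_min", torch.tensor(-1.0))
        self.register_buffer("output_max", torch.tensor(1.0))


layer = ClippableLinearSketch(4, 2)
param_names = [name for name, _ in layer.named_parameters()]
buffer_names = [name for name, _ in layer.named_buffers()]
state_keys = list(layer.state_dict().keys())

print(param_names)   # ['linear.weight', 'linear.bias']
print(buffer_names)  # ['input_max', 'output_min', 'output_max']
# Keys that a loader walking only named_parameters() would never visit:
print(sorted(set(state_keys) - set(param_names)))  # ['input_max', 'output_max', 'output_min']
```

Any loading loop driven solely by named_parameters() leaves these three state_dict entries untouched, which matches the MISSING symptom described in the PR.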

Fix: after the parameters have been gathered and loaded, load the buffers directly as well. Buffers do not need GatheredParameters because ZeRO-3 does not shard them.
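
The fix can be sketched as a second pass over named_buffers() after the parameter pass. load_buffers_sketch is a hypothetical helper, not the code from PR #45402. It assumes, as the PR states, that ZeRO-3 does not partition buffers, so each rank can copy straight into its full local buffer without GatheredParameters; it also assumes the state_dict is available on the rank doing the copy.

```python
import torch
import torch.nn as nn


def load_buffers_sketch(model: nn.Module, state_dict: dict) -> list:
    """Hypothetical helper: copy registered buffers from state_dict into model.

    Assumption: buffers are not sharded under ZeRO-3, so no gather step is needed.
    Returns the names of the buffers that were loaded.
    """
    loaded = []
    with torch.no_grad():
        for name, buf in model.named_buffers():
            if name in state_dict:
                # Copy in place so the module keeps the same buffer object.
                buf.copy_(state_dict[name].to(dtype=buf.dtype, device=buf.device))
                loaded.append(name)
    return loaded


class Tiny(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(2, 2)
        self.register_buffer("output_max", torch.tensor(1.0))


model = Tiny()
checkpoint = {"output_max": torch.tensor(30.0)}  # hypothetical checkpoint value
loaded = load_buffers_sketch(model, checkpoint)
print(loaded)                    # ['output_max']
print(model.output_max.item())   # 30.0
```

Copying in place (rather than reassigning the attribute) keeps any references to the existing buffer tensor valid, which is the usual choice for state-dict style loading.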

Impact: this affects any model with registered buffers under ZeRO-3, not just Gemma-4. Gemma-4's Gemma4ClippableLinear stores its clipping values (input_max, output_min, output_max) as buffers.
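
To check whether a given model is hit by this, one option is to load it with output_loading_info=True (a from_pretrained argument that returns a (model, loading_info) tuple) and intersect the reported missing keys with the model's buffer names. missing_buffer_keys is a hypothetical helper, and the loading_info dict in the demo is hand-written rather than real output from a ZeRO-3 run.

```python
import torch
import torch.nn as nn


def missing_buffer_keys(model: nn.Module, loading_info: dict) -> list:
    """Return the reported missing keys that correspond to registered buffers.

    Assumption: loading_info is shaped like the dict returned by
    from_pretrained(..., output_loading_info=True), with a "missing_keys"
    iterable whose names match model.named_buffers().
    """
    buffer_names = {name for name, _ in model.named_buffers()}
    return sorted(k for k in loading_info.get("missing_keys", []) if k in buffer_names)


# Self-contained demo: tiny module plus hypothetical loading info.
class Tiny(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(2, 2)
        self.register_buffer("output_max", torch.tensor(1.0))


info = {"missing_keys": ["output_max", "proj.weight"], "unexpected_keys": []}
print(missing_buffer_keys(Tiny(), info))  # ['output_max']
```

With a real checkpoint this would be called after something like model, info = AutoModel.from_pretrained(path, output_loading_info=True). Key naming can differ between transformers versions and model classes, so treat an empty result as a hint rather than proof.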

Tested: reproduced the bug and verified the fix on 2xRTX40 GPUs.

Changed files

  • src/transformers/integrations/deepspeed.py (modified, +10/-1)
Original issue report

System Info

Screenshot: https://github.com/user-attachments/assets/193f1646-b90a-4d8b-a4ff-db8b252133ef

Related report: https://github.com/modelscope/ms-swift/issues/9078

Model: google/gemma-4-E4B-it

ZeRO-2 works fine; ZeRO-3 does not.

ZeRO-2 screenshot: https://github.com/user-attachments/assets/ef9a0e24-c585-4f07-a388-ad8a66589fc9

ZeRO-3 screenshot: https://github.com/user-attachments/assets/7a4a02b7-f0f6-47f6-be53-5f738e213f9e

Who can help?

No response


Reproduction

Expected behavior

Extended analysis

TL;DR

The issue may be caused by differences in configuration or input between ZeRO-2 and ZeRO-3; comparing the two configurations could help identify the cause.

Guidance

  • Compare the inputs and configuration files used for ZeRO-2 and ZeRO-3 to find any differences that could cause the issue.
  • Check the official example scripts and documentation to confirm that ZeRO-3 is being used correctly and is supported for this model.
  • Verify that the task or dataset is officially supported and compatible with ZeRO-3.
  • If you are using a custom task or dataset, try an officially supported one to see whether the issue persists.
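
Comparing the two DeepSpeed configurations, as suggested above, is easy to automate with a small recursive diff. diff_configs is a hypothetical helper, and the zero2/zero3 dicts are illustrative (the reporter's actual configs were not posted); in practice you would json.load() the two config files. For this particular issue, the PR traced the failure to buffer loading in the ZeRO-3 path, so a config diff would mainly surface the stage change itself.

```python
_MISSING = "<missing>"


def diff_configs(a: dict, b: dict, path: str = "") -> list:
    """Return (dotted_path, value_in_a, value_in_b) for every leaf that differs."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        here = f"{path}.{key}" if path else key
        va, vb = a.get(key, _MISSING), b.get(key, _MISSING)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested sections such as "zero_optimization".
            diffs.extend(diff_configs(va, vb, here))
        elif va != vb:
            diffs.append((here, va, vb))
    return diffs


# Illustrative DeepSpeed configs; "auto" values are resolved by the HF Trainer integration.
zero2 = {
    "train_micro_batch_size_per_gpu": "auto",
    "zero_optimization": {"stage": 2, "overlap_comm": True},
}
zero3 = {
    "train_micro_batch_size_per_gpu": "auto",
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}

for entry in diff_configs(zero2, zero3):
    print(entry)
# ('zero_optimization.stage', 2, 3)
# ('zero_optimization.stage3_gather_16bit_weights_on_model_save', '<missing>', True)
```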

Notes

The provided information is limited, and without more details about the specific configurations and tasks being used, it's difficult to provide a more specific solution.

Recommendation

Workaround: align the configuration with the working ZeRO-2 settings, since ZeRO-3 does not load the model as expected.

