transformers - 💡(How to fix) Fix [Bug] FP8 save_pretrained moe [3 comments, 3 participants]

GitHub stats
huggingface/transformers#44222 · Fetched 2026-04-08 00:29:44
Comments: 3 · Participants: 3 · Timeline events: 9 · Reactions: 0
Timeline (top): commented ×3, mentioned ×2, subscribed ×2, cross-referenced ×1



Reproduction

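from transformers import AutoModelForCausalLM, FineGrainedFP8Config

model_name = "Qwen/Qwen3-30B-A3B"

# Load the MoE checkpoint and quantize it to fine-grained FP8 while loading,
# leaving lm_head unquantized.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=FineGrainedFP8Config(modules_to_not_convert=['lm_head']),
)

# Save the quantized model as sharded safetensors files; this is the
# save_pretrained step named in the issue title.
model.save_pretrained('/root/test', safe_serialization=True, max_shard_size='5GB')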
[Image: screenshot attached to the original issue, https://github.com/user-attachments/assets/7cf5874f-5aeb-45d7-99e6-127d2e093909]

Expected behavior

extent analysis

Fix Plan

1. Update transformers to the latest version

The issue does not state which transformers version was used. Upgrade to the latest release first, since a newer version may already include a fix.

pip install --upgrade transformers
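
Because the System Info section of the issue is empty, it can help to record the installed versions before and after upgrading. The following is a minimal sketch using only the standard library; the helper names installed_version and version_report are illustrative and not part of transformers, and the package list is an assumption about what is relevant here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_report(packages=("transformers", "torch", "safetensors", "accelerate")):
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    # Print one line per package so the output can be pasted into a bug report.
    for name, ver in version_report().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this before and after the upgrade shows whether the version actually changed in the environment that runs the script.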

2. Do not pass use_cache=False to save_pretrained

use_cache is a model config and generation setting that controls key/value caching during the forward pass. It is not a documented argument of save_pretrained and should not change how weights are serialized, so adding it is unlikely to help here. Keep the call as in the reproduction:

model.save_pretrained('/root/test', safe_serialization=True, max_shard_size='5GB')

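3. Check for Out-of-Memory (OOM) errors

The issue text does not include an error message (only a screenshot is attached), so first confirm whether the failure is actually an OOM. If it is, try a smaller max_shard_size. This setting caps the size of each shard file rather than the total checkpoint size, so it may not lower peak memory on its own.

model.save_pretrained('/root/test', safe_serialization=True, max_shard_size='2GB')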
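
To make the effect of max_shard_size concrete, the sketch below estimates how many shard files a checkpoint of a given size would need. The helpers parse_size and estimate_shards are hypothetical, not transformers functions. Treating "GB" as 10**9 bytes is an assumption of this sketch, and the result is only a rough lower bound because real shards are split at tensor boundaries.

```python
import math
import re

# Unit multipliers assumed by this sketch: decimal for KB/MB/GB,
# binary for KiB/MiB/GiB.
_UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_size(size: str) -> int:
    """Convert a string such as '5GB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", size)
    if not match or match.group(2).upper() not in _UNITS:
        raise ValueError(f"unrecognized size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def estimate_shards(total_bytes: int, max_shard_size: str) -> int:
    """Rough lower bound on the number of shard files for a checkpoint."""
    return max(1, math.ceil(total_bytes / parse_size(max_shard_size)))
```

For an illustrative (not measured) total of 30 * 10**9 bytes, estimate_shards gives 6 files at '5GB' and 15 files at '2GB'. The total written to disk is the same in both cases; only the file count changes.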

4. Verify the issue is resolved

Run the script again and check if the issue persists.

# Run the script with the updated parameters
python script.py
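
A script that exits without an exception does not guarantee that every file was written. As a hedged sketch of a follow-up check, the function below looks for config.json, lists the .safetensors shards, and, when a model.safetensors.index.json file is present, reports shards that the index references but that are missing on disk. The function name check_checkpoint_dir is illustrative, and the check assumes the usual sharded safetensors layout.

```python
import json
from pathlib import Path


def check_checkpoint_dir(save_dir: str) -> dict:
    """Summarize which expected checkpoint files exist in save_dir."""
    root = Path(save_dir)
    shards = sorted(p.name for p in root.glob("*.safetensors"))
    report = {
        "has_config": (root / "config.json").is_file(),
        "shards": shards,
        "missing_shards": [],
    }
    index_path = root / "model.safetensors.index.json"
    if index_path.is_file():
        # The index maps each tensor name to the shard file that stores it.
        weight_map = json.loads(index_path.read_text())["weight_map"]
        referenced = sorted(set(weight_map.values()))
        report["missing_shards"] = [f for f in referenced if f not in shards]
    return report
```

Calling check_checkpoint_dir('/root/test') after the save should report has_config as True and an empty missing_shards list if the checkpoint is complete.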

Verification

  • Run the script with the updated parameters and check if the issue is resolved.
  • Verify that the model is saved successfully without any errors.
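
To confirm that the saved shards actually hold FP8 weights, the tensor dtypes can be read from the safetensors file header without loading any weights. This is a minimal standard-library sketch. The function name safetensors_dtype_counts is illustrative, and the exact dtype label used for FP8 tensors (expected to look like F8_E4M3) is an assumption to check against your own files.

```python
import json
import struct
from collections import Counter


def safetensors_dtype_counts(path: str) -> Counter:
    """Count tensors per dtype by reading only a .safetensors file header."""
    with open(path, "rb") as f:
        # The file starts with an 8-byte little-endian header length,
        # followed by that many bytes of JSON describing each tensor.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )
```

Running this on each shard under /root/test gives a per-dtype tensor count. A shard from a quantized model would be expected to show mostly FP8 entries, alongside higher-precision entries for scales and for modules excluded from conversion such as lm_head.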

Extra Tips

  • Always use the latest version of the transformers library to ensure you have the latest bug fixes and features.
  • Be cautious when working with large models, as they can cause Out-of-Memory (OOM) errors.
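  • Prefer save_pretrained with safetensors over torch.save or joblib. Both of those rely on pickle and do not write the config or quantization settings, so torch.save(model.state_dict(), path) is at most a debugging aid for checking whether the weights themselves can be serialized.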
