transformers - DeepSeek-V4 FP8 load fails: 1→1 WeightConverter w2 weights not dequantized due to $-anchored source pattern


Fix Action

Fixed


System Info

In src/transformers/quantizers/quantizer_finegrained_fp8.py, strip $ before doing string operations on source_patterns:

Line 211

weight_sources = [p for p in conv.source_patterns if p.rstrip("$").endswith(".weight")]

Line 213

anchored_weight = [p.rstrip("$") + "$" for p in weight_sources]

Line 214

scale_sources = [p.rstrip("$")[:-len(".weight")] + ".weight_scale_inv$" for p in weight_sources]

Line 215

other = [p for p in conv.source_patterns if not p.rstrip("$").endswith(".weight")]

This makes the FP8 quantizer robust to $-anchored source patterns regardless of whether the original mapping author chose to anchor the target pattern.
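The effect of the four changed lines can be checked in isolation. The following is a minimal sketch, not the quantizer's actual code: the helper name split_fp8_sources and the example patterns are hypothetical, chosen only to show that a plain endswith(".weight") check misses a $-anchored pattern, while the rstrip("$") variant treats anchored and unanchored patterns the same way.

```python
import re


def split_fp8_sources(source_patterns):
    """Hypothetical helper mirroring the four proposed lines.

    Returns (anchored weight patterns, scale patterns, other patterns),
    tolerating a trailing "$" on the incoming patterns.
    """
    weight_sources = [p for p in source_patterns if p.rstrip("$").endswith(".weight")]
    anchored_weight = [p.rstrip("$") + "$" for p in weight_sources]
    scale_sources = [
        p.rstrip("$")[: -len(".weight")] + ".weight_scale_inv$" for p in weight_sources
    ]
    other = [p for p in source_patterns if not p.rstrip("$").endswith(".weight")]
    return anchored_weight, scale_sources, other


# Illustrative patterns only; the real V4 mapping may look different.
anchored_input = ["experts.*.w2.weight$"]
unanchored_input = ["experts.*.w2.weight"]

# The naive check skips the anchored pattern because the string ends in "$".
naive_weight_sources = [p for p in anchored_input if p.endswith(".weight")]

# With the "$" stripped first, both spellings produce identical results.
anchored_result = split_fp8_sources(anchored_input)
unanchored_result = split_fp8_sources(unanchored_input)

# The re-anchored weight pattern matches the weight key but not the scale key.
weight_pattern = anchored_result[0][0]
weight_key = "model.layers.0.mlp.experts.3.w2.weight"  # hypothetical checkpoint key
scale_key = weight_key + "_scale_inv"
matches_weight = re.search(weight_pattern, weight_key) is not None
matches_scale = re.search(weight_pattern, scale_key) is not None
```

Here naive_weight_sources is empty, which is the failure mode described in the title: the w2 pattern is never recognized as a weight source, so no scale pattern is generated for it.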

Long-term considerations (for maintainers)

  • Interface clarity: WeightTransform.source_patterns is internally mutated by init for reverse-mapping correctness. External consumers (like quantizers) currently have no way to know whether they're looking at "raw user input" or "post-processed regex". Consider exposing the distinction explicitly (e.g. raw_source_patterns vs compiled_source_patterns).
  • Consistency in mappings: V4's mapping uses $ anchors while other MoE mappings (Mixtral, Qwen2MoE, PhiMoE, etc.) do not. It would be worth either standardizing all mappings to use $ (and then fixing all downstream consumers), or removing $ from V4 to align with the convention.

Environment

  • transformers version: main @ ca9333c and earlier
  • Python version: (please fill in)
  • PyTorch version: (please fill in)
  • Platform: (please fill in)
  • Reproducible on: DeepSeek-V4-Flash FP8 checkpoint with dequantize=True

Are you willing to submit a PR?

Who can help?

transformers/tests/models/deepseek_v4/test_modeling_deepseek_v4.py

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

transformers/tests/models/deepseek_v4/test_modeling_deepseek_v4.py

Expected behavior

The tests in transformers/tests/models/deepseek_v4/test_modeling_deepseek_v4.py should succeed.


