transformers - 💡(How to fix) Fix DeepSeek-V4 FP8 load fails: 1→1 WeightConverter w2 weights not dequantized due to $-anchored source pattern [1 pull requests]

transformers · 2026-05-27 11:02:33


Fix Action

Fixed

Fixed by PR: fix: handle anchored FP8 weight source patterns (https://github.com/huggingface/transformers/pull/46248)


System Info

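In src/transformers/quantizers/quantizer_finegrained_fp8.py, strip any trailing $ from each entry of source_patterns before doing string operations on it. The DeepSeek-V4 mapping anchors its source patterns with $ (a pattern ending in .weight$), so a plain endswith(".weight") check never matches them. As a result, the 1→1 WeightConverter for w2 is not treated as an FP8 weight, and its weights are never dequantized. The patched lines are: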

Line 211

weight_sources = [p for p in conv.source_patterns if p.rstrip("$").endswith(".weight")]

Line 213

anchored_weight = [p.rstrip("$") + "$" for p in weight_sources]

Line 214

scale_sources = [p.rstrip("$")[:-len(".weight")] + ".weight_scale_inv$" for p in weight_sources]

Line 215

other = [p for p in conv.source_patterns if not p.rstrip("$").endswith(".weight")]

This makes the FP8 quantizer robust to $-anchored source patterns regardless of whether the original mapping author chose to anchor the target pattern.
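The patched lines can be exercised in isolation to see why w2 slipped through. This is a minimal, hypothetical reconstruction. The function names and pattern strings are illustrative, not the real DeepSeek-V4 mapping. It also assumes that patterns are matched against checkpoint keys with re.search, which may not be exactly how WeightConverter matches internally.

```python
import re

WEIGHT_SUFFIX = ".weight"


def split_sources_before_fix(source_patterns):
    # Hypothetical pre-fix checks: no rstrip("$"), so "...weight$" is not seen as a weight.
    weight_sources = [p for p in source_patterns if p.endswith(WEIGHT_SUFFIX)]
    other = [p for p in source_patterns if not p.endswith(WEIGHT_SUFFIX)]
    return weight_sources, other


def split_sources_after_fix(source_patterns):
    # Mirrors the patched lines 211 and 213-215: strip any trailing "$" first.
    weight_sources = [p for p in source_patterns if p.rstrip("$").endswith(WEIGHT_SUFFIX)]
    # Re-anchor each weight pattern exactly once.
    anchored_weight = [p.rstrip("$") + "$" for p in weight_sources]
    # "...w2.weight$" -> "...w2.weight_scale_inv$"; slicing is only correct once "$" is gone.
    scale_sources = [p.rstrip("$")[: -len(WEIGHT_SUFFIX)] + ".weight_scale_inv$" for p in weight_sources]
    other = [p for p in source_patterns if not p.rstrip("$").endswith(WEIGHT_SUFFIX)]
    return anchored_weight, scale_sources, other


# Hypothetical patterns: one $-anchored (V4 style), one unanchored (Mixtral style).
patterns = ["experts.w2.weight$", "experts.w1.weight"]

print(split_sources_before_fix(patterns))
# (['experts.w1.weight'], ['experts.w2.weight$'])  -> w2 falls into "other"
print(split_sources_after_fix(patterns))
# (['experts.w2.weight$', 'experts.w1.weight$'],
#  ['experts.w2.weight_scale_inv$', 'experts.w1.weight_scale_inv$'], [])

# Why the quantizer anchors weight patterns at all (assuming re.search matching):
# an unanchored weight pattern would also match the scale tensor's key.
scale_key = "model.layers.0.experts.w2.weight_scale_inv"
print(bool(re.search("experts.w2.weight", scale_key)))   # True
print(bool(re.search("experts.w2.weight$", scale_key)))  # False
```

Stripping before every string operation, rather than only in the endswith check, matters because the scale pattern is built by slicing off ".weight". On an anchored pattern, that slice would cut the wrong characters.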

Long-term considerations (for maintainers)

Interface clarity: WeightTransform.source_patterns is mutated internally by __init__ for reverse-mapping correctness. External consumers such as quantizers currently have no way to tell whether they are looking at raw user input or a post-processed regex. Consider exposing the distinction explicitly (e.g. raw_source_patterns vs compiled_source_patterns).

Consistency in mappings: the V4 mapping uses $ anchors, while other MoE mappings (Mixtral, Qwen2MoE, PhiMoE, etc.) do not. It would be worth either standardizing all mappings on $ (and then fixing all downstream consumers) or removing $ from V4 to align with the existing convention.

Environment
transformers version: main @ ca9333c and earlier
Python version: (please fill in)
PyTorch version: (please fill in)
Platform: (please fill in)
Reproducible on: DeepSeek-V4-Flash FP8 checkpoint with dequantize=True

Are you willing to submit a PR?
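The interface-clarity suggestion can be made concrete by keeping the author's patterns and the matching regexes in separate fields. The sketch below is hypothetical. SourcePatterns, raw_source_patterns, normalized_source_patterns and compiled_source_patterns are illustrative names, not part of the transformers API (the raw/compiled split follows the naming proposed in the issue). It also assumes that a trailing $ is the only way the mappings differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePatterns:
    """Hypothetical container (NOT a transformers API): keeps what the mapping
    author wrote separate from the regexes actually used for matching."""

    raw_source_patterns: tuple[str, ...]

    @property
    def normalized_source_patterns(self) -> tuple[str, ...]:
        # Suffix checks such as endswith(".weight") would run on these, never on raw input.
        return tuple(p.rstrip("$") for p in self.raw_source_patterns)

    @property
    def compiled_source_patterns(self) -> tuple[re.Pattern, ...]:
        # Anchor exactly once, whether or not the author wrote a trailing "$".
        return tuple(re.compile(p + "$") for p in self.normalized_source_patterns)


# An anchored (V4-style) and an unanchored (Mixtral-style) mapping compile to the same regex.
v4 = SourcePatterns(("experts.w2.weight$",))
mixtral = SourcePatterns(("experts.w2.weight",))
print([c.pattern for c in v4.compiled_source_patterns])       # ['experts.w2.weight$']
print([c.pattern for c in mixtral.compiled_source_patterns])  # ['experts.w2.weight$']
```

With a split like this, a quantizer would never need to know whether a given mapping was written with or without $. That would also make standardizing every mapping by hand less urgent.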

Who can help?

transformers/tests/models/deepseek_v4/test_modeling_deepseek_v4.py

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Run transformers/tests/models/deepseek_v4/test_modeling_deepseek_v4.py

Expected behavior

The tests in transformers/tests/models/deepseek_v4/test_modeling_deepseek_v4.py should pass.
