transformers: [BUG] deepseek-v4 `comb.to(dtype).transpose(-1, -2)`

System Info

Expected fix: `comb.to(dtype).transpose(-1, -2)` -> `comb.to(dtype)`

https://github.com/huggingface/transformers/blame/10555512868d663ee1ff627e4f5c5c260114235b/src/transformers/models/deepseek_v4/modular_deepseek_v4.py#L998-L999

https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/inference/model.py#L685-L686

comb: [b, s, hc, hc], y: [b, s, hc, d]

torch.sum(comb.unsqueeze(-1) * residual.unsqueeze(-2), dim=2)

[b, s, hc, hc, 1] * [b, s, hc, 1, d] -[*]-> [b, s, hc, hc, d] -[sum]-> [b, s, hc, d]
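
The shape walk-through above can be checked numerically. The sketch below is an illustration only, not the transformers or DeepSeek code: it mirrors the `torch.sum(...)` expression in NumPy (indexing with `None` stands in for `unsqueeze`), and the helper name `mix_streams` and the sample values are hypothetical. Under those assumptions it shows that the expression already contracts over the first `hc` axis of `comb`, which is the same as `comb.transpose(-1, -2) @ residual`, so transposing `comb` once more before the same reduction changes the result.

```python
import numpy as np


def mix_streams(comb, residual):
    """NumPy mirror of torch.sum(comb.unsqueeze(-1) * residual.unsqueeze(-2), dim=2).

    comb:     [b, s, hc, hc]
    residual: [b, s, hc, d]
    returns:  [b, s, hc, d]
    """
    # [b, s, hc, hc, 1] * [b, s, hc, 1, d] -> [b, s, hc, hc, d]
    prod = comb[..., None] * residual[..., None, :]
    # Sum over axis 2, the first hc axis of comb -> [b, s, hc, d]
    return prod.sum(axis=2)


# Hypothetical sample values with b = 1, s = 1, hc = 2, d = 3.
comb = np.array([[1.0, 2.0], [3.0, 4.0]]).reshape(1, 1, 2, 2)
residual = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]).reshape(1, 1, 2, 3)

# Reduction as written in the issue.
ref = mix_streams(comb, residual)

# Same result expressed as a batched matmul with comb transposed once.
as_matmul = np.swapaxes(comb, -1, -2) @ residual

# Applying an extra transpose to comb before the same reduction.
with_extra_transpose = mix_streams(np.swapaxes(comb, -1, -2), residual)

print(ref.shape)                                     # (1, 1, 2, 3)
print(np.allclose(ref, as_matmul))                   # True
print(np.allclose(ref, with_extra_transpose))        # False
```

Whether the extra `.transpose(-1, -2)` is wrong in transformers depends on how `comb` is consumed at the linked lines: it is redundant with a `sum`-style reduction like the one above, but it would be needed with a plain `matmul`.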

Who can help?

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Expected behavior
