pytorch - Adding Dynamic Tanh (DyT) element-wise operation (paper from Meta)

🚀 The feature, motivation and pitch

An important paper from Meta (FAIR), MIT, NYU, and Princeton, posted to arXiv in March 2025, reports that transformers can perform as well as, if not better than, their normalized counterparts when a Dynamic Tanh (DyT) operation replaces Layer Normalization. The PR for this issue will aim to add this technique to the nn folder, in the norms file, with minimal code. There is also a successor paper featuring a Dynamic erf (Derf) function, for which I will open a separate issue.

I am attaching the related links here:

- Paper: https://arxiv.org/pdf/2503.10622
- Title: Transformers without Normalization
- Code: https://github.com/jiachenzhu/DyT
- Website: https://jiachenzhu.github.io/DyT/
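For reference, here is a minimal sketch of what the operation could look like as a module. It assumes the formulation described in the paper, `weight * tanh(alpha * x) + bias`, with a learnable scalar `alpha` and per-channel `weight` and `bias`. The class name `DyT`, the argument names, and the `0.5` starting value for `alpha` are illustrative assumptions here, not a proposed final API for `torch.nn`.

```python
import torch
import torch.nn as nn


class DyT(nn.Module):
    """Sketch of Dynamic Tanh: weight * tanh(alpha * x) + bias, elementwise.

    Hypothetical module for illustration; names and defaults are assumptions.
    """

    def __init__(self, num_features: int, alpha_init: float = 0.5):
        super().__init__()
        # Learnable scalar that scales the input before the tanh squashing.
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))
        # Per-channel affine parameters, analogous to LayerNorm's weight and bias.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumes the feature dimension is last, as with nn.LayerNorm(num_features).
        # No mean or variance is computed; every element is transformed independently.
        return self.weight * torch.tanh(self.alpha * x) + self.bias


# Usage: a stand-in for nn.LayerNorm(8) on a (batch, sequence, features) tensor.
layer = DyT(num_features=8)
x = torch.randn(2, 4, 8)
y = layer(x)
```

Because the operation computes no statistics over the input, the output shape always matches the input shape, and the only added state is one scalar plus two vectors of size `num_features`.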

cc. @msaroufim @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @malfet @zou3519 @jerryzh168 @cyyever @apaszke @anijain2305 @suo @janeyx99

Alternatives

No response

Additional context

<img width="732" height="490" alt="Image" src="https://github.com/user-attachments/assets/9f9d7c63-f158-4eb7-a3e2-288f8dc18399" /> <img width="1576" height="348" alt="Image" src="https://github.com/user-attachments/assets/a5df82be-046b-4071-9e8a-091e1b0fe2c3" />

cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki
