pytorch - Request to support fp8 x4 packed dtypes in PT [1 comment, 2 participants]

GitHub stats
pytorch/pytorch#179109, fetched 2026-04-08 02:21:54
Comments: 1 · Participants: 2 · Timeline events: 23 · Reactions: 0
Timeline (top): mentioned ×10, subscribed ×10, commented ×1, labeled ×1

Trainium3 hardware natively operates on packed FP8 dtypes: float8_e4m3fn_x4 and float8_e5m2_x4, which bundle 4 FP8 elements into 1 logical element along the free dimension.

Example: a float8_e4m3fn tensor of shape [64, 128] becomes a float8_e4m3fn_x4 tensor of shape [64, 32] when the last dimension (-1) is the free dim, since 128 / 4 = 32.
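
The shape arithmetic in this example can be sketched in plain Python. This is only an illustration of the 4-to-1 bundling described above; `packed_shape` is a hypothetical helper, not a PyTorch or Trainium API, and the requirement that the free dimension be divisible by 4 is an assumption.

```python
def packed_shape(shape, free_dim=-1, pack=4):
    """Return the logical shape after bundling `pack` elements along `free_dim`.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    dims = list(shape)
    axis = free_dim % len(dims)  # normalize a negative index such as -1
    if dims[axis] % pack != 0:
        # Assumption: the free dim must hold a whole number of packed elements.
        raise ValueError(
            f"dimension {axis} has size {dims[axis]}, not divisible by {pack}"
        )
    dims[axis] //= pack
    return tuple(dims)


# The example from the issue: [64, 128] with free dim -1 becomes [64, 32].
print(packed_shape((64, 128)))  # (64, 32)
```

The number of bytes is unchanged by packing: 64 × 128 one-byte elements occupy the same storage as 64 × 32 four-byte elements.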

Currently, we handle this packing/unpacking internally in our kernels, but we'd like to propose adding these two x4 dtypes to PT Core.
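
As a rough sketch of what packing and unpacking along the free dimension might look like at the byte level, the snippet below bundles four FP8 bit patterns (each one byte) into a single 32-bit word and back. The function names are hypothetical, and the little-endian ordering of the four elements within a word is an assumption; the issue does not specify the layout Trainium3 uses.

```python
def pack_x4(row):
    """Pack FP8 bit patterns (ints in 0..255) into 32-bit words, 4 per word.

    Assumption: element 0 occupies the least significant byte (little-endian).
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    return [
        int.from_bytes(bytes(row[i:i + 4]), "little")
        for i in range(0, len(row), 4)
    ]


def unpack_x4(words):
    """Inverse of pack_x4: expand each 32-bit word into 4 FP8 bit patterns."""
    out = []
    for word in words:
        out.extend(word.to_bytes(4, "little"))
    return out


row = list(range(8))            # 8 raw FP8 bytes along the free dim
packed = pack_x4(row)           # 2 packed elements
assert unpack_x4(packed) == row  # round trip preserves every byte
```

The sketch operates on raw bit patterns only; it does not interpret the bytes as e4m3fn or e5m2 values, so the same code applies to both proposed dtypes.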

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel

extent analysis

TL;DR

Adding the float8_e4m3fn_x4 and float8_e5m2_x4 dtypes to PT Core would expose packing that is currently handled inside the kernels, and may require changes to that kernel handling to accommodate the packed FP8 elements.

Guidance

  • Investigate the current kernel implementation to understand how it handles packing and unpacking of FP8 elements.
  • Determine the requirements for adding the new dtypes to PT Core, including any necessary changes to the kernel or other components.
  • Consider the impact of adding these dtypes on existing functionality and performance.
  • Evaluate the need for additional testing or validation to ensure correct handling of the new dtypes.

Notes

The proposal to add new dtypes to PT Core may involve significant changes to the underlying implementation, and careful consideration of the potential impact is necessary.

Recommendation

Apply workaround: Continue handling packing and unpacking of FP8 elements inside the existing kernels while the proposal is evaluated, as adding new dtypes to PT Core may require significant changes and testing.
