pytorch - 💡(How to fix) Fix The smallest values of E4M3 for two-level quantization is incorrect [5 comments, 2 participants]

GitHub stats
pytorch/pytorch#179111 (fetched 2026-04-08 02:21:45)
Comments: 5 · Participants: 2 · Timeline: 29 · Reactions: 0
Timeline (top): mentioned ×11, subscribed ×11, commented ×5, labeled ×2

Code Example

E4M3_EPS = torch.finfo(torch.float8_e4m3fn).tiny

🐛 Describe the bug

In ao/torchao/prototype/mx_formats/nvfp4_tensor.py, the smallest E4M3 value used for two-level quantization is defined as

E4M3_EPS = torch.finfo(torch.float8_e4m3fn).tiny

which is the smallest positive normal number of E4M3, i.e., $$2^{1-7}=2^{-6}=0.015625.$$ However, the smallest positive number of E4M3 is $$2^{-9}=0.001953125,$$ which is a subnormal. Note that a sufficiently small scale factor of type E4M3 can be a subnormal number.
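The two values above can be checked by decoding E4M3 bit patterns by hand. The following is a minimal sketch in plain Python, assuming the `float8_e4m3fn` layout of 1 sign bit, 4 exponent bits, 3 mantissa bits, and an exponent bias of 7; `decode_e4m3fn` is a hypothetical helper written for illustration, not a PyTorch or torchao function.

```python
def decode_e4m3fn(byte: int) -> float:
    """Decode one E4M3FN bit pattern (assumed layout: S.EEEE.MMM, bias 7)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF   # 4 exponent bits
    man = byte & 0x7          # 3 mantissa bits
    if exp == 0xF and man == 0x7:
        return float("nan")   # the "fn" variant has no infinities, only NaN
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias = -6
        return sign * (man / 8.0) * 2.0 ** -6
    # Normal: implicit leading 1, exponent is exp - bias
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


smallest_subnormal = decode_e4m3fn(0b0_0000_001)  # 2**-9
smallest_normal = decode_e4m3fn(0b0_0001_000)     # 2**-6
largest_finite = decode_e4m3fn(0b0_1111_110)      # 448.0

print(smallest_subnormal, smallest_normal, largest_finite)
```

Under these assumptions the smallest normal value is 8 times larger than the smallest subnormal, which is the gap the issue describes.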

Versions

PyTorch version: 2.8.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Enterprise (10.0.26100 64-bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.1 (tags/v3.12.1:2305ca5, Dec 7 2023, 22:03:25) [MSC v.1937 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-11-10.0.26100-SP0
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Name: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
Manufacturer: GenuineIntel
Family: 198
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 2803
MaxClockSpeed: 2803
L2CacheSize: 5120
L2CacheSpeed: None
Revision: None

Versions of relevant libraries:
[pip3] numpy==2.3.2
[pip3] torch==2.8.0
[pip3] torchaudio==2.8.0
[pip3] torchvision==0.23.0
[conda] Could not collect

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel

extent analysis

TL;DR

The issue can be fixed by using the smallest positive subnormal number of E4M3 instead of the smallest positive normal number.

Guidance

  • The current code uses torch.finfo(torch.float8_e4m3fn).tiny, which returns the smallest positive normal number of E4M3 ($2^{-6}=0.015625$), not the smallest positive representable value.
  • The smallest positive number of E4M3 is a subnormal number, which is $2^{-9}=0.001953125$.
  • To fix the issue, the code should be updated to use the smallest positive subnormal number of E4M3.
  • The correct value can be written directly as a constant or obtained from a library function that reports the smallest subnormal number.

Example

E4M3_EPS = 2**-9

This sets E4M3_EPS to $2^{-9}$, the smallest positive subnormal number of E4M3.
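To see why the choice of floor matters, here is a hedged sketch of what happens to a small scale factor under each bound. It assumes E4M3_EPS acts as a lower clamp on scale factors before they are stored as E4M3, which the issue text does not show; `clamp_scale` and the constant names are hypothetical and written only for illustration.

```python
E4M3_MIN_NORMAL = 2.0 ** -6      # value of finfo(...).tiny reported in the issue
E4M3_MIN_SUBNORMAL = 2.0 ** -9   # smallest positive E4M3 value
E4M3_MAX = 448.0                 # largest finite float8_e4m3fn value


def clamp_scale(scale: float, floor: float) -> float:
    """Hypothetical helper: clamp a scale factor into [floor, E4M3_MAX]."""
    return min(max(scale, floor), E4M3_MAX)


# 2**-8 is exactly representable in E4M3 as a subnormal (mantissa = 2).
raw_scale = 2.0 ** -8

with_normal_floor = clamp_scale(raw_scale, E4M3_MIN_NORMAL)
with_subnormal_floor = clamp_scale(raw_scale, E4M3_MIN_SUBNORMAL)

# How much the normal-number floor inflates a scale it should have kept.
inflation = with_normal_floor / raw_scale

print(with_normal_floor, with_subnormal_floor, inflation)
```

In this sketch the normal-number floor replaces a representable scale of $2^{-8}$ with $2^{-6}$, a factor of 4, while the subnormal floor leaves it unchanged.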

Notes

The issue is specific to the E4M3 format and the use of subnormal numbers. The fix may not be applicable to other formats or use cases.

Recommendation

Apply workaround: update the code to use the smallest positive subnormal number of E4M3 ($2^{-9}$), either as a constant or from a library function. The current implementation uses the smallest positive normal number ($2^{-6}$), which is not the smallest positive value E4M3 can represent.
