pytorch - ✅(Solved) Fix cpu and gpu mismatch [1 pull requests, 1 comments, 2 participants]

GitHub stats
pytorch/pytorch#180154 · Fetched 2026-04-12 13:23:38
Comments: 1 · Participants: 2 · Timeline events: 17 · Reactions: 0
Timeline (top): mentioned ×6, subscribed ×6, labeled ×4, commented ×1

Error Message

GPU/CPU error ratio: 91986x

PR fix notes

PR #3339: Improve long float32 torch.cumprod numerical parity on XPU

Description (problem / solution / changelog)

  • Investigated CI failure (clang-format lint error in ScanKernels.cpp)
  • Fixed if-condition line break to match clang-format expectations

Changed files

  • src/ATen/native/xpu/ScanKernels.cpp (modified, +14/-3)
  • test/xpu/test_torch_xpu.py (modified, +15/-0)

🐛 Describe the bug

import torch
import numpy as np

torch.manual_seed(0)
n = 1_000_000
x = 1.0 + 0.0001 * torch.randn(n, dtype=torch.float32)

ref = np.cumprod(x.numpy().astype(np.float64))
cpu = torch.cumprod(x, dim=0).numpy()
gpu = torch.cumprod(x.cuda(), dim=0).cpu().numpy()

cpu_err = float(np.max(np.abs(cpu.astype(np.float64) - ref)))
gpu_err = float(np.max(np.abs(gpu.astype(np.float64) - ref)))
ratio = gpu_err / cpu_err if cpu_err > 0 else float('inf')

print(f"cpu_max_err = {cpu_err:.4e}  (CPU promotes f32→f64 internally)")
print(f"gpu_max_err = {gpu_err:.4e}  (GPU accumulates in f32)")
print(f"GPU/CPU error ratio: {ratio:.0f}x")

Versions

cpu_max_err = 5.9604e-08
gpu_max_err = 5.4827e-03
GPU/CPU error ratio: 91986x

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

TL;DR

The issue can be mitigated by promoting the input data type to torch.float64 to match the internal promotion on the CPU.

Guidance

  • The large error ratio between GPU and CPU calculations suggests a precision issue, likely due to the GPU accumulating in float32 while the CPU promotes to float64 internally.
  • To verify the cause, compare the results of torch.cumprod on the CPU with and without explicit type promotion to float64.
  • Consider promoting the input x to torch.float64 before calculating torch.cumprod on the GPU to reduce the error.
  • Monitor the error ratio after applying the type promotion to ensure it decreases significantly.
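The precision explanation in the guidance above can be illustrated without a GPU. The following is a minimal sketch using only the Python standard library: it simulates a sequential cumulative product that rounds to float32 after every multiply, and compares it with one that accumulates in float64 and rounds to float32 only once per output. The names `to_f32` and `max_cumprod_errors` are hypothetical helpers, and the sequential loop is an assumption for illustration; it does not reproduce the order of operations of a real GPU scan kernel, so the magnitudes will differ from those reported in the issue.

```python
import random
import struct
from typing import Tuple


def to_f32(v: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def max_cumprod_errors(n: int, seed: int = 0) -> Tuple[float, float]:
    """Return (max error with float32 accumulation, max error with float64
    accumulation rounded to float32 once), both measured against the
    float64 running product."""
    rng = random.Random(seed)
    acc32 = 1.0  # rounded to float32 after every multiply
    acc64 = 1.0  # kept in float64 throughout
    err_acc32 = 0.0
    err_round_once = 0.0
    for _ in range(n):
        # Inputs are float32 values close to 1, as in the issue's reproducer.
        x = to_f32(1.0 + 1e-4 * rng.gauss(0.0, 1.0))
        acc64 *= x
        # The product of two float32 values is exact in float64, so rounding
        # it to float32 matches a correctly rounded float32 multiply.
        acc32 = to_f32(acc32 * x)
        err_acc32 = max(err_acc32, abs(acc32 - acc64))
        err_round_once = max(err_round_once, abs(to_f32(acc64) - acc64))
    return err_acc32, err_round_once


if __name__ == "__main__":
    e32, e64 = max_cumprod_errors(200_000)
    print(f"float32 accumulation max err  = {e32:.4e}")
    print(f"float64 accumulation max err  = {e64:.4e}")
```

With float64 accumulation the only float32 error is the final rounding of each output, which is why it stays near half a float32 unit in the last place, while per-step rounding errors compound along the sequence.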

Example

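# Promote the existing float32 input rather than drawing new float64 samples,
# so the comparison against ref uses the same data.
gpu = torch.cumprod(x.cuda(), dim=0, dtype=torch.float64).to(torch.float32).cpu().numpy()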

Notes

The workaround does not account for performance: float64 tensors use twice the memory of float32, and float64 arithmetic is considerably slower than float32 on many GPUs, so measure the cost before applying it to large inputs.

Recommendation

Apply workaround: Promote the input data type to torch.float64 to reduce the precision error on the GPU, as this directly addresses the observed issue without requiring version upgrades or other changes.
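One way to apply this recommendation is through the `dtype` argument of `torch.cumprod`, which casts the input before the operation runs. The sketch below wraps that in a helper and measures the error with and without promotion. The helper name `cumprod_promoted` is hypothetical (it is not part of PyTorch), the tensor size is reduced to keep the run short, and the script falls back to the CPU when no CUDA device is available, in which case the two errors may be similar.

```python
import torch


def cumprod_promoted(x: torch.Tensor, dim: int = 0) -> torch.Tensor:
    """Cumulative product computed in float64, returned in x's original dtype.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    # dtype=torch.float64 casts the input to float64 before the scan runs.
    return torch.cumprod(x, dim=dim, dtype=torch.float64).to(x.dtype)


device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
x = (1.0 + 0.0001 * torch.randn(100_000, dtype=torch.float32)).to(device)

# float64 reference computed on the same device as the results under test.
ref = torch.cumprod(x.double(), dim=0)

plain_err = (torch.cumprod(x, dim=0).double() - ref).abs().max().item()
promoted_err = (cumprod_promoted(x).double() - ref).abs().max().item()

print(f"device = {device}")
print(f"plain_max_err    = {plain_err:.4e}")
print(f"promoted_max_err = {promoted_err:.4e}")
```

Casting the result back to the input dtype keeps downstream code unchanged; if the extra precision is needed later, the final `.to(x.dtype)` can be dropped at the cost of a float64 output.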
