pytorch - ✅ (Solved) Fix CPU and GPU mismatch [1 pull request, 1 comment, 2 participants]

pytorch, 2026-04-12 00:19:11

GitHub stats

pytorch/pytorch#180154 • Fetched 2026-04-12 13:23:38

Author: beanduan22
Participants: beanduan22, sakshar2303
Timeline (top): mentioned ×6, subscribed ×6, labeled ×4, commented ×1

Error Message

GPU/CPU error ratio: 91986x

PR fix notes

PR #3339: Improve long float32 `torch.cumprod` numerical parity on XPU

Repository: intel/torch-xpu-ops
Author: Copilot
State: open | merged: False
Link: https://github.com/intel/torch-xpu-ops/pull/3339

Description (problem / solution / changelog)

Investigated CI failure (clang-format lint error in ScanKernels.cpp)
Fixed if-condition line break to match clang-format expectations

Changed files

src/ATen/native/xpu/ScanKernels.cpp (modified, +14/-3)
test/xpu/test_torch_xpu.py (modified, +15/-0)

Code Example

import torch
import numpy as np

torch.manual_seed(0)
n = 1_000_000
x = 1.0 + 0.0001 * torch.randn(n, dtype=torch.float32)

ref = np.cumprod(x.numpy().astype(np.float64))
cpu = torch.cumprod(x, dim=0).numpy()
gpu = torch.cumprod(x.cuda(), dim=0).cpu().numpy()

cpu_err = float(np.max(np.abs(cpu.astype(np.float64) - ref)))
gpu_err = float(np.max(np.abs(gpu.astype(np.float64) - ref)))
ratio = gpu_err / cpu_err if cpu_err > 0 else float('inf')

print(f"cpu_max_err = {cpu_err:.4e}  (CPU promotes f32→f64 internally)")
print(f"gpu_max_err = {gpu_err:.4e}  (GPU accumulates in f32)")
print(f"GPU/CPU error ratio: {ratio:.0f}x")

Versions

cpu_max_err = 5.9604e-08
gpu_max_err = 5.4827e-03
GPU/CPU error ratio: 91986x

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

TL;DR

The issue can be mitigated by promoting the input data type to torch.float64 to match the internal promotion on the CPU.

Guidance

The large GPU/CPU error ratio points to a precision issue: the GPU most likely accumulates the running product in float32, while the CPU promotes to float64 internally.
To verify the cause, run torch.cumprod on both devices with and without explicit promotion to float64. If the hypothesis holds, the CPU error should barely change, while the GPU error should drop to roughly the CPU level.
As a mitigation, promote the input x to torch.float64 before calling torch.cumprod on the GPU.
Recompute the error ratio after the promotion and confirm that it decreases significantly.
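The effect of accumulator precision can be isolated without a GPU. The following is a minimal sketch that uses NumPy only and assumes that sequential float32 accumulation is a fair stand-in for "accumulating in float32". It does not reproduce the parallel scan a GPU kernel actually runs, so the magnitudes will differ from the figures reported in the issue. The variable names (`acc32`, `acc64`, `err32`, `err64`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# Same shape of input as in the issue: float32 values clustered around 1.0.
x = (1.0 + 0.0001 * rng.standard_normal(n)).astype(np.float32)

# Reference: promote the float32 input to float64, then accumulate.
ref = np.cumprod(x.astype(np.float64))

# float32 input with a float32 accumulator (NumPy keeps the input dtype).
acc32 = np.cumprod(x)

# float32 input with a float64 accumulator, rounded back to float32 at the end.
acc64 = np.cumprod(x, dtype=np.float64).astype(np.float32)

err32 = float(np.max(np.abs(acc32.astype(np.float64) - ref)))
err64 = float(np.max(np.abs(acc64.astype(np.float64) - ref)))

print(f"float32 accumulator max_err = {err32:.4e}")
print(f"float64 accumulator max_err = {err64:.4e}")
```

With a float64 accumulator, the only error left is the final rounding to float32, which is consistent with the small CPU error in the report. With a float32 accumulator, a rounding error is introduced at every step and carried forward.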

Example

import torch
n = 1_000_000
x = 1.0 + 0.0001 * torch.randn(n, dtype=torch.float32)
gpu = torch.cumprod(x.cuda().double(), dim=0).cpu().numpy()  # promote to float64 on the GPU before the scan

Notes

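The example does not account for the cost of float64 on the GPU: a float64 tensor takes twice the memory of a float32 one, and float64 arithmetic is considerably slower than float32 on many GPUs. Measure the difference before adopting the workaround in performance-sensitive code.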
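One way to gauge that cost is to time both dtypes on the target device. The sketch below is a rough wall-clock measurement, not a rigorous benchmark. The helper name `time_cumprod` and its parameters are hypothetical, and it falls back to the CPU when CUDA is not available.

```python
import time
import torch

def time_cumprod(dtype, n=1_000_000, repeats=5, device=None):
    """Hypothetical helper: best wall-clock time in seconds of torch.cumprod over `repeats` runs."""
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    x = 1.0 + 0.0001 * torch.randn(n, dtype=dtype, device=device)
    best = float("inf")
    for _ in range(repeats):
        if device == "cuda":
            torch.cuda.synchronize()  # make sure earlier GPU work is finished
        start = time.perf_counter()
        torch.cumprod(x, dim=0)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for the kernel before stopping the clock
        best = min(best, time.perf_counter() - start)
    return best

t32 = time_cumprod(torch.float32)
t64 = time_cumprod(torch.float64)
print(f"float32: {t32 * 1e3:.3f} ms")
print(f"float64: {t64 * 1e3:.3f} ms")
```

The synchronization calls matter on CUDA because kernels launch asynchronously. Without them, the timer would mostly measure launch overhead.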

Recommendation

Apply workaround: Promote the input data type to torch.float64 to reduce the precision error on the GPU, as this directly addresses the observed issue without requiring version upgrades or other changes.
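The workaround can be wrapped so that callers keep working with float32 tensors. This is a minimal sketch: `cumprod_f64` is a hypothetical helper name, not part of PyTorch. It relies on the `dtype` keyword of `torch.cumprod`, which casts the input before the operation runs, and it assumes the device supports float64.

```python
import torch

def cumprod_f64(x: torch.Tensor, dim: int = 0) -> torch.Tensor:
    """Hypothetical helper: cumulative product computed in float64, returned in x's dtype."""
    # dtype=torch.float64 casts the input to float64 before the scan runs.
    return torch.cumprod(x, dim=dim, dtype=torch.float64).to(x.dtype)

# Falls back to the CPU when CUDA is not available.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
x = 1.0 + 0.0001 * torch.randn(1_000_000, dtype=torch.float32, device=device)

ref = torch.cumprod(x.double(), dim=0)  # float64 reference on the same device
out = cumprod_f64(x)
max_err = (out.double() - ref).abs().max().item()
print(f"{device} max_err with float64 accumulation = {max_err:.4e}")
```

Casting back to the input dtype keeps downstream code unchanged. The remaining error is then limited to the final rounding to float32.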
