pytorch - 💡(How to fix) complex64 cumsum on GPU is 10–16x less accurate than on CPU

pytorch/pytorch#180152 · Fetched 2026-04-12 13:23:41

🐛 Describe the bug

import torch
import numpy as np

# Random complex64 input: real and imaginary parts drawn as float32
torch.manual_seed(0)
n = 5_000_000
x = (torch.randn(n, dtype=torch.float32) + 1j * torch.randn(n, dtype=torch.float32)).to(torch.complex64)

# Reference: cumulative sum accumulated in complex128 by NumPy
ref = np.cumsum(x.numpy().astype(np.complex128))
# Same complex64 input on CPU and on GPU (requires a CUDA device)
cpu = torch.cumsum(x, dim=0).numpy()
gpu = torch.cumsum(x.cuda(), dim=0).cpu().numpy()

# Maximum absolute deviation from the complex128 reference
cpu_err = float(np.max(np.abs(cpu.astype(np.complex128) - ref)))
gpu_err = float(np.max(np.abs(gpu.astype(np.complex128) - ref)))
ratio = gpu_err / cpu_err if cpu_err > 0 else float('inf')

print(f"cpu_max_err = {cpu_err:.4e}")
print(f"gpu_max_err = {gpu_err:.4e}")
print(f"GPU/CPU error ratio: {ratio:.1f}x  (GPU less accurate)")

Versions

version 2.9.0
cpu_max_err = 1.3636e-04
gpu_max_err = 1.5652e-03

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @ezyang @anjali411 @dylanbespalko @mruberry @nikitaved @amjames

extent analysis

TL;DR

Computing the cumulative sum in a higher-precision dtype such as torch.complex128 reduces the numerical error. This is a workaround: it does not explain or remove the GPU/CPU accuracy gap at torch.complex64.

Guidance

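  • The GPU result is roughly 10–16x less accurate than the CPU result for the same input and dtype, which suggests the two backends accumulate the running sum differently.
  • torch.complex64 stores each component as float32 (about 7 significant decimal digits), so rounding error can build up over the 5,000,000 additions in the reproduction.
  • Consider torch.complex128 for the cumulative sum; it should reduce the absolute error on both devices, at the cost of twice the memory.
  • Verify by rerunning the comparison with the new dtype and checking both the absolute errors and the GPU/CPU ratio.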
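The verification step in the list above could be sketched as follows. This is a minimal sketch, not the issue author's script: `cumsum_max_err` is a hypothetical helper, `n` is reduced from the issue's 5,000,000 to keep the run short, and the GPU rows are printed only when a CUDA device is available.

```python
import numpy as np
import torch


def cumsum_max_err(x: torch.Tensor, device: str) -> float:
    """Max absolute error of torch.cumsum on `device` vs. a complex128 NumPy reference.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    # Reference is always accumulated in complex128, whatever the input dtype.
    ref = np.cumsum(x.cpu().numpy().astype(np.complex128))
    out = torch.cumsum(x.to(device), dim=0).cpu().numpy().astype(np.complex128)
    return float(np.max(np.abs(out - ref)))


torch.manual_seed(0)
n = 200_000  # assumption: smaller than the issue's 5_000_000 to keep the run short

# Build the complex128 input first, then derive the complex64 input from it.
x128 = torch.complex(torch.randn(n, dtype=torch.float64),
                     torch.randn(n, dtype=torch.float64))
x64 = x128.to(torch.complex64)

# Only include the GPU when one is present.
devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])

errors = {}
for device in devices:
    for name, x in (("complex64", x64), ("complex128", x128)):
        errors[(device, name)] = cumsum_max_err(x, device)
        print(f"{device:4s} {name:10s} max_err = {errors[(device, name)]:.4e}")
```

Comparing the complex64 rows across devices reproduces the kind of ratio reported in the issue, while the complex128 rows show how much error the wider dtype leaves behind.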

Example

x = (torch.randn(n, dtype=torch.float64) + 1j * torch.randn(n, dtype=torch.float64)).to(torch.complex128)

Notes

The measurements above come from version 2.9.0. Other versions may behave differently, so rerun the comparison before relying on the workaround.

Recommendation

Apply the workaround: compute the cumulative sum in torch.complex128 instead of torch.complex64. The change is small and non-invasive, at the cost of twice the memory for the affected tensors.
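If the rest of the pipeline has to stay in torch.complex64, one way to apply the recommendation is to widen only around the cumulative sum and cast the result back. The sketch below assumes that approach; `cumsum_upcast` and `max_err` are hypothetical names, not PyTorch APIs, and the temporary complex128 tensor still costs twice the memory while it exists.

```python
import numpy as np
import torch


def cumsum_upcast(x: torch.Tensor, dim: int = 0) -> torch.Tensor:
    """Cumulative sum that accumulates complex64 input in complex128.

    Hypothetical helper for illustration; other dtypes are passed through unchanged.
    """
    if x.dtype != torch.complex64:
        return torch.cumsum(x, dim=dim)
    wide = torch.cumsum(x.to(torch.complex128), dim=dim)  # accumulate in double precision
    return wide.to(torch.complex64)                       # round once, at the end


torch.manual_seed(0)
n = 200_000  # assumption: smaller than the issue's 5_000_000 to keep the run short
x = torch.complex(torch.randn(n), torch.randn(n))  # float32 parts give complex64
device = "cuda" if torch.cuda.is_available() else "cpu"

# complex128 reference, as in the original reproduction
ref = np.cumsum(x.numpy().astype(np.complex128))


def max_err(t: torch.Tensor) -> float:
    """Max absolute deviation of `t` from the complex128 reference."""
    return float(np.max(np.abs(t.cpu().numpy().astype(np.complex128) - ref)))


plain = torch.cumsum(x.to(device), dim=0)
upcast = cumsum_upcast(x.to(device), dim=0)

err_plain = max_err(plain)
err_upcast = max_err(upcast)
print(f"device           = {device}")
print(f"plain  max_err   = {err_plain:.4e}")
print(f"upcast max_err   = {err_upcast:.4e}")
```

The output keeps the complex64 dtype, so downstream code does not need to change. Whether the extra memory and time are acceptable depends on the tensor sizes involved.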
