pytorch - 💡(How to fix) Fix Large numerical discrepancy in torch.renorm between CPU and CUDA

🐛 Describe the bug

import torch
import torch.nn as nn

torch.manual_seed(0)

fc1 = nn.Linear(8, 8)
fc2 = nn.Linear(8, 8)

def forward(model_fc1, model_fc2, device):
    x = torch.randn(4, 8, device=device)
    y = model_fc1(x)
    z = torch.sin(y) * torch.cos(y)
    w = torch.log1p(z.abs())
    t = torch.renorm(w, p=2, dim=0, maxnorm=10.0)
    s = torch.sin(t)
    return model_fc2(s.detach())

# CPU
cpu_out = forward(fc1, fc2, 'cpu')

# GPU
fc1_g = nn.Linear(8, 8).cuda(); fc1_g.load_state_dict(fc1.state_dict())
fc2_g = nn.Linear(8, 8).cuda(); fc2_g.load_state_dict(fc2.state_dict())

gpu_out = forward(fc1_g, fc2_g, 'cuda')

diff = (cpu_out - gpu_out.cpu()).abs()
print("max diff:", diff.max())

Versions

2.9.1+cu128 (PyTorch 2.9.1, CUDA 12.8)

While small numerical differences between CPU and CUDA are expected, the magnitude of discrepancy here (~1e-1 after a simple pipeline) appears unusually large.

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

TL;DR

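The script samples its input with torch.randn(4, 8, device=device) inside forward, so the CPU and CUDA runs most likely start from different random inputs: PyTorch's CPU and CUDA generators do not produce the same sequence for the same seed. A difference of ~1e-1 is therefore expected from the inputs alone and does not by itself point at torch.renorm. Sampling the input once and copying it to both devices is the first thing to check.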

Guidance

  • Sample x once on the CPU and pass x.to(device) to each run, so both devices see identical inputs, before drawing conclusions about any operator.
  • If a gap remains, compare intermediate tensors stage by stage (the sin/cos product, log1p, renorm, the final sin) rather than only the final output, to isolate the source of the discrepancy.
  • Compare against other PyTorch or CUDA builds only after the inputs are aligned, to see whether any remaining gap is specific to 2.9.1+cu128.
  • Setting torch.backends.cudnn.deterministic = True and torch.backends.cudnn.benchmark = False is unlikely to matter here: the pipeline has no convolution or other cuDNN-backed operation, and those flags do not align the CPU and CUDA random streams.
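One way to isolate which operation, if any, contributes to a gap is to feed the same tensor through each stage on both devices and record the per-stage difference. The sketch below is an illustration, not part of the issue: the helper name stagewise_max_diff is hypothetical, the random tensor stands in for the output of fc1, and the device falls back to the CPU when CUDA is not available.

```python
import torch


def stagewise_max_diff(x_cpu, device):
    """Apply each stage on the CPU and on `device`, starting from the same
    input, and return the max absolute difference after every stage."""
    stages = [
        ("sin*cos", lambda v: torch.sin(v) * torch.cos(v)),
        ("log1p(abs)", lambda v: torch.log1p(v.abs())),
        ("renorm", lambda v: torch.renorm(v, p=2, dim=0, maxnorm=10.0)),
        ("sin", torch.sin),
    ]
    a, b = x_cpu, x_cpu.to(device)  # identical values on both devices
    report = {}
    for name, op in stages:
        a, b = op(a), op(b)
        report[name] = (a - b.cpu()).abs().max().item()
    return report


device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
x = torch.randn(4, 8)  # sampled once, on the CPU; stand-in for fc1's output
for name, diff in stagewise_max_diff(x, device).items():
    print(f"{name}: {diff:.3e}")
```

A stage whose difference jumps well above float32 rounding error would be the one worth reporting on its own.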

Example

The script in the report is the only code attached to the issue. It compares outputs computed from two independently sampled inputs, one drawn on the CPU and one drawn on the GPU, rather than from a single shared input.
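A minimal sketch of the same pipeline with the input sampled once and reused on both devices. It keeps the layer sizes and operations from the report; the function name max_diff_on_shared_input and the CPU fallback are assumptions added for illustration. With aligned inputs the difference would be expected to sit near float32 rounding error, but that is an expectation, not a measured result from the issue.

```python
import copy

import torch
import torch.nn as nn


def forward(fc1, fc2, x):
    # Same operations as the report, but the input is passed in
    # instead of being sampled inside the function.
    y = fc1(x)
    z = torch.sin(y) * torch.cos(y)
    w = torch.log1p(z.abs())
    t = torch.renorm(w, p=2, dim=0, maxnorm=10.0)
    return fc2(torch.sin(t).detach())


def max_diff_on_shared_input(device):
    torch.manual_seed(0)
    fc1, fc2 = nn.Linear(8, 8), nn.Linear(8, 8)
    x = torch.randn(4, 8)  # one CPU sample, reused on both devices
    # Copies of the CPU layers, moved to the target device.
    fc1_d = copy.deepcopy(fc1).to(device)
    fc2_d = copy.deepcopy(fc2).to(device)
    with torch.no_grad():
        ref = forward(fc1, fc2, x)
        out = forward(fc1_d, fc2_d, x.to(device))
    return (ref - out.cpu()).abs().max().item()


device = "cuda" if torch.cuda.is_available() else "cpu"
print("max diff:", max_diff_on_shared_input(device))
```

If this version still shows a gap around 1e-1, the report has a much stronger case; if not, the original number reflects the differing inputs.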

Notes

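With aligned inputs, any remaining difference would come from per-operator floating-point behavior and would be expected to be on the order of float32 rounding error. torch.renorm is probably not exercised at all in this script: |sin(y)·cos(y)| ≤ 0.5, so every element of w is at most log1p(0.5) ≈ 0.405, and each 8-element slice along dim 0 has a 2-norm of at most about 1.15, well below maxnorm=10.0. The detach() call only cuts the autograd graph; it does not change forward values and cannot explain the difference.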
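The magnitudes reaching torch.renorm can be bounded by hand: |sin(y)·cos(y)| = |sin(2y)|/2 ≤ 0.5, so each element of w is at most log1p(0.5), and an 8-element slice has a 2-norm of at most log1p(0.5)·√8. The sketch below checks that bound and whether renorm changes anything at maxnorm=10.0; the scaled random tensor is an assumed stand-in for the output of fc1.

```python
import math

import torch

torch.manual_seed(0)
y = torch.randn(4, 8) * 3.0  # stand-in for fc1(x); the scale is arbitrary
w = torch.log1p((torch.sin(y) * torch.cos(y)).abs())

# With dim=0, renorm treats each w[i] (a row of 8 values) as one sub-tensor.
row_norms = w.norm(p=2, dim=1)
bound = math.log1p(0.5) * math.sqrt(w.shape[1])

t = torch.renorm(w, p=2, dim=0, maxnorm=10.0)

print("largest row norm:", row_norms.max().item())
print("analytic bound:  ", bound)
print("renorm changed w:", not torch.allclose(t, w))
```

Because no slice can exceed the bound, a maxnorm of 10.0 leaves w unscaled for any input to this pipeline, not just this sample.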

Recommendation

Apply workaround: generate the input once, reuse it on both devices, and rerun the comparison. Treat the report as an operator bug only if a large gap survives that change; the original script does not establish one.
