pytorch - 💡 (How to fix) Numerical divergence for multidimensional biases [1 comment, 1 participant]

GitHub stats
pytorch/pytorch#178689 · Fetched 2026-04-08 01:45:06
Comments: 1 · Participants: 1 · Timeline events: 51 · Reactions: 0
Timeline (top): mentioned ×22, subscribed ×22, labeled ×4, closed ×1


🐛 Describe the bug

The following script (tested on colab.research.google.com):

import torch
import torch.nn.functional as F

torch.manual_seed(0)

mat1 = torch.randn(2, 3, device='cuda', dtype=torch.float16)
mat2 = torch.randn(3, 4, device='cuda', dtype=torch.float16)

bias_1d = torch.randn(4, device='cuda', dtype=torch.float16)
bias_2d = torch.randn(2, 4, device='cuda', dtype=torch.float16)

expected_1d = F.gelu(mat1 @ mat2 + bias_1d, approximate='tanh')
expected_2d = F.gelu(mat1 @ mat2 + bias_2d, approximate='tanh')

result_1d = torch._addmm_activation(bias_1d, mat1, mat2, beta=1.0, alpha=1.0, use_gelu=True)
result_2d = torch._addmm_activation(bias_2d, mat1, mat2, beta=1.0, alpha=1.0, use_gelu=True)

print("=== 1D bias case ===")
print("Expected:\n", expected_1d)
print("Result:\n", result_1d)
print("allclose: ", torch.allclose(expected_1d, result_1d, atol=1e-3))
print()

print("=== 2D bias case ===")
print("Expected (A@B + bias_2d then GELU):\n", expected_2d)
print("Result (torch._addmm_activation):\n", result_2d)
print("allclose: ", torch.allclose(expected_2d, result_2d, atol=1e-3))

produces the following output:

=== 1D bias case ===
Expected:
 tensor([[-0.1279, -0.0049,  1.1494,  2.1348],
        [ 1.8838, -0.1118,  0.1242,  0.2185]], device='cuda:0',
       dtype=torch.float16)
Result:
 tensor([[-0.1279, -0.0049,  1.1494,  2.1348],
        [ 1.8838, -0.1116,  0.1242,  0.2185]], device='cuda:0',
       dtype=torch.float16)
allclose:  True

=== 2D bias case ===
Expected (A@B + bias_2d then GELU):
 tensor([[-1.3390e-02, -4.0340e-04,  7.7295e-01,  2.3301e+00],
        [-1.0999e-01,  1.5503e-02,  1.5808e-01,  4.6240e-01]], device='cuda:0',
       dtype=torch.float16)
Result (torch._addmm_activation):
 tensor([[-1.3390e-02, -4.0102e-04,  7.7246e-01,  2.3281e+00],
        [-1.0999e-01,  1.5587e-02,  1.5808e-01,  4.6289e-01]], device='cuda:0',
       dtype=torch.float16)
allclose:  False
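
As a quick sanity check on the 2D output above, the largest gap (2.3301 versus 2.3281) can be compared with the spacing of float16 values at that magnitude. The sketch below uses only the standard library; `to_fp16` and `next_fp16_up` are hypothetical helper names, and the inputs are the rounded values as printed, not the raw tensors.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 (float16) value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def next_fp16_up(x: float) -> float:
    """Return the next representable float16 above a positive float16 value."""
    bits = struct.unpack('<H', struct.pack('<e', x))[0]
    return struct.unpack('<e', struct.pack('<H', bits + 1))[0]

expected = to_fp16(2.3301)  # value printed for the unfused reference
result = to_fp16(2.3281)    # value printed for torch._addmm_activation

step = next_fp16_up(result) - result
print("float16 step near 2.33:", step)
print("printed values are adjacent float16 numbers:", next_fp16_up(result) == expected)
```

If the two printed values are neighbours, the fused result is one float16 step away from the reference at that element, and that step (2**-9, about 0.00195) is already larger than `atol=1e-3`.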

Versions

Latest

cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

Fix Plan

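The two code paths do not necessarily round at the same points, and the check uses a tolerance tighter than float16 precision at the magnitudes involved. The reference expression F.gelu(mat1 @ mat2 + bias, approximate='tanh') materializes a float16 tensor after the matmul and again after the bias add. torch._addmm_activation is a fused op and may round differently internally. The issue does not show which kernel runs for 1D versus 2D biases, so treat this as a likely cause rather than a confirmed one.

In the 2D output, only one element fails the atol=1e-3 check: 2.3301 versus 2.3281. These are adjacent float16 values (the spacing in [2, 4) is 2^-9, about 0.00195), so the fused result is off by one float16 step at that element.

There are two ways to handle this:

  • Keep torch._addmm_activation and compare with a tolerance that accounts for float16 spacing, ideally against a float32 reference.
  • Alternatively, compute F.gelu(mat1 @ mat2 + bias, approximate='tanh') directly when the result must match the unfused expression exactly.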
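
To illustrate how the rounding order alone can move a float16 result, here is a minimal CPU sketch. It assumes, without confirmation from the issue, that a fused kernel keeps intermediates in float32 and rounds once at the end, while the unfused expression rounds after each step. `round_fp16` is a hypothetical helper, and the random inputs differ from the CUDA values in the report.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def round_fp16(t: torch.Tensor) -> torch.Tensor:
    """Round a float32 tensor to float16 precision, keeping float32 storage."""
    return t.half().float()

# Inputs whose values are exactly representable in float16.
mat1 = round_fp16(torch.randn(2, 3))
mat2 = round_fp16(torch.randn(3, 4))
bias = round_fp16(torch.randn(2, 4))

# Path A: round after every step, like the unfused float16 expression,
# which materializes a float16 tensor after the matmul and after the add.
stepwise = round_fp16(
    F.gelu(round_fp16(round_fp16(mat1 @ mat2) + bias), approximate='tanh')
)

# Path B (assumed model of a fused kernel): stay in float32, round once.
single = round_fp16(F.gelu(mat1 @ mat2 + bias, approximate='tanh'))

diff = (stepwise - single).abs()
print("max abs difference:", diff.max().item())
print("elements that differ:", int((diff > 0).sum()))
```

Any nonzero differences here come purely from where the rounding happens, since both paths evaluate the same mathematical expression.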

Code Changes

import torch
import torch.nn.functional as F

torch.manual_seed(0)

mat1 = torch.randn(2, 3, device='cuda', dtype=torch.float16)
mat2 = torch.randn(3, 4, device='cuda', dtype=torch.float16)

bias_1d = torch.randn(4, device='cuda', dtype=torch.float16)
bias_2d = torch.randn(2, 4, device='cuda', dtype=torch.float16)

expected_1d = F.gelu(mat1 @ mat2 + bias_1d, approximate='tanh')
expected_2d = F.gelu(mat1 @ mat2 + bias_2d, approximate='tanh')

# Workaround: bypass the fused op and compute the unfused expression directly.
# result_* is then computed exactly like expected_*, so the checks below
# pass by construction.
result_1d = F.gelu(mat1 @ mat2 + bias_1d, approximate='tanh')
result_2d = F.gelu(mat1 @ mat2 + bias_2d, approximate='tanh')

# Original fused calls, kept for comparison (the 2D call is the one that
# fails the atol=1e-3 check in the report):
# result_1d = torch._addmm_activation(bias_1d, mat1, mat2, beta=1.0, alpha=1.0, use_gelu=True)
# result_2d = torch._addmm_activation(bias_2d, mat1, mat2, beta=1.0, alpha=1.0, use_gelu=True)

print("=== 1D bias case ===")
print("Expected:\n", expected_1d)
print("Result:\n", result_1d)
print("allclose: ", torch.allclose(expected_1d, result_1d, atol=1e-3))
print()

print("=== 2D bias case ===")
print("Expected (A@B + bias_2d then GELU):\n", expected_2d)
print("Result (F.gelu):\n", result_2d)
print("allclose: ", torch.allclose(expected_2d, result_2d, atol=1e-3))

Verification

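Run the modified code and verify that torch.allclose returns True for both the 1D and 2D bias cases. Because the workaround computes the result and the expected value with the same expression, this only confirms that the workaround is in place. It says nothing about the accuracy of torch._addmm_activation itself.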
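
When verifying, it can help to see which elements drive the allclose result. This is a minimal sketch in plain Python that applies the documented torch.allclose criterion, |input - other| <= atol + rtol * |other|, to the values printed for the 2D bias case in the report. The helper name `is_close` is hypothetical, and the inputs are the printed, rounded values.

```python
# Values printed for the 2D bias case, flattened row by row.
expected = [-1.3390e-02, -4.0340e-04, 7.7295e-01, 2.3301e+00,
            -1.0999e-01, 1.5503e-02, 1.5808e-01, 4.6240e-01]
result = [-1.3390e-02, -4.0102e-04, 7.7246e-01, 2.3281e+00,
          -1.0999e-01, 1.5587e-02, 1.5808e-01, 4.6289e-01]

def is_close(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-3) -> bool:
    """Element-wise criterion used by torch.allclose(input=a, other=b)."""
    return abs(a - b) <= atol + rtol * abs(b)

# Collect (flat index, expected, result, abs diff) for every failing element.
failing = [
    (i, e, r, abs(e - r))
    for i, (e, r) in enumerate(zip(expected, result))
    if not is_close(e, r)
]

for i, e, r, d in failing:
    print(f"index {i}: expected={e}, result={r}, abs diff={d:.5f}")
```

With these printed values, only flat index 3 (2.3301 versus 2.3281) exceeds the tolerance, so a single element decides the `False`.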

Extra Tips

  • Ensure that the input tensors are on the same device and have the same data type to avoid any potential issues.
  • Use `torch.all
