pytorch - 💡(How to fix) Fix Numerical divergence for multidimensional biases [1 comments, 1 participants]

pytorch · 2026-03-28 18:54:26


pytorch/pytorch#178689 • Fetched 2026-04-08 01:45:06


Author

benediktjohannes

Participants

benediktjohannes

Timeline (top)

mentioned ×22, subscribed ×22, labeled ×4, closed ×1

🐛 Describe the bug

The following script (tested on colab.research.google.com):

import torch
import torch.nn.functional as F

torch.manual_seed(0)

mat1 = torch.randn(2, 3, device='cuda', dtype=torch.float16)
mat2 = torch.randn(3, 4, device='cuda', dtype=torch.float16)

bias_1d = torch.randn(4, device='cuda', dtype=torch.float16)
bias_2d = torch.randn(2, 4, device='cuda', dtype=torch.float16)

expected_1d = F.gelu(mat1 @ mat2 + bias_1d, approximate='tanh')
expected_2d = F.gelu(mat1 @ mat2 + bias_2d, approximate='tanh')

result_1d = torch._addmm_activation(bias_1d, mat1, mat2, beta=1.0, alpha=1.0, use_gelu=True)
result_2d = torch._addmm_activation(bias_2d, mat1, mat2, beta=1.0, alpha=1.0, use_gelu=True)

print("=== 1D bias case ===")
print("Expected:\n", expected_1d)
print("Result:\n", result_1d)
print("allclose: ", torch.allclose(expected_1d, result_1d, atol=1e-3))
print()

print("=== 2D bias case ===")
print("Expected (A@B + bias_2d then GELU):\n", expected_2d)
print("Result (torch._addmm_activation):\n", result_2d)
print("allclose: ", torch.allclose(expected_2d, result_2d, atol=1e-3))

prints the following result:

=== 1D bias case ===
Expected:
 tensor([[-0.1279, -0.0049,  1.1494,  2.1348],
        [ 1.8838, -0.1118,  0.1242,  0.2185]], device='cuda:0',
       dtype=torch.float16)
Result:
 tensor([[-0.1279, -0.0049,  1.1494,  2.1348],
        [ 1.8838, -0.1116,  0.1242,  0.2185]], device='cuda:0',
       dtype=torch.float16)
allclose:  True

=== 2D bias case ===
Expected (A@B + bias_2d then GELU):
 tensor([[-1.3390e-02, -4.0340e-04,  7.7295e-01,  2.3301e+00],
        [-1.0999e-01,  1.5503e-02,  1.5808e-01,  4.6240e-01]], device='cuda:0',
       dtype=torch.float16)
Result (torch._addmm_activation):
 tensor([[-1.3390e-02, -4.0102e-04,  7.7246e-01,  2.3281e+00],
        [-1.0999e-01,  1.5587e-02,  1.5808e-01,  4.6289e-01]], device='cuda:0',
       dtype=torch.float16)
allclose:  False

Versions

Latest

cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

Fix Plan

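The two computations are mathematically the same but round differently in float16. The reference, F.gelu(mat1 @ mat2 + bias, approximate='tanh'), rounds the matmul result to float16 and then rounds again after adding the bias; torch._addmm_activation performs the bias addition and GELU inside one call, so its intermediate rounding can differ.

The mismatch is small. In the 2D output above, the only element that exceeds atol=1e-3 is 2.3301 versus 2.3281, a gap of about 0.002, which is one float16 step at that magnitude. The bias is therefore being added in both cases; what has to be decided is whether the results must match the reference exactly or only within float16 tolerance.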
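One plausible way such a one-step gap can arise is double rounding. The sketch below uses only the standard library and hypothetical values (they are not taken from the issue) to show that rounding to float16 after the matmul and again after the bias add can give a different answer than adding the bias in higher precision and rounding once. That the fused kernel behaves like the "round once" path is an assumption here, not something the report confirms.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest float16 (IEEE 754 binary16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical values chosen to expose the effect; not taken from the issue.
u = 2.0 ** -10          # spacing between adjacent float16 values in [1, 2)
acc = 1.0 + 0.375 * u   # accumulator value before any float16 rounding
bias = 0.375 * u        # exactly representable in float16

# Unfused path: the matmul output is rounded to float16, then the sum is rounded again.
unfused = to_half(to_half(acc) + bias)

# Fused path (assumed behaviour): bias added in higher precision, rounded once.
fused = to_half(acc + bias)

print("unfused:", unfused)
print("fused:  ", fused)
print("gap:    ", fused - unfused, "(one float16 step)")
```

Both answers are correctly rounded results of their respective sequences of operations, which is why neither path can be called wrong on this evidence alone.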

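Here are the options:

If the result must match the reference exactly, compute F.gelu(mat1 @ mat2 + bias, approximate='tanh') instead of calling torch._addmm_activation.
If the fused call is kept, compare against the reference with a tolerance that scales with magnitude (an rtol on the order of float16 precision, about 1e-3) instead of relying on atol=1e-3 alone.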
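To see which elements drive the failed check, the printed 2D values can be run through the same criterion torch.allclose applies, |input - other| <= atol + rtol * |other|. The sketch below is plain Python working from the values as printed in the report, so it is only as precise as those printed digits; `half_spacing` is a helper introduced here for illustration.

```python
import math

ATOL, RTOL = 1e-3, 1e-5  # atol from the script; 1e-5 is torch.allclose's default rtol

# 2D-bias values as printed in the report, flattened row by row.
expected = [-1.3390e-02, -4.0340e-04, 7.7295e-01, 2.3301e+00,
            -1.0999e-01, 1.5503e-02, 1.5808e-01, 4.6240e-01]
result = [-1.3390e-02, -4.0102e-04, 7.7246e-01, 2.3281e+00,
          -1.0999e-01, 1.5587e-02, 1.5808e-01, 4.6289e-01]


def half_spacing(x: float) -> float:
    """Gap between adjacent float16 values near x (hypothetical helper)."""
    if x == 0.0:
        return 2.0 ** -24
    _, exponent = math.frexp(abs(x))
    # float16 has 11 significant bits; subnormals have a fixed spacing of 2**-24.
    return 2.0 ** max(exponent - 11, -24)


failing = []
for i, (exp_val, res_val) in enumerate(zip(expected, result)):
    diff = abs(exp_val - res_val)
    limit = ATOL + RTOL * abs(res_val)
    if diff > limit:
        failing.append(i)
    print(f"[{i}] diff={diff:.2e} limit={limit:.2e} "
          f"float16 step={half_spacing(res_val):.2e}")

print("elements over the limit:", failing)
```

Only the element near 2.33 exceeds the limit, and its difference is about the size of one float16 step at that magnitude.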

Code Changes

import torch
import torch.nn.functional as F

torch.manual_seed(0)

mat1 = torch.randn(2, 3, device='cuda', dtype=torch.float16)
mat2 = torch.randn(3, 4, device='cuda', dtype=torch.float16)

bias_1d = torch.randn(4, device='cuda', dtype=torch.float16)
bias_2d = torch.randn(2, 4, device='cuda', dtype=torch.float16)

expected_1d = F.gelu(mat1 @ mat2 + bias_1d, approximate='tanh')
expected_2d = F.gelu(mat1 @ mat2 + bias_2d, approximate='tanh')

# Workaround: compute the result with the unfused expression (matmul, add bias, GELU).
# This is the same expression as the reference, so the checks below pass by construction.
result_1d = F.gelu(mat1 @ mat2 + bias_1d, approximate='tanh')
result_2d = F.gelu(mat1 @ mat2 + bias_2d, approximate='tanh')

# Original fused calls, kept for comparison:
# result_1d = torch._addmm_activation(bias_1d, mat1, mat2, beta=1.0, alpha=1.0, use_gelu=True)
# result_2d = torch._addmm_activation(bias_2d, mat1, mat2, beta=1.0, alpha=1.0, use_gelu=True)

print("=== 1D bias case ===")
print("Expected:\n", expected_1d)
print("Result:\n", result_1d)
print("allclose: ", torch.allclose(expected_1d, result_1d, atol=1e-3))
print()

print("=== 2D bias case ===")
print("Expected (A@B + bias_2d then GELU):\n", expected_2d)
print("Result (F.gelu):\n", result_2d)
print("allclose: ", torch.allclose(expected_2d, result_2d, atol=1e-3))

Verification

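Run the modified code and verify that torch.allclose returns True for both the 1D and 2D bias cases. Because the result and the reference are now the same expression, this confirms that the workaround runs; it does not show that torch._addmm_activation itself behaves any differently.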
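If the fused call is kept instead, one way to check it might be to compare with a relative tolerance near float16 precision. The sketch below rebuilds the printed 2D tensors on CPU (so it needs no GPU) and compares them under the script's original tolerance and under a relaxed one. The choice rtol=1e-3 is an assumption made for illustration, not a value taken from the issue or from PyTorch guidance.

```python
import torch

# 2D-bias values as printed in the report, rebuilt on CPU for illustration.
expected_2d = torch.tensor(
    [[-1.3390e-02, -4.0340e-04, 7.7295e-01, 2.3301e+00],
     [-1.0999e-01, 1.5503e-02, 1.5808e-01, 4.6240e-01]],
    dtype=torch.float16,
)
result_2d = torch.tensor(
    [[-1.3390e-02, -4.0102e-04, 7.7246e-01, 2.3281e+00],
     [-1.0999e-01, 1.5587e-02, 1.5808e-01, 4.6289e-01]],
    dtype=torch.float16,
)

# Upcast so the comparison itself adds no float16 rounding.
a, b = expected_2d.float(), result_2d.float()

# Same check as the original script: atol=1e-3 with the default rtol of 1e-5.
strict = torch.allclose(a, b, atol=1e-3)

# Assumed alternative: let the tolerance grow with magnitude.
relaxed = torch.allclose(a, b, atol=1e-3, rtol=1e-3)

print("max abs diff:", (a - b).abs().max().item())
print("strict  (atol=1e-3):            ", strict)
print("relaxed (atol=1e-3, rtol=1e-3): ", relaxed)
```

Whether a relaxed tolerance is acceptable depends on how the output is used downstream; the report does not say.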

Extra Tips

Ensure that the input tensors are on the same device and have the same data type to avoid any potential issues.
Use `torch.all
