pytorch - Fix AArch64 Unit Test Failures - Multiple failures in test/nn/test_convolution.py TestConvolutionNNDeviceTypeCPU for oneDNN

Issue: pytorch/pytorch#177245 (fetched 2026-04-08 00:42:55), 0 comments, 1 participant


🐛 Describe the bug

Failing tests (suspected to share the same cause)

python test/nn/test_convolution.py TestConvolutionNNDeviceTypeCPU.test_conv_contiguous_for_oneDNN_cpu
python test/nn/test_convolution.py TestConvolutionNNDeviceTypeCPU.test_conv_ic1_channels_last_for_oneDNN_cpu

Example Traceback

Traceback (most recent call last):
  File "/builds/software-machine-learning-infra-frameworks-workspaces-robhar02/pytorch/test/nn/test_convolution.py", line 3224, in test_conv_contiguous_for_oneDNN
    self.assertEqual(y, y_)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4365, in assertEqual
    raise error_metas.pop()[0].to_error(  # type: ignore[index]
AssertionError: Tensor-likes are not close!

Mismatched elements: 765360 / 4111104 (18.6%)
Greatest absolute difference: 0.001953125 at index (0, 12, 78, 147) (up to 1e-05 allowed)
Greatest relative difference: 362.0 at index (0, 13, 62, 151) (up to 0.001 allowed)

To execute this test, run the following from the base repo dir:
    python test/nn/test_convolution.py TestConvolutionNNDeviceTypeCPU.test_conv_contiguous_for_oneDNN_cpu
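The assertion compares element by element using the rule |actual - expected| <= atol + rtol * |expected|. The tolerances quoted in the failure (atol 1e-5, rtol 1e-3) are PyTorch's defaults for torch.float16, which suggests the failing comparison is a half-precision one. The minimal stdlib-only sketch below mirrors that rule with a hypothetical helper `is_close` (not a PyTorch API, and without NaN/inf handling) and evaluates the reported greatest absolute difference at a few illustrative expected magnitudes:

```python
# Hypothetical helper mirroring the element-wise rule applied by
# torch.testing.assert_close (NaN/inf handling omitted):
#     |actual - expected| <= atol + rtol * |expected|
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Tolerances quoted in the failure message (PyTorch's float16 defaults).
RTOL, ATOL = 1e-3, 1e-5
# Greatest absolute difference quoted in the failure message (exactly 2**-9).
DIFF = 0.001953125

# The same absolute error fails near zero but passes for larger magnitudes.
for expected in (0.01, 1.0, 1.9, 2.5):
    allowed = ATOL + RTOL * abs(expected)
    verdict = "pass" if is_close(expected + DIFF, expected, RTOL, ATOL) else "fail"
    print(f"expected={expected}: allowed={allowed:.6f} -> {verdict}")
```

With these tolerances a difference of 0.001953125 is accepted only where |expected| is at least about 1.94, and a near-zero reference value is consistent with a relative difference as large as 362.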

Affects Neoverse-V1 and Neoverse-V2 (both AArch64).

Versions

Commit: https://github.com/pytorch/pytorch/commit/08b6f48d871affbc7abe9277020aed882fdf110a

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal


Fix Plan

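The proposed fix relaxes the comparison tolerances in the affected tests to absorb numerical differences between the oneDNN and native CPU convolution paths. The tolerances in the failure message (rtol 1e-3, atol 1e-5) are PyTorch's defaults for torch.float16, which suggests the failing comparison is a half-precision one.

  • In test_conv_contiguous_for_oneDNN and test_conv_ic1_channels_last_for_oneDNN, pass explicit tolerances that are looser than those defaults, ideally only for the reduced-precision dtypes.
  • Either keep self.assertEqual, which already accepts atol and rtol keyword arguments, or switch to torch.testing.assert_close, which takes the same parameters.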
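The greatest absolute difference in the log, 0.001953125, is exactly 2^-9, which is the gap between adjacent float16 values in [2, 4). Two code paths that round an intermediate result differently can end up one float16 step apart, so a difference of this size is plausible from rounding alone. The stdlib-only sketch below measures that gap using Python's `struct` half-precision format `'e'`; the helper names `to_fp16` and `fp16_spacing` are hypothetical:

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_spacing(x: float) -> float:
    """Gap between |x| rounded to float16 and the next larger float16 value."""
    # Reinterpret the 16-bit pattern as an unsigned integer; for finite,
    # non-negative halves, adding 1 yields the next representable value.
    bits = struct.unpack("<H", struct.pack("<e", abs(x)))[0]
    next_up = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return next_up - to_fp16(abs(x))


for x in (0.75, 1.5, 2.5, 3.9):
    print(x, fp16_spacing(x))
```

The spacing doubles with each power of two, which is why a fixed atol of 1e-5 is far below one float16 step for outputs of order 1.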

Example code changes:

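import torch

# ...

def test_conv_contiguous_for_oneDNN(self):
    # ...
    # Illustrative values, not validated: atol is set just above the reported
    # greatest absolute difference (0.001953125); rtol keeps the float16 default.
    # Prefer applying them only to the reduced-precision dtypes.
    torch.testing.assert_close(y, y_, rtol=1e-3, atol=2e-3)

def test_conv_ic1_channels_last_for_oneDNN(self):
    # ...
    # Same illustrative values; check them against this test's own failure output.
    torch.testing.assert_close(y, y_, rtol=1e-3, atol=2e-3)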
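Rather than picking tolerance values by hand, one option is to compute, from the observed outputs, the smallest atol that makes every element pass for a chosen rtol, then add some headroom. The sketch below assumes both outputs have already been flattened into plain Python lists (for tensors, for example via `y.flatten().tolist()`); the helper `min_atol` and the sample values are hypothetical and not taken from the failing run:

```python
def min_atol(actual: list[float], expected: list[float], rtol: float) -> float:
    """Smallest atol for which |a - e| <= atol + rtol * |e| holds for every pair."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    # Each pair needs atol >= |a - e| - rtol * |e|; atol is never negative.
    needed = [abs(a - e) - rtol * abs(e) for a, e in zip(actual, expected)]
    return max([0.0, *needed])


# Hypothetical sample values, not taken from the failing run.
expected = [0.0005, 1.25, 2.5]
actual = [0.0005 + 2 ** -10, 1.25, 2.5 + 2 ** -9]

# The near-zero element dominates: it needs an atol of roughly 2**-10.
print(min_atol(actual, expected, rtol=1e-3))
```

Near-zero reference values drive the required atol, because rtol contributes almost nothing there; this matches the very large relative difference in the failure message.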

Verification

To verify the fix, run the affected tests again:

python test/nn/test_convolution.py TestConvolutionNNDeviceTypeCPU.test_conv_contiguous_for_oneDNN_cpu
python test/nn/test_convolution.py TestConvolutionNNDeviceTypeCPU.test_conv_ic1_channels_last_for_oneDNN_cpu

If both tests pass on the affected Neoverse-V1 and Neoverse-V2 machines, the tolerance change is sufficient.

Extra Tips

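  • The oneDNN and native CPU kernels may accumulate in different orders or at different intermediate precisions, so their outputs need not match bit for bit, especially in float16.
  • torch.testing.assert_close and the test suite's self.assertEqual both apply dtype-dependent default tolerances and accept explicit rtol and atol; switching between them does not by itself change what the test catches.
  • Loosen tolerances as little as possible and only for the dtypes that need it; an overly loose atol can hide genuine numerical regressions in the oneDNN path.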
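Summation order alone can change a reduced-precision result. The stdlib-only sketch below rounds a running sum to float16 after every addition (via Python's `struct` format `'e'`); it is a simplified emulation, not how oneDNN or the native kernels actually compute, and the helper names are hypothetical:

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fp16_sum(values: list[float]) -> float:
    """Sum left to right, rounding the running total to float16 after every add."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total


# Same numbers, different order: the exact sum is 1025 in both cases.
print(fp16_sum([1024.0, 0.25, 0.25, 0.25, 0.25]))  # 1024.0
print(fp16_sum([0.25, 0.25, 0.25, 0.25, 1024.0]))  # 1025.0
```

Adding each 0.25 to 1024 is lost to rounding because float16 values in [1024, 2048) are 1 apart, while summing the small terms first preserves them.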
