pytorch - ✅ (Solved) Fix AArch64 static runtimes failures [2 pull requests, 1 participant]

GitHub stats
pytorch/pytorch#178522 · Fetched 2026-04-08 01:35:47
Comments: 0 · Participants: 1 · Timeline events: 19 · Reactions: 0
Timeline (top): mentioned ×8, subscribed ×8, labeled ×2, cross-referenced ×1

Error Message

Traceback of TorchScript (most recent call last):
        %zero_point: int = prim::Constant[value=1]()
        %bias: None = prim::Constant()
        %packed_params = quantized::linear_prepack(%weights, %bias)
                         ~~~~~~~~~ <--- HERE
        %1254 = quantized::linear(%input, %packed_params, %scale, %zero_point)
        %1249: Tensor = aten::dequantize(%1254)
RuntimeError: unknown architecure

PR fix notes

PR #178270: Enable full AArch64 unit testing for pull requests, maintain periodic m7g coverage

Description (problem / solution / changelog)

This PR removes the bespoke test_linux_aarch64 tests from .ci/pytorch/test.sh, which means we can now run the full test suite on AArch64.

Changes proposed:

  • Add (m8g) AArch64 full unit test jobs to pull.yml with 5 shards.
  • Add (m7g) AArch64 full unit test jobs to trunk.yml with 5 shards (we want to retain coverage of m7g to catch any sve256 issues).
  • Remove the linux-aarch64.yml workflow file.

Mark some tests in test_static_runtime as skipped and link to issue https://github.com/pytorch/pytorch/issues/178522 to track them.

Stack from ghstack (oldest at bottom):

  • -> #178270

@fadara01 @nikhil-arm @aditew01 @milpuz01 @malfet @Skylion007

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01

Changed files

  • .ci/pytorch/test.sh (modified, +0/-33)
  • .github/labeler.yml (modified, +0/-10)
  • .github/pytorch-probot.yml (modified, +0/-1)
  • .github/workflows/linux-aarch64.yml (removed, +0/-64)
  • .github/workflows/pull.yml (modified, +46/-1)
  • .github/workflows/trunk.yml (modified, +30/-0)
  • .github/workflows/update-viablestrict.yml (modified, +1/-1)
  • .github/workflows/upload-test-stats.yml (modified, +0/-1)
  • RELEASE.md (modified, +1/-1)
  • aten/src/ATen/test/native_test.cpp (modified, +3/-0)
  • benchmarks/static_runtime/test_static_runtime.cc (modified, +8/-0)
  • test/cpp/jit/test_misc.cpp (modified, +5/-0)
  • test/test_fx.py (modified, +4/-0)

PR #180546: Restrict x86 quantization engine to x86 builds

Description (problem / solution / changelog)

This change disables the x86 quantization engine on non-x86 builds. The x86 engine has been the default since the 2.11.0 release; restricting it to x86 builds fixes pre-2.11.0 quantization workflows that relied on the prior default, oneDNN.

Changed files

  • aten/src/ATen/Context.cpp (modified, +4/-2)


🐛 Describe the bug

We see two different test failures in test_static_runtime on AArch64 when trying to enable the full test suite there; see https://github.com/pytorch/pytorch/pull/178270

Raising this issue so that it can be fixed in a separate PR. These tests will be skipped in 178270.

https://github.com/pytorch/pytorch/actions/runs/23488797123/job/68352842741

Traceback of TorchScript (most recent call last):
        %zero_point: int = prim::Constant[value=1]()
        %bias: None = prim::Constant()
        %packed_params = quantized::linear_prepack(%weights, %bias)
                         ~~~~~~~~~ <--- HERE
        %1254 = quantized::linear(%input, %packed_params, %scale, %zero_point)
        %1249: Tensor = aten::dequantize(%1254)
RuntimeError: unknown architecure

Note: Google Test filter = ConstantPropagation.CustomClassesCanBePropagated-*_CUDA:*_MultiCUDA
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ConstantPropagation
[ RUN      ] ConstantPropagation.CustomClassesCanBePropagated
unknown file: Failure
C++ exception with description "Expected to not find "quantized::linear_prepack" but found it
  %0 : NoneType = prim::Constant()
  %11 : QInt8(3, 3, strides=[3, 1], requires_grad=0, device=cpu) = prim::Constant[value= 1  1  1  1  1  1  1  1  1 [ QuantizedCPUQInt8Type{3,3}, qscheme: per_tensor_affine, scale: 1, zero_point: 0 ]]()
  %8 : __torch__.torch.classes.quantized.LinearPackedParamsBase = quantized::linear_prepack(%11, %0)
                                                                  ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  return (%8)
From CHECK-NOT: quantized::linear_prepack

It looks to me like we shouldn't be following the FBGEMM path on AArch64 for quantized linear and should instead go via oneDNN, but the qengine ends up being X86 even on AArch64.

https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.cpp#L694-L702

I'm not sure if this is correct.

Versions

Runner Type: linux.arm64.m8g.4xlarge
Instance Type: m8g.4xlarge

PyTorch SHA: 99dee0579426fc4816d45ec18b3c0a376604aa8a

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel

extent analysis

Fix Plan

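According to the issue report, the qengine resolves to X86 on AArch64, so quantized linear follows the FBGEMM path and fails with "unknown architecure". PR #180546 addresses this by restricting the x86 quantization engine to x86 builds, which restores the prior default, oneDNN, for non-x86 builds.

Here are the steps:

  • In aten/src/ATen/Context.cpp, restrict the x86 quantization engine to x86 builds so that it is never selected on AArch64.
  • Leave quantized::linear_prepack unchanged: PR #180546 modifies only Context.cpp (+4/-2).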

Example code:

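// Sketch only, not the actual diff from PR #180546. QEngine and
// defaultQEngine are simplified, hypothetical stand-ins for the engine
// selection in aten/src/ATen/Context.cpp.
#include <iostream>

enum class QEngine { NoQEngine, FBGEMM, QNNPACK, ONEDNN, X86 };

// True only when the compiler targets an x86 CPU.
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
constexpr bool kBuildTargetsX86 = true;
#else
constexpr bool kBuildTargetsX86 = false;
#endif

// have_onednn / have_fbgemm model which backends the build provides.
QEngine defaultQEngine(bool have_onednn, bool have_fbgemm) {
  QEngine engine = QEngine::NoQEngine;
  if (have_onednn) {
    engine = QEngine::ONEDNN;
  }
  // The X86 engine is a candidate only on x86 builds, so AArch64 keeps oneDNN.
  if (have_fbgemm && kBuildTargetsX86) {
    engine = QEngine::X86;
  }
  return engine;
}

int main() {
  const QEngine engine = defaultQEngine(/*have_onednn=*/true, /*have_fbgemm=*/true);
  // Prints "X86" on x86 builds and "ONEDNN" on every other architecture.
  std::cout << (engine == QEngine::X86 ? "X86" : "ONEDNN") << "\n";
  return 0;
}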

Verification

To verify the fix, re-enable the test_static_runtime tests that PR #178270 marked as skipped, rerun the test suite on AArch64, and check that they pass.

Extra Tips

  • Make sure the engine selection in aten/src/ATen/Context.cpp handles each architecture correctly, not only AArch64.
  • Test the fix on both x86 and AArch64 to confirm that it does not introduce any regressions.
  • Consider adding a test case that verifies the X86 qengine is not selected on AArch64.
