pytorch - ✅(Solved) Fix AArch64 static runtimes failures [2 pull requests, 1 participants]

pytorch · 2026-03-26 17:49:37

GitHub stats

pytorch/pytorch#178522 · Fetched 2026-04-08 01:35:47

Author: robert-hardwick
Participants: robert-hardwick
Timeline (top): mentioned ×8, subscribed ×8, labeled ×2, cross-referenced ×1

Error Message

RuntimeError: unknown architecure, raised by quantized::linear_prepack(%weights, %bias) while running a TorchScript graph in test_static_runtime on AArch64.

PR fix notes

PR #178270: Enable full AArch64 unit testing for pull requests, maintain periodic m7g coverage

Repository: pytorch/pytorch
Author: robert-hardwick
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/178270

Description (problem / solution / changelog)

This PR removes the bespoke test_linux_aarch64 tests from .ci/pytorch/test.sh, which means we can now run the full test suite on AArch64.

Changes proposed:

Add AArch64 (m8g) full unit test jobs to pull.yml with 5 shards.
Add AArch64 (m7g) full unit test jobs to trunk.yml with 5 shards, to retain m7g coverage and catch any sve256 issues.
Remove the linux-aarch64.yml workflow file.
Mark some tests in test_static_runtime as skipped and link them to issue https://github.com/pytorch/pytorch/issues/178522 for tracking.

Stack from ghstack (oldest at bottom):

-> #178270

@fadara01 @nikhil-arm @aditew01 @milpuz01 @malfet @Skylion007

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01

Changed files

.ci/pytorch/test.sh (modified, +0/-33)
.github/labeler.yml (modified, +0/-10)
.github/pytorch-probot.yml (modified, +0/-1)
.github/workflows/linux-aarch64.yml (removed, +0/-64)
.github/workflows/pull.yml (modified, +46/-1)
.github/workflows/trunk.yml (modified, +30/-0)
.github/workflows/update-viablestrict.yml (modified, +1/-1)
.github/workflows/upload-test-stats.yml (modified, +0/-1)
RELEASE.md (modified, +1/-1)
aten/src/ATen/test/native_test.cpp (modified, +3/-0)
benchmarks/static_runtime/test_static_runtime.cc (modified, +8/-0)
test/cpp/jit/test_misc.cpp (modified, +5/-0)
test/test_fx.py (modified, +4/-0)

PR #180546: Restrict x86 quantization engine to x86 builds

Repository: pytorch/pytorch
Author: murste01
State: open | merged: False
Link: https://github.com/pytorch/pytorch/pull/180546

Description (problem / solution / changelog)

This change disables the x86 quantization engine for non-x86 builds, where it has been the default since the 2.11.0 release. This fixes pre-2.11.0 quantization workflows that relied on the prior default, oneDNN.

Changed files

aten/src/ATen/Context.cpp (modified, +4/-2)

🐛 Describe the bug

We see 2 different test failures in test_static_runtime on AArch64 when trying to enable the full test suite on AArch64; see https://github.com/pytorch/pytorch/pull/178270.

Raising this issue so that it can be fixed in a separate PR. These tests will be skipped in #178270.

https://github.com/pytorch/pytorch/actions/runs/23488797123/job/68352842741

Traceback of TorchScript (most recent call last):
        %zero_point: int = prim::Constant[value=1]()
        %bias: None = prim::Constant()
        %packed_params = quantized::linear_prepack(%weights, %bias)
                         ~~~~~~~~~ <--- HERE
        %1254 = quantized::linear(%input, %packed_params, %scale, %zero_point)
        %1249: Tensor = aten::dequantize(%1254)
RuntimeError: unknown architecure

Note: Google Test filter = ConstantPropagation.CustomClassesCanBePropagated-*_CUDA:*_MultiCUDA
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ConstantPropagation
[ RUN      ] ConstantPropagation.CustomClassesCanBePropagated
unknown file: Failure
C++ exception with description "Expected to not find "quantized::linear_prepack" but found it
  %0 : NoneType = prim::Constant()
  %11 : QInt8(3, 3, strides=[3, 1], requires_grad=0, device=cpu) = prim::Constant[value= 1  1  1  1  1  1  1  1  1 [ QuantizedCPUQInt8Type{3,3}, qscheme: per_tensor_affine, scale: 1, zero_point: 0 ]]()
  %8 : __torch__.torch.classes.quantized.LinearPackedParamsBase = quantized::linear_prepack(%11, %0)
                                                                  ~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  return (%8)
From CHECK-NOT: quantized::linear_prepack

It looks to me like we shouldn't be following the FBGEMM path on AArch64 for quantized linear and should go via oneDNN instead; however, the qengine ends up being X86 even on AArch64.

https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.cpp#L694-L702

I'm not sure whether this is correct.

Versions

Runner Type: linux.arm64.m8g.4xlarge
Instance Type: m8g.4xlarge

PyTorch SHA: 99dee0579426fc4816d45ec18b3c0a376604aa8a

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel

extent analysis

Fix Plan

To fix the issue, the default qengine on AArch64 needs to be oneDNN rather than X86.

Here are the steps:

In aten/src/ATen/Context.cpp, restrict the X86 engine to x86 builds so that AArch64 falls back to oneDNN (at::kONEDNN). PR #180546 takes this approach.
No separate change to quantized::linear_prepack should be needed: it dispatches on the globally selected qengine (at::globalContext().qEngine()), so it should take the oneDNN path once the default is fixed.

Example code (an illustrative sketch of the default-engine selection in Context.cpp, not the actual upstream patch; surrounding code omitted):

// aten/src/ATen/Context.cpp (sketch)
// Default quantized engine selection; later assignments take precedence.
at::QEngine qengine = at::kNoQEngine;
#if AT_MKLDNN_ENABLED()
qengine = at::kONEDNN;  // oneDNN default when ATen is built with MKLDNN
#endif
#ifdef USE_FBGEMM
#if defined(__x86_64__) || defined(_M_X64)
// Promote X86 to the default only on x86-64 builds, so AArch64 keeps oneDNN.
if (fbgemm::fbgemmSupportedCPU()) {
  qengine = at::kX86;
}
#endif
#endif

Verification

To verify the fix, re-enable the test_static_runtime tests skipped in PR #178270 and check that they pass on AArch64.

Extra Tips

Make sure the default qengine selection in Context.cpp handles each target architecture correctly.
Test the fix thoroughly to ensure that it does not introduce any regressions.
Consider adding a test case that checks the default qengine on AArch64.
