pytorch - ✅(Solved) Fix [CPU][Inductor] No known conversion from `int` to `const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&` [1 pull request, 3 comments, 3 participants]

GitHub stats
pytorch/pytorch#178136 (fetched 2026-04-08 01:16:22)
Comments: 3
Participants: 3
Timeline: 83
Reactions: 0
Timeline (top): mentioned ×29, subscribed ×29, labeled ×11, referenced ×6

Error Message

torch._inductor.exc.InductorError: CppCompileError: C++ compile error

/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:62:275: error: no matching function for call to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(int&)’
   62 | TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp19) & (tmp19 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp8))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp19 < 1L");
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:36: note: no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VectorizedN<long int, 2>&’

The same error is reported a second time at main.cpp:109:276, for ~(at::vec::VecMask<int64_t,2>(tmp28)). The full compiler command and output are reproduced under Code Example.

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
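
The hint above names two environment variables. One way to apply them without touching the shell profile is to build a child-process environment in Python, as in the minimal sketch below. The helper names `debug_env` and `repro_command` and the script name `repro.py` are placeholders for illustration; only the two variable names and their values come from the message itself.

```python
import os
import sys


def debug_env(base=None):
    """Copy an environment and enable the two settings named in the hint."""
    env = dict(os.environ if base is None else base)
    env["TORCHDYNAMO_VERBOSE"] = "1"  # ask for the internal stack trace
    env["TORCH_LOGS"] = "+dynamo"     # ask for extra developer context
    return env


def repro_command(script="repro.py"):
    # `repro.py` is a placeholder for wherever the reproducer was saved.
    return [sys.executable, script]


env = debug_env({"PATH": "/usr/bin"})
print(env["TORCHDYNAMO_VERBOSE"], env["TORCH_LOGS"])
```

To rerun the reproducer with these settings, something like `subprocess.run(repro_command(), env=debug_env())` should be enough; the original environment is left unchanged because the helper works on a copy.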

Fix Action

Fix / Workaround

Per its title, PR #178148 fixes this in the Inductor C++ codegen (torch/_inductor/codegen/cpp.py, +1/-1): scalar masks are built with VecMask::from instead of being passed to the VecMask constructor. The constructor overloads listed in the compile error accept only Vectorized or VectorizedN arguments (plus the default, copy, and move forms), so the generated at::vec::VecMask<int64_t,2>(tmp8) could not compile when tmp8 was a plain int. The PR also adds 20 lines to test/inductor/test_cpu_repro.py.

PR fix notes

PR #178148: [CPU][Inductor] Use VecMask::from for scalar masks in codegen

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

  • -> #178148

Fixes: #178136, https://github.com/vllm-project/vllm/issues/37325

Signed-off-by: Fadi Arafeh [email protected]

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @mlazos

Changed files

  • test/inductor/test_cpu_repro.py (modified, +20/-0)
  • torch/_inductor/codegen/cpp.py (modified, +1/-1)
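
The one-line change to torch/_inductor/codegen/cpp.py is not reproduced on this page. As a rough illustration of the idea in the PR title, the sketch below shows a code generator choosing between the VecMask constructor and VecMask::from depending on whether the operand is a vector or a scalar. The helper name `mask_expr`, its `is_vector` flag, and the operand name `tmp19_in_range` are hypothetical and are not Inductor's actual API or the actual patch.

```python
def mask_expr(operand: str, is_vector: bool, dtype: str = "int64_t", n: int = 2) -> str:
    """Return a C++ expression that turns `operand` into a VecMask.

    Hypothetical helper for illustration only; not the code from PR #178148.
    """
    mask_type = f"at::vec::VecMask<{dtype},{n}>"
    if is_vector:
        # A Vectorized/VectorizedN operand matches a VecMask constructor.
        return f"{mask_type}({operand})"
    # A scalar such as `int` matches no constructor (the reported error),
    # so it goes through the static factory named in the PR title.
    return f"{mask_type}::from({operand})"


print(mask_expr("tmp19_in_range", is_vector=True))
print(mask_expr("tmp8", is_vector=False))
```

The second call yields `at::vec::VecMask<int64_t,2>::from(tmp8)`, the same form the generated kernel already uses for `VecMask<int64_t,2>::from(1)` in the failing TORCH_CHECK line.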

Code Example

import os
import tempfile
import torch

os.environ["TORCHINDUCTOR_CACHE_DIR"] = tempfile.mkdtemp(
    prefix="torchinductor_vecmask_repro"
)

def fn(positions: torch.Tensor, cache: torch.Tensor) -> torch.Tensor:
    x = cache[positions]
    y = x[0].clone()
    y[..., 1::3] = x[1, ..., 1::3]
    y[..., 2::3] = x[2, ..., 2::3]
    return y


def main() -> None:
    positions = torch.tensor([[0, 0], [0, 0], [0, 0]], dtype=torch.int64)
    cache = torch.arange(3, dtype=torch.float32).reshape(1, 3)
    # eager passes
    eager = fn(positions, cache)
    print("eager output:", eager)
    # compile fails
    compiled = torch.compile(fn, backend="inductor", fullgraph=True)
    output = compiled(positions, cache)
    print("compiled output:", output)


if __name__ == "__main__":
    main()

---

torch._inductor.exc.InductorError: CppCompileError: C++ compile error

Command:
g++ /tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDALONE_TORCH_HEADER -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_NEON -D AT_BUILD_ARM_VEC256_WITH_SLEEF -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize -march=native -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -pedantic -fopenmp -include /tmp/torchinductor_fadara01/precompiled_headers/caylb6j65w3whq4iaxxxdmgtjuoyn3ajwuwdwdttgrgtvdy7v7fj.h -I/usr/include/python3.10 -I/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include -I/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -o /tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.so -ltorch -ltorch_cpu -ltorch_python -lgomp -L/usr/lib/aarch64-linux-gnu -L/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/lib

Output:
In file included from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/c10/macros/Macros.h:2,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/NumericUtils.h:8,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:21,
                 from /tmp/torchinductor_fadara01/precompiled_headers/caylb6j65w3whq4iaxxxdmgtjuoyn3ajwuwdwdttgrgtvdy7v7fj.h:1:
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp: In lambda function:
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:62:275: error: no matching function for call to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(int&)’
   62 |                             TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp19) & (tmp19 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp8))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp19 < 1L");
      |                                                                                                                                                                                                                                                                                   ^
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/headeronly/macros/Macros.h:202:64: note: in definition of macro ‘C10_UNLIKELY’
  202 | #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                ^~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/c10/util/Exception.h:566:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  566 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {       \
      |       ^~~~~~~~~~~~~~~~~~~~~
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:62:29: note: in expansion of macro ‘TORCH_CHECK’
   62 |                             TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp19) & (tmp19 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp8))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp19 < 1L");
      |                             ^~~~~~~~~~~
In file included from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1532,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec128/vec128_float_neon.h:8,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec128/vec128_bfloat16_neon.h:6,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec128/vec128.h:9,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:7,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45:
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:133:3: note: candidate: ‘template<int L, typename std::enable_if<(L == 1), int>::type <anonymous> > at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask(const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with int L = L; typename std::enable_if<(L == 1), int>::type <anonymous> = <anonymous>; T = long int; int N = 2]’
  133 |   VecMask(const Vectorized<T>& mask) : mask_(mask) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:133:3: note:   template argument deduction/substitution failed:
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:132:65: error: no type named ‘type’ in ‘struct std::enable_if<false, int>’
  132 |   template <int L = N, typename std::enable_if_t<L == 1, int> = 0>
      |                                                                 ^
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:3: note: candidate: ‘at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask(const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&) [with T = long int; int N = 2]’
  130 |   VecMask(const VectorizedN<T, N>& mask) : mask_(mask) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:36: note:   no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VectorizedN<long int, 2>&’
  130 |   VecMask(const VectorizedN<T, N>& mask) : mask_(mask) {}
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:129:3: note: candidate: ‘at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask() [with T = long int; int N = 2]’
  129 |   VecMask() : mask_(static_cast<T>(0)) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:129:3: note:   candidate expects 0 arguments, 1 provided
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note: candidate: ‘constexpr at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&)’
  118 | class VecMask {
      |       ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note:   no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&’
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note: candidate: ‘constexpr at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(at::vec::CPU_CAPABILITY::VecMask<long int, 2>&&)’
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note:   no known conversion for argument 1 from ‘int’ to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>&&’
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp: In lambda function:
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:109:276: error: no matching function for call to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(int&)’
  109 |                             TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp39) & (tmp39 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp28))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp39 < 1L");
      |                                                                                                                                                                                                                                                                                    ^
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/headeronly/macros/Macros.h:202:64: note: in definition of macro ‘C10_UNLIKELY’
  202 | #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                ^~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/c10/util/Exception.h:566:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  566 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {       \
      |       ^~~~~~~~~~~~~~~~~~~~~
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:109:29: note: in expansion of macro ‘TORCH_CHECK’
  109 |                             TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp39) & (tmp39 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp28))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp39 < 1L");
      |                             ^~~~~~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:133:3: note: candidate: ‘template<int L, typename std::enable_if<(L == 1), int>::type <anonymous> > at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask(const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with int L = L; typename std::enable_if<(L == 1), int>::type <anonymous> = <anonymous>; T = long int; int N = 2]’
  133 |   VecMask(const Vectorized<T>& mask) : mask_(mask) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:133:3: note:   template argument deduction/substitution failed:
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:132:65: error: no type named ‘type’ in ‘struct std::enable_if<false, int>’
  132 |   template <int L = N, typename std::enable_if_t<L == 1, int> = 0>
      |                                                                 ^
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:3: note: candidate: ‘at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask(const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&) [with T = long int; int N = 2]’
  130 |   VecMask(const VectorizedN<T, N>& mask) : mask_(mask) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:36: note:   no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VectorizedN<long int, 2>&’
  130 |   VecMask(const VectorizedN<T, N>& mask) : mask_(mask) {}
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:129:3: note: candidate: ‘at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask() [with T = long int; int N = 2]’
  129 |   VecMask() : mask_(static_cast<T>(0)) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:129:3: note:   candidate expects 0 arguments, 1 provided
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note: candidate: ‘constexpr at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&)’
  118 | class VecMask {
      |       ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note:   no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&’
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note: candidate: ‘constexpr at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(at::vec::CPU_CAPABILITY::VecMask<long int, 2>&&)’
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note:   no known conversion for argument 1 from ‘int’ to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>&&’
In file included from /usr/include/c++/12/bits/stl_algobase.h:64,
                 from /usr/include/c++/12/algorithm:60,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:5:
/usr/include/c++/12/bits/stl_pair.h: In instantiation of ‘constexpr std::pair<typename std::__strip_reference_wrapper<typename std::decay<_Tp>::type>::__type, typename std::__strip_reference_wrapper<typename std::decay<_Tp2>::type>::__type> std::make_pair(_T1&&, _T2&&) [with _T1 = at::vec::CPU_CAPABILITY::Vectorized<float>; _T2 = at::vec::CPU_CAPABILITY::Vectorized<float>; typename __strip_reference_wrapper<typename decay<_Tp2>::type>::__type = at::vec::CPU_CAPABILITY::Vectorized<float>; typename decay<_Tp2>::type = decay<at::vec::CPU_CAPABILITY::Vectorized<float> >::type; typename __strip_reference_wrapper<typename decay<_Tp>::type>::__type = at::vec::CPU_CAPABILITY::Vectorized<float>; typename decay<_Tp>::type = decay<at::vec::CPU_CAPABILITY::Vectorized<float> >::type]’:
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec256/vec256_qint.h:1387:24:   required from here
/usr/include/c++/12/bits/stl_pair.h:741:5: note: parameter passing for argument of type ‘std::pair<at::vec::CPU_CAPABILITY::Vectorized<float>, at::vec::CPU_CAPABILITY::Vectorized<float> >’ when C++17 is enabled changed to match C++14 in GCC 10.1
  741 |     make_pair(_T1&& __x, _T2&& __y)
      |     ^~~~~~~~~


Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

---

Collecting environment information...
PyTorch version: 2.10.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04.3) 12.3.0
Clang version: 16.0.6 (++20231112100510+7cbf1a259152-1~exp1~20231112100554.106)
CMake version: version 4.2.3
Libc version: glibc-2.35

Python version: 3.10.12 (main, Mar  3 2026, 11:56:32) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.8.0-1050-aws-aarch64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  96
On-line CPU(s) list:                     0-95
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per socket:                      96
Socket(s):                               1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               6 MiB (96 instances)
L1i cache:                               6 MiB (96 instances)
L2 cache:                                192 MiB (96 instances)
L3 cache:                                36 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-95
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] torch==2.10.0
[pip3] torchaudio==2.10.0
[pip3] torchvision==0.25.0
[conda] Could not collect

🐛 Describe the bug

This is the root cause of https://github.com/vllm-project/vllm/issues/37325.

Note:

  • This is a regression in torch 2.10; I cannot reproduce the problem with torch 2.9.1.
  • It fails on both AArch64 and x86.
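
The version boundary in the note above can be turned into a small guard. This is a minimal sketch, assuming that the whole 2.10 series behaves like 2.10.0; the report only confirms that 2.10.0 fails and 2.9.1 does not, and says nothing about later releases. The helper names `parse_major_minor` and `is_reported_affected` are hypothetical.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.10.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_reported_affected(version: str) -> bool:
    """Return True for versions in the series the report identifies as failing.

    Assumption: every 2.10.x build is treated as affected. Only 2.10.0 is
    confirmed by the report; adjust once the fixed release is known.
    """
    return parse_major_minor(version) == (2, 10)
```

A caller could pass `torch.__version__` to `is_reported_affected` and skip `torch.compile` when it returns True.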

Repro:

import os
import tempfile
import torch

os.environ["TORCHINDUCTOR_CACHE_DIR"] = tempfile.mkdtemp(
    prefix="torchinductor_vecmask_repro"
)

def fn(positions: torch.Tensor, cache: torch.Tensor) -> torch.Tensor:
    x = cache[positions]
    y = x[0].clone()
    y[..., 1::3] = x[1, ..., 1::3]
    y[..., 2::3] = x[2, ..., 2::3]
    return y


def main() -> None:
    positions = torch.tensor([[0, 0], [0, 0], [0, 0]], dtype=torch.int64)
    cache = torch.arange(3, dtype=torch.float32).reshape(1, 3)
    # eager passes
    eager = fn(positions, cache)
    print("eager output:", eager)
    # compile fails
    compiled = torch.compile(fn, backend="inductor", fullgraph=True)
    output = compiled(positions, cache)
    print("compiled output:", output)


if __name__ == "__main__":
    main()

Error:

torch._inductor.exc.InductorError: CppCompileError: C++ compile error

Command:
g++ /tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDALONE_TORCH_HEADER -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_NEON -D AT_BUILD_ARM_VEC256_WITH_SLEEF -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize -march=native -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -pedantic -fopenmp -include /tmp/torchinductor_fadara01/precompiled_headers/caylb6j65w3whq4iaxxxdmgtjuoyn3ajwuwdwdttgrgtvdy7v7fj.h -I/usr/include/python3.10 -I/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include -I/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -o /tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.so -ltorch -ltorch_cpu -ltorch_python -lgomp -L/usr/lib/aarch64-linux-gnu -L/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/lib

Output:
In file included from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/c10/macros/Macros.h:2,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/NumericUtils.h:8,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:21,
                 from /tmp/torchinductor_fadara01/precompiled_headers/caylb6j65w3whq4iaxxxdmgtjuoyn3ajwuwdwdttgrgtvdy7v7fj.h:1:
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp: In lambda function:
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:62:275: error: no matching function for call to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(int&)’
   62 |                             TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp19) & (tmp19 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp8))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp19 < 1L");
      |                                                                                                                                                                                                                                                                                   ^
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/headeronly/macros/Macros.h:202:64: note: in definition of macro ‘C10_UNLIKELY’
  202 | #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                ^~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/c10/util/Exception.h:566:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  566 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {       \
      |       ^~~~~~~~~~~~~~~~~~~~~
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:62:29: note: in expansion of macro ‘TORCH_CHECK’
   62 |                             TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp19) & (tmp19 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp8))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp19 < 1L");
      |                             ^~~~~~~~~~~
In file included from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1532,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec128/vec128_float_neon.h:8,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec128/vec128_bfloat16_neon.h:6,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec128/vec128.h:9,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:7,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional_base.h:7,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/functional.h:4,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:45:
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:133:3: note: candidate: ‘template<int L, typename std::enable_if<(L == 1), int>::type <anonymous> > at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask(const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with int L = L; typename std::enable_if<(L == 1), int>::type <anonymous> = <anonymous>; T = long int; int N = 2]’
  133 |   VecMask(const Vectorized<T>& mask) : mask_(mask) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:133:3: note:   template argument deduction/substitution failed:
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:132:65: error: no type named ‘type’ in ‘struct std::enable_if<false, int>’
  132 |   template <int L = N, typename std::enable_if_t<L == 1, int> = 0>
      |                                                                 ^
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:3: note: candidate: ‘at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask(const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&) [with T = long int; int N = 2]’
  130 |   VecMask(const VectorizedN<T, N>& mask) : mask_(mask) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:36: note:   no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VectorizedN<long int, 2>&’
  130 |   VecMask(const VectorizedN<T, N>& mask) : mask_(mask) {}
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:129:3: note: candidate: ‘at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask() [with T = long int; int N = 2]’
  129 |   VecMask() : mask_(static_cast<T>(0)) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:129:3: note:   candidate expects 0 arguments, 1 provided
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note: candidate: ‘constexpr at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&)’
  118 | class VecMask {
      |       ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note:   no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&’
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note: candidate: ‘constexpr at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(at::vec::CPU_CAPABILITY::VecMask<long int, 2>&&)’
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note:   no known conversion for argument 1 from ‘int’ to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>&&’
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp: In lambda function:
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:109:276: error: no matching function for call to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(int&)’
  109 |                             TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp39) & (tmp39 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp28))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp39 < 1L");
      |                                                                                                                                                                                                                                                                                    ^
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/headeronly/macros/Macros.h:202:64: note: in definition of macro ‘C10_UNLIKELY’
  202 | #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                ^~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/c10/util/Exception.h:566:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  566 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {       \
      |       ^~~~~~~~~~~~~~~~~~~~~
/tmp/torchinductor_scalar_vecmask_repro_z3gf6trv/hw/chwjsaojwg6bqgrjclxahrl2qywzi77f5dvlbtnc6tfqoiykajn6.main.cpp:109:29: note: in expansion of macro ‘TORCH_CHECK’
  109 |                             TORCH_CHECK((at::vec::VecMask<int64_t,2>::set(at::vec::VecMask<int64_t,2>::from(1), ((at::vec::VecMask<int64_t,2>((at::vec::VectorizedN<int64_t,2>(0) <= tmp39) & (tmp39 < at::vec::VectorizedN<int64_t,2>(1L)))) | ~(at::vec::VecMask<int64_t,2>(tmp28))), static_cast<int64_t>(2L))).all_masked(), "index out of bounds: 0 <= tmp39 < 1L");
      |                             ^~~~~~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:133:3: note: candidate: ‘template<int L, typename std::enable_if<(L == 1), int>::type <anonymous> > at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask(const at::vec::CPU_CAPABILITY::Vectorized<T>&) [with int L = L; typename std::enable_if<(L == 1), int>::type <anonymous> = <anonymous>; T = long int; int N = 2]’
  133 |   VecMask(const Vectorized<T>& mask) : mask_(mask) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:133:3: note:   template argument deduction/substitution failed:
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:132:65: error: no type named ‘type’ in ‘struct std::enable_if<false, int>’
  132 |   template <int L = N, typename std::enable_if_t<L == 1, int> = 0>
      |                                                                 ^
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:3: note: candidate: ‘at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask(const at::vec::CPU_CAPABILITY::VectorizedN<T, N>&) [with T = long int; int N = 2]’
  130 |   VecMask(const VectorizedN<T, N>& mask) : mask_(mask) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:130:36: note:   no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VectorizedN<long int, 2>&’
  130 |   VecMask(const VectorizedN<T, N>& mask) : mask_(mask) {}
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:129:3: note: candidate: ‘at::vec::CPU_CAPABILITY::VecMask<T, N>::VecMask() [with T = long int; int N = 2]’
  129 |   VecMask() : mask_(static_cast<T>(0)) {}
      |   ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:129:3: note:   candidate expects 0 arguments, 1 provided
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note: candidate: ‘constexpr at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&)’
  118 | class VecMask {
      |       ^~~~~~~
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note:   no known conversion for argument 1 from ‘int’ to ‘const at::vec::CPU_CAPABILITY::VecMask<long int, 2>&’
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note: candidate: ‘constexpr at::vec::CPU_CAPABILITY::VecMask<long int, 2>::VecMask(at::vec::CPU_CAPABILITY::VecMask<long int, 2>&&)’
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:118:7: note:   no known conversion for argument 1 from ‘int’ to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 2>&&’
In file included from /usr/include/c++/12/bits/stl_algobase.h:64,
                 from /usr/include/c++/12/algorithm:60,
                 from /home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/torch/csrc/inductor/cpp_prefix.h:5:
/usr/include/c++/12/bits/stl_pair.h: In instantiation of ‘constexpr std::pair<typename std::__strip_reference_wrapper<typename std::decay<_Tp>::type>::__type, typename std::__strip_reference_wrapper<typename std::decay<_Tp2>::type>::__type> std::make_pair(_T1&&, _T2&&) [with _T1 = at::vec::CPU_CAPABILITY::Vectorized<float>; _T2 = at::vec::CPU_CAPABILITY::Vectorized<float>; typename __strip_reference_wrapper<typename decay<_Tp2>::type>::__type = at::vec::CPU_CAPABILITY::Vectorized<float>; typename decay<_Tp2>::type = decay<at::vec::CPU_CAPABILITY::Vectorized<float> >::type; typename __strip_reference_wrapper<typename decay<_Tp>::type>::__type = at::vec::CPU_CAPABILITY::Vectorized<float>; typename decay<_Tp>::type = decay<at::vec::CPU_CAPABILITY::Vectorized<float> >::type]’:
/home/fadara01/vllm-gelu-lut/venv/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec256/vec256_qint.h:1387:24:   required from here
/usr/include/c++/12/bits/stl_pair.h:741:5: note: parameter passing for argument of type ‘std::pair<at::vec::CPU_CAPABILITY::Vectorized<float>, at::vec::CPU_CAPABILITY::Vectorized<float> >’ when C++17 is enabled changed to match C++14 in GCC 10.1
  741 |     make_pair(_T1&& __x, _T2&& __y)
      |     ^~~~~~~~~


Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
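
Most of the output above consists of candidate notes; the lines tagged `error:` carry the actual failure. The following stdlib-only sketch pulls the distinct error messages out of a GCC log like this one. The function name `unique_errors` and the regular expression are illustrative choices, not part of any PyTorch tooling.

```python
import re

# Matches GCC diagnostics of the form "<file>:<line>:<col>: error: <message>".
_ERROR_RE = re.compile(r"^.+?:\d+:\d+: error: (?P<msg>.*)$")


def unique_errors(log: str) -> list[str]:
    """Return the distinct GCC error messages in first-seen order."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for line in log.splitlines():
        match = _ERROR_RE.match(line.strip())
        if match:
            seen.setdefault(match.group("msg"), None)
    return list(seen)
```

On the output above, this should leave the `VecMask(int&)` constructor mismatch and the `enable_if` substitution failure that GCC reports while rejecting one of the candidates.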

Versions

Same environment as reported above: PyTorch 2.10.0+cpu, Ubuntu 22.04.5 LTS (aarch64), GCC 12.3.0, Python 3.10.12, Neoverse-V2 CPU.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @snadampal @milpuz01 @nikhil-arm @nWEIdia @chauhang @penguinwu @voznesenskym @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo @zou3519

extent analysis

Fix Plan

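The failure is in the C++ code that Inductor generates for the CPU backend, not in the user code. In the vectorized bounds check, the generated kernel constructs at::vec::VecMask<int64_t,2>(tmp8) from a value the compiler sees as a plain int, and VecMask has no constructor that accepts a scalar, so g++ rejects the file. The reporter sees this on both AArch64 and x86, so it is not specific to the NEON vectorizer. Until you run a PyTorch build that contains the fix from the pull request linked to this issue, the options are to avoid torch 2.10 or to avoid compiling the affected function with Inductor.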

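Here are the options, in order of preference:

  • Upgrade to a PyTorch build that includes the fix from the pull request linked to this issue.
  • If you cannot upgrade, pin torch to 2.9.1, the version on which the reporter could not reproduce the problem.
  • Otherwise, run the affected function in eager mode instead of compiling it with the Inductor backend.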

Code Changes

To fall back to eager mode, skip torch.compile for the affected function:

# Replace this line
compiled = torch.compile(fn, backend="inductor", fullgraph=True)

# With this line
compiled = fn

Alternatively, to keep using the Inductor backend, upgrade PyTorch with pip, then confirm that the installed release actually contains the fix, since the newest release is not guaranteed to:

pip install --upgrade torch
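
If the function must stay compiled where possible, a wrapper can fall back to eager only when the compiled call fails. This is a minimal sketch, not an official PyTorch pattern; `with_eager_fallback` and its `compiler` parameter are hypothetical names. It catches the failure at call time because `torch.compile` compiles lazily, on the first call.

```python
from typing import Any, Callable


def with_eager_fallback(
    fn: Callable[..., Any],
    compiler: Callable[[Callable[..., Any]], Callable[..., Any]],
) -> Callable[..., Any]:
    """Wrap fn so that a failing compiled call falls back to plain fn."""
    compiled = compiler(fn)
    use_eager = False

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        nonlocal use_eager
        if not use_eager:
            try:
                return compiled(*args, **kwargs)
            except Exception as exc:
                # Remember the failure so later calls skip the compiled path.
                use_eager = True
                print(f"compiled call failed ({type(exc).__name__}); using eager")
        return fn(*args, **kwargs)

    return wrapper
```

With the repro above, this would be used as `with_eager_fallback(fn, lambda f: torch.compile(f, backend="inductor", fullgraph=True))`. The broad `except` is deliberate for a sketch; a genuine bug in `fn` still surfaces, because the eager call raises it again.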

Verification

To verify the fix, rerun the repro script, which ends with:

if __name__ == "__main__":
    main()

If the fix worked, the script prints both the eager output and the compiled output without raising CppCompileError, and the two outputs match.

Extra Tips

  • Upgrade torchvision and torchaudio together with torch so that the installed versions stay compatible.
  • If you use a virtual environment, make sure the upgraded PyTorch is installed in the environment that actually runs your code.
  • If the problem persists, set the TORCHDYNAMO_VERBOSE environment variable to 1 to get the internal stack trace, and set TORCH_LOGS="+dynamo" for more developer context.
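
The two debugging variables named in the error output can also be set from inside the script. A minimal sketch, assuming the safe ordering is to set them before `import torch` so they are in place when PyTorch reads its configuration:

```python
import os

# Both variable names and values come from the hint printed with the error.
os.environ["TORCHDYNAMO_VERBOSE"] = "1"   # show the internal stack trace
os.environ["TORCH_LOGS"] = "+dynamo"      # extra developer context

# Import torch only after this point, then rerun the failing call.
```

Exporting the same variables in the shell before launching Python has the same effect and avoids editing the script.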
