vllm - ✅(Solved) Fix [Bug][ARM CPU] Build/Runtime error: no matching function for call to ‘at::vec::CPU_CAPABILITY::VecMask<long int, 4>::VecMask(int&)’ when serving Qwen3-VL-8B-Instruct [1 pull request, 3 comments, 3 participants]

vllm • 2026-03-17 17:03:31

GitHub stats

vllm-project/vllm#37325 • Fetched 2026-04-08 00:53:30

Timeline: referenced ×6, commented ×3, cross-referenced ×2, labeled ×2

Error Message

(EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] EngineCore failed to start. (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] Traceback (most recent call last): (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/engine/core.py", line 1073, in run_engine_core (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return func(*args, **kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/engine/core.py", line 839, in init (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] super().init( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/engine/core.py", line 122, in init (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] kv_cache_config = self._initialize_kv_caches(vllm_config) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return func(*args, **kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/engine/core.py", line 278, in _initialize_kv_caches (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] self.model_executor.initialize_from_config(kv_cache_configs) (EngineCore pid=25054) 
ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/executor/abstract.py", line 118, in initialize_from_config (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model") (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/executor/uniproc_executor.py", line 78, in collective_rpc (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] result = run_method(self.driver_worker, method, args, kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/serial_utils.py", line 459, in run_method (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return func(*args, kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/worker/cpu_worker.py", line 142, in compile_or_warm_up_model (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] self.model_runner.warming_up_model() (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return func(args, kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/worker/cpu_model_runner.py", line 76, in warming_up_model (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] self._dummy_run( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/utils/_contextlib.py", 
line 124, in decorate_context (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return func(args, kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/v1/worker/gpu_model_runner.py", line 5235, in _dummy_run (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] outputs = self.model( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return self._call_impl(args, kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return forward_call(args, kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/model_executor/models/qwen3_vl.py", line 2288, in forward (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] hidden_states = self.language_model.model( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/vllm/vllm/compilation/decorators.py", line 583, in call (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] self.aot_compiled_fn = self.aot_compile(args, kwargs) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File 
"/home/micyan01/vllm/vllm/compilation/wrapper.py", line 168, in aot_compile (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return self._compiled_callable.aot_compile((args, kwargs)) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/dynamo/eval_frame.py", line 832, in aot_compile (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return aot_compile_fullgraph( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/dynamo/aot_compile.py", line 239, in aot_compile_fullgraph (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] compiled_fn = backend( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/init.py", line 2435, in call (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return compile_fx(model, inputs, config_patches=self.config) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2486, in compile_fx (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return compile_fx( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2537, in compile_fx (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] 
return _maybe_wrap_and_compile_fx_main( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2614, in _maybe_wrap_and_compile_fx_main (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return _compile_fx_main( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 2823, in _compile_fx_main (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1 (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 968, in _compile_fx_inner (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] raise InductorError(e, currentframe()).with_traceback( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 956, in _compile_fx_inner (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] mb_compiled_graph = fx_codegen_and_compile( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1766, in fx_codegen_and_compile (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (EngineCore pid=25054) ERROR 03-17 
16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] compiled_module = graph.compile_to_module() (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2416, in compile_to_module (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return self._compile_to_module() (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2426, in _compile_to_module (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] mod = self._compile_to_module_lines(wrapper_code) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2501, in _compile_to_module_lines (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] mod = PyCodeCache.load_by_key_path( (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 3674, in load_by_key_path (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] mod = _reload_python_module(key, path, set_sys_modules=in_toplevel) (EngineCore pid=25054) ERROR 03-17 
16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/runtime/compile_tasks.py", line 35, in _reload_python_module (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] exec(code, mod.dict, mod.dict) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/tmp/torchinductor_micyan01/dq/cdqhilaznl2olbbmtowf73alnpalratwzasvhlxiponelaauaxak.py", line 97, in <module> (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] cpp_fused__to_copy_add_clone_copy_index_mean_mul_pow_rsqrt_select_slice_split_split_with_sizes_sub_unsqueeze_view_1 = async_compile.cpp_pybinding(['const int64_t', 'const at::BFloat16', 'const at::BFloat16', 'const at::BFloat16', 'const at::BFloat16', 'at::BFloat16', 'at::BFloat16', 'at::BFloat16', 'at::BFloat16', 'at::BFloat16', 'at::BFloat16', 'float', 'at::BFloat16', 'at::BFloat16', 'float', 'at::BFloat16', 'at::BFloat16', 'const int64_t', 'const int64_t'], r''' (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/async_compile.py", line 517, in cpp_pybinding (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return CppPythonBindingsCodeCache.load_pybinding(argtypes, source_code) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 
03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 3155, in load_pybinding (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] return cls.load_pybinding_async(*args, **kwargs)() (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 3147, in future (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] result = get_result() (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2937, in load_fn (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] result = worker_fn() (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] ^^^^^^^^^^^ (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2966, in _worker_compile_cpp (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] builder.build() (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 2144, in build (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] run_compile_cmd(build_cmd, cwd=_build_tmp_dir) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File "/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 636, in run_compile_cmd (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] _run_compile_cmd(cmd_line, cwd) (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] File 
"/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 631, in _run_compile_cmd (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] raise exc.CppCompileError(cmd, output) from e (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] torch._inductor.exc.InductorError: CppCompileError: C++ compile error (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] Command: (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099] g++ /tmp/torchinductor_micyan01/hf/chff4cnehpomoxidw7f7odez3g6s67wraiq24c3ib4jshcmej6qi.main.cpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDALONE_TORCH_HEADER -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_NEON -D AT_BUILD_ARM_VEC256_WITH_SLEEF -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize -march=native -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -pedantic -fopenmp -include /tmp/torchinductor_micyan01/precompiled_headers/chxm3ardqi4tkakdjyfdyzmoyevsw5u4yzl2bjtykaekm7ldosyn.h -I/home/micyan01/miniforge3/envs/mlperf/include/python3.12 -I/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/include -I/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -o /tmp/torchinductor_micyan01/hf/chff4cnehpomoxidw7f7odez3g6s67wraiq24c3ib4jshcmej6qi.main.so -ltorch -ltorch_cpu -ltorch_python -lgomp -L/home/micyan01/miniforge3/envs/mlperf/lib -L/home/micyan01/miniforge3/envs/mlperf/lib/python3.12/site-packages/torch/lib (EngineCore pid=25054) ERROR 03-17 16:59:57 [core.py:1099]

PR fix notes

PR #178148: [CPU][Inductor] Use VecMask::from for scalar masks in codegen

Repository: pytorch/pytorch
Author: fadara01
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/178148

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

-> #178148

Fixes: #178136, https://github.com/vllm-project/vllm/issues/37325

Signed-off-by: Fadi Arafeh

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @mlazos

Changed files

test/inductor/test_cpu_repro.py (modified, +20/-0)
torch/_inductor/codegen/cpp.py (modified, +1/-1)
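
The changed-files list shows a one-line change to `torch/_inductor/codegen/cpp.py`, but the diff itself is not reproduced on this page. As a rough illustration of what the PR title describes, the sketch below contrasts two C++ expressions a code generator could emit for a scalar mask: direct construction, which appears to be the call the compiler rejects in the error quoted in the issue title, and the `VecMask<T, N>::from(...)` factory named in the PR title. The helper names, the `tmp0` variable, and the exact template spelling are assumptions made for illustration; this is not Inductor's actual code.

```python
def emit_mask_ctor(ctype: str, lanes: int, scalar_expr: str) -> str:
    """Hypothetical 'before' form: construct the mask directly from a scalar."""
    return f"at::vec::VecMask<{ctype},{lanes}>({scalar_expr})"


def emit_mask_from(ctype: str, lanes: int, scalar_expr: str) -> str:
    """Hypothetical 'after' form: go through the static from() factory."""
    return f"at::vec::VecMask<{ctype},{lanes}>::from({scalar_expr})"


if __name__ == "__main__":
    # int64_t with 4 lanes mirrors VecMask<long int, 4> from the issue title;
    # "tmp0" stands in for whatever scalar the generated kernel holds.
    print("before:", emit_mask_ctor("int64_t", 4, "tmp0"))
    print("after: ", emit_mask_from("int64_t", 4, "tmp0"))
```

Only the emitted string differs between the two helpers, which is consistent with the +1/-1 line count reported for `cpp.py`, though the real change may look different.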

Code Example

Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 24.04.3 LTS (aarch64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version                : 18.1.3 (1ubuntu1)
CMake version                : version 4.2.3
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+cpu
Is debug build               : False
CUDA used to build PyTorch   : None
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.13 | packaged by conda-forge | (main, Mar  5 2026, 16:39:32) [GCC 14.3.0] (64-bit runtime)
Python platform              : Linux-6.14.0-1018-aws-aarch64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : False
CUDA runtime version         : No CUDA
CUDA_MODULE_LOADING set to   : N/A
GPU models and configuration : No CUDA
Nvidia driver version        : No CUDA
cuDNN version                : No CUDA
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  96
On-line CPU(s) list:                     0-95
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per socket:                      96
Socket(s):                               1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               6 MiB (96 instances)
L1i cache:                               6 MiB (96 instances)
L2 cache:                                192 MiB (96 instances)
L3 cache:                                36 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-95
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

==============================
Versions of relevant libraries
==============================
[pip3] mypy==1.19.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] pyzmq==27.1.0
[pip3] torch==2.10.0+cpu
[pip3] torchaudio==2.10.0+cpu
[pip3] torchvision==0.25.0+cpu
[pip3] transformers==4.57.6
[conda] numpy                                       2.2.6                           pypi_0                pypi
[conda] pyzmq                                       27.1.0                          pypi_0                pypi
[conda] torch                                       2.10.0+cpu                      pypi_0                pypi
[conda] torchaudio                                  2.10.0+cpu                      pypi_0                pypi
[conda] torchvision                                 0.25.0+cpu                      pypi_0                pypi
[conda] transformers                                4.57.6                          pypi_0                pypi

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.17.2rc1.dev26+g4ed51308c (git sha: 4ed51308c)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_micyan01

🐛 Describe the bug

install

git clone https://github.com/vllm-project/vllm vllm-cpu && \
cd vllm-cpu && \
git checkout main && \
git log -n1 && \
pip3 install --break-system-packages -r requirements/cpu.txt && \
VLLM_TARGET_DEVICE=cpu pip install --break-system-packages . --no-build-isolation

run

vllm serve Qwen/Qwen3-VL-8B-Instruct

error

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

The error is a C++ compile error: the compiler finds no VecMask<long int, 4> constructor that accepts an int. Judging by the title of the linked PR #178148 ("Use VecMask::from for scalar masks in codegen"), the likely cause is C++ code generated by PyTorch Inductor for the CPU backend rather than a missing dependency. The steps below are general checks that help rule out local causes:

Check the compiler: The error message reports GCC (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0. To make sure the distribution's latest GCC packages are installed, run sudo apt update && sudo apt install gcc g++.
Check the linked libraries: The error message mentions ltorch, ltorch_cpu, ltorch_python, and lgomp. These are linker flags (-ltorch, -ltorch_cpu, -ltorch_python, -lgomp), not apt package names. The first three libraries ship with the installed PyTorch wheel (torch==2.10.0+cpu here), and libgomp is GCC's OpenMP runtime (package libgomp1 on Ubuntu). Confirm that import torch works and, if libgomp is missing, run sudo apt install libgomp1.
Modify compiler flags: The error message shows several compiler flags, including -march=native and -fopenmp. You can try changing these to see whether the error goes away, for example by removing -march=native or replacing it with an explicit architecture flag.

Here's an example of how you can set the compiler flags from Python. Changes to os.environ apply only to this process and the child processes it starts, so the build must be launched from the same script:

import os

# Set the compiler flags (each option listed once, with fast-math disabled)
os.environ["CXXFLAGS"] = "-O3 -DNDEBUG -fno-trapping-math -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize"

# Set the library flags; this assumes the linker can already locate these libraries
os.environ["LDFLAGS"] = "-ltorch -ltorch_cpu -ltorch_python -lgomp"

Clean and rebuild: After modifying the compiler flags or installing missing dependencies, clean and rebuild the project. Note that git clean -fdx deletes every untracked and ignored file in the checkout, so save any local changes first. Then run git clean -fdx && pip install --break-system-packages -r requirements/cpu.txt && VLLM_TARGET_DEVICE=cpu pip install --break-system-packages . --no-build-isolation.
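Because os.environ changes are visible only to the current process and its children, one way to apply custom flags is to launch the build from Python. The following is a minimal sketch, assuming it runs from the root of the vllm-cpu checkout; build_env, build_command, and run_build are hypothetical helper names, and the flag passed at the end is only a placeholder.

```python
import os
import subprocess
import sys


def build_env(extra_cxxflags: str = "") -> dict:
    """Return a copy of the current environment prepared for a CPU build."""
    env = dict(os.environ)
    env["VLLM_TARGET_DEVICE"] = "cpu"  # same setting as the install command in the report
    if extra_cxxflags:
        # Append rather than overwrite, so flags that are already set survive.
        env["CXXFLAGS"] = (env.get("CXXFLAGS", "") + " " + extra_cxxflags).strip()
    return env


def build_command() -> list:
    """pip invocation matching the install step in the report."""
    return [sys.executable, "-m", "pip", "install",
            "--break-system-packages", ".", "--no-build-isolation"]


def run_build(extra_cxxflags: str = "") -> None:
    """Run the build as a child process so it inherits the modified environment."""
    subprocess.run(build_command(), env=build_env(extra_cxxflags), check=True)


# Show what would be executed without starting the build.
print(" ".join(build_command()))
print("CXXFLAGS =", build_env("-fno-tree-loop-vectorize")["CXXFLAGS"])
```

Calling run_build("-fno-tree-loop-vectorize") would start the actual build. Passing a copy of the environment keeps the experiment from leaking into the rest of the Python session.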

Verification

To verify that the fix worked, you can try running the vllm serve command again:

vllm serve Qwen/Qwen3-VL-8B-Instruct

If the issue is resolved, the command should run without errors.
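Once the server starts, a scripted check can confirm that it is actually answering requests. The sketch below assumes the server listens on http://localhost:8000 and exposes a /health endpoint, which is the usual vLLM default but is not stated in the report; server_is_healthy is a hypothetical helper name.

```python
import http.client
import urllib.request


def server_is_healthy(base_url: str = "http://localhost:8000",
                      timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url + "/health", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError both derive from it).
        return False
```

Model loading can take a while, so in practice this check would be retried in a loop with a short sleep until it returns True or a deadline passes.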

Extra Tips

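Ensure that the TORCHINDUCTOR_CACHE_DIR environment variable points to a valid, writable directory (it is /tmp/torchinductor_micyan01 in the reported environment).
TORCHINDUCTOR_COMPILE_THREADS is already set to 1 in the reported environment, so it cannot be lowered further there. If your own setup uses a larger value, try 1 to see if it resolves the issue.
If you're still experiencing issues, try resetting the Inductor cache by running rm -rf /tmp/torchinductor_micyan01 (substitute your own cache directory).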
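The cache reset can also be scripted so that it follows whatever TORCHINDUCTOR_CACHE_DIR is set to. This is a minimal sketch: the fallback path torchinductor_<user> under the system temp directory is an assumption modeled on the /tmp/torchinductor_micyan01 value in the report, and both function names are hypothetical.

```python
import getpass
import os
import shutil
import tempfile
from pathlib import Path


def inductor_cache_dir() -> Path:
    """Resolve the Inductor cache directory for the current user."""
    explicit = os.environ.get("TORCHINDUCTOR_CACHE_DIR")
    if explicit:
        return Path(explicit)
    # Assumed fallback, mirroring the /tmp/torchinductor_<user> path in the report.
    return Path(tempfile.gettempdir()) / f"torchinductor_{getpass.getuser()}"


def clear_inductor_cache() -> bool:
    """Delete the cache directory; return True if something was removed."""
    cache = inductor_cache_dir()
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```

Stop the server before calling clear_inductor_cache(), since a running process may still be writing compiled kernels into the directory.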


PR fix notes

PR #178148: [CPU][Inductor] Use VecMask::from for scalar masks in codegen
