vllm - ✅(Solved) Fix [Build] CPU extension fails to compile on macOS ARM64 (Apple Silicon) [1 pull request, 1 comment, 1 participant]

GitHub stats
vllm-project/vllm#41537 · Fetched 2026-05-04 04:59:00
Comments: 1 · Participants: 1 · Timeline events: 6 · Reactions: 0
Timeline (top): project_v2_item_status_changed ×2, added_to_project_v2 ×1, commented ×1, cross-referenced ×1

Building vLLM's CPU backend from source on macOS ARM64 (Apple Silicon) fails with two compilation errors, both detailed in the Description section.

Error Message

error: no matching constructor for initialization of 'vec_op::FP32Vec16'

Root Cause

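There are two independent causes. First, the CPU attention code captures structured bindings in lambdas, which is standard only from C++20; GCC accepts it in C++17 mode as an extension, but Clang does not. Second, the FP32Vec16(const BF16Vec32&, int) constructor exists in cpu_types_x86.hpp but was never ported to cpu_types_arm.hpp.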

Fix Action

Fixed

PR fix notes

PR #41538: [Build] Fix CPU extension build on macOS ARM64

Description (problem / solution / changelog)

Summary

Fixes #41537

  • Bump the C++ standard from 17 to 20: the CPU attention code already captures structured bindings in lambdas, a C++20 feature that GCC accepts in C++17 mode as an extension but Clang rejects
  • Add the missing FP32Vec16(const BF16Vec32&, int) constructor to cpu_types_arm.hpp: this constructor exists in cpu_types_x86.hpp but was never ported to the ARM types header, causing a compilation error in load_b_pair_vec() for FP8 attention paths

Test plan

  • Build from source on macOS ARM64: uv pip install --editable . --torch-backend=auto
  • Verify existing Linux x86_64 builds are unaffected (C++20 is largely source-compatible with C++17, so no regressions are expected)
  • Verify existing Linux aarch64 builds are unaffected

AI-assisted: This PR was developed with AI assistance. All changes have been reviewed and understood by the submitter. No existing open PR addresses this issue.

🤖 Generated with Claude Code

Changed files

  • CMakeLists.txt (modified, +1/-1)
  • cmake/cpu_extension.cmake (modified, +1/-1)
  • csrc/cpu/cpu_types_arm.hpp (modified, +7/-0)


Description

Building vLLM's CPU backend from source on macOS ARM64 (Apple Silicon) fails with two compilation errors:

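1. Structured bindings captured in lambdas (a C++20 feature)

csrc/cpu/cpu_attn_vec.hpp:105 captures structured bindings in a lambda. Structured bindings themselves date from C++17, but capturing them in a lambda is standard only from C++20: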

auto [fp32_b_0_reg, fp32_b_1_reg] = load_b_pair_vec(curr_b);
vec_op::unroll_loop<int32_t, M>([&](int32_t i) {
    // fp32_b_0_reg and fp32_b_1_reg captured here — requires C++20
    c_regs[i * 2] = c_regs[i * 2] + a_reg * fp32_b_0_reg;
    c_regs[i * 2 + 1] = c_regs[i * 2 + 1] + a_reg * fp32_b_1_reg;
});

GCC allows this as an extension in C++17 mode, but Clang (used on macOS) rejects it with:

warning: captured structured bindings are a C++20 extension [-Wc++20-extensions]
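
For readers who must stay on C++17, one portable way to write the same kind of loop is to bind each structured binding to an init-capture. The sketch below is illustrative only: `load_pair` and `accumulate` are hypothetical stand-ins, not vLLM functions, and the PR for this issue raises the language standard instead of rewriting the lambda.

```cpp
#include <array>
#include <cassert>
#include <utility>

// Hypothetical stand-in for load_b_pair_vec(): returns two values as a pair.
inline std::pair<float, float> load_pair(float base) {
  return {base, base + 1.0f};
}

// Accumulates a * b0 and a * b1 into interleaved slots, mirroring the shape
// of the loop quoted in the issue (not the actual vLLM kernel).
template <int M>
std::array<float, 2 * M> accumulate(float a, float base) {
  std::array<float, 2 * M> c{};  // value-initialized to zeros
  auto [b0, b1] = load_pair(base);

  // Init-captures create ordinary reference members named r0 and r1, which
  // is valid C++17. Using [&] and naming b0/b1 directly inside the body is
  // what requires C++20.
  auto body = [&c, a, &r0 = b0, &r1 = b1](int i) {
    c[i * 2] += a * r0;
    c[i * 2 + 1] += a * r1;
  };

  for (int i = 0; i < M; ++i) {
    body(i);
  }
  return c;
}
```

Calling `accumulate<2>(2.0f, 3.0f)` fills the even slots with `2 * 3` and the odd slots with `2 * 4`. The init-capture form compiles in both C++17 and C++20 modes, at the cost of a more verbose capture list.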

2. Missing FP32Vec16(const BF16Vec32&, int) constructor on ARM

csrc/cpu/cpu_attn_vec.hpp:23 calls FP32Vec16(bf16_b_reg, 0) and FP32Vec16(bf16_b_reg, 1), but this two-argument constructor only exists in cpu_types_x86.hpp, not in cpu_types_arm.hpp:

error: no matching constructor for initialization of 'vec_op::FP32Vec16'
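
To make the shape of the missing overload concrete, here is a scalar sketch of a two-argument converting constructor. Everything in it is an assumption for illustration: `MiniBF16Vec32`, `MiniFP32Vec16`, and `to_bf16_bits` are hypothetical names, the real vec_op types wrap SIMD registers, and the meaning of the index argument (0 selects elements 0 to 15, 1 selects elements 16 to 31) is inferred from the call sites rather than confirmed by the source.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for a 32-lane bfloat16 vector.
struct MiniBF16Vec32 {
  std::array<std::uint16_t, 32> bits{};  // raw bfloat16 bit patterns
};

// Hypothetical scalar stand-in for a 16-lane float vector.
struct MiniFP32Vec16 {
  std::array<float, 16> vals{};

  // Assumed semantics: index 0 converts lanes 0..15, index 1 lanes 16..31.
  MiniFP32Vec16(const MiniBF16Vec32& v, int index) {
    for (int i = 0; i < 16; ++i) {
      // bfloat16 holds the upper 16 bits of an IEEE-754 binary32 value,
      // so widening is a 16-bit left shift of the bit pattern.
      std::uint32_t wide =
          static_cast<std::uint32_t>(v.bits[index * 16 + i]) << 16;
      std::memcpy(&vals[i], &wide, sizeof(float));
    }
  }
};

// Helper for the example: truncates a float to bfloat16 bits (no rounding).
inline std::uint16_t to_bf16_bits(float f) {
  std::uint32_t wide;
  std::memcpy(&wide, &f, sizeof(wide));
  return static_cast<std::uint16_t>(wide >> 16);
}
```

Without an overload of this shape, an expression such as `MiniFP32Vec16(b, 0)` has no viable constructor, which is the same class of error Clang reports for `vec_op::FP32Vec16` on ARM.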

Steps to reproduce

# On macOS ARM64 (Apple Silicon) with Python 3.12
uv pip install --editable . --torch-backend=auto

Environment

  • macOS (Apple Silicon / ARM64)
  • Apple Clang (any recent version)
  • Python 3.12
  • torch 2.11.0

Suggested fix

  1. Bump CMAKE_CXX_STANDARD from 17 to 20 in CMakeLists.txt and cmake/cpu_extension.cmake
  2. Add the missing FP32Vec16(const BF16Vec32&, int) constructor to cpu_types_arm.hpp

extent analysis

TL;DR

Update the C++ standard to 20 and add the missing FP32Vec16 constructor to resolve the compilation errors.

Guidance

  • Verify that updating CMAKE_CXX_STANDARD to 20 in CMakeLists.txt and cmake/cpu_extension.cmake resolves the C++20 structured bindings issue.
  • Add the missing FP32Vec16(const BF16Vec32&, int) constructor to cpu_types_arm.hpp to fix the constructor initialization error.
  • Ensure that the updated code compiles without errors on macOS ARM64 with the specified environment.
  • Review the changes to ensure they do not introduce any regressions or compatibility issues.
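
One lightweight way to confirm that a translation unit is really being compiled as C++20 after the CMake change is to inspect the `__cplusplus` macro. The helper names below are hypothetical and not part of vLLM; note also that some older compilers report a value below 202002L under a provisional `-std=c++2a` flag, so treat this as a sanity check rather than proof.

```cpp
#include <cassert>

// Returns the language-standard value the compiler reports for this
// translation unit (for example 201703L for C++17, 202002L for C++20).
constexpr long reported_cxx_standard() { return __cplusplus; }

// Returns true when compiled as C++20 or newer. 202002L is the value of
// __cplusplus specified by the C++20 standard.
constexpr bool compiled_as_cxx20_or_newer() {
  return reported_cxx_standard() >= 202002L;
}
```

Logging `reported_cxx_standard()` from a small test program built with the same flags as the CPU extension shows which standard the toolchain actually selected.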

Example

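// Add a constructor with this signature to cpu_types_arm.hpp
FP32Vec16(const BF16Vec32& vec, int index) {
    // Placeholder only: the ARM implementation is not shown here (see PR #41538).
    // Call sites in cpu_attn_vec.hpp:23 pass index 0 or 1.
}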

Notes

The suggested fixes assume that the code is compatible with C++20 and that the missing constructor can be implemented for the ARM architecture.

Recommendation

Apply the suggested fixes to update the C++ standard and add the missing constructor, as they directly address the compilation errors and are likely to resolve the issue.
