vllm - ✅(Solved) Fix [Bug]: macOS arm64 build broken — `cpu_attn_vec.hpp` calls `FP32Vec16(const BF16Vec32&, int)` constructor that doesn't exist on ARM (regression from #39445) [2 pull requests, 1 participants]

vllm-project/vllm#41437 (fetched 2026-05-01 05:33:38)

Error Message

csrc/cpu/cpu_attn_vec.hpp:23:13: error: no matching constructor for initialization of 'vec_op::FP32Vec16'
   23 |     return {vec_op::FP32Vec16(bf16_b_reg, 0), vec_op::FP32Vec16(bf16_b_reg, 1)};

csrc/cpu/cpu_attn_vec.hpp:27:13: error: no matching constructor for initialization of 'vec_op::FP32Vec16'
   27 |     return {vec_op::FP32Vec16(bf16_b_reg, 0), vec_op::FP32Vec16(bf16_b_reg, 1)};

Root Cause

csrc/cpu/cpu_attn_vec.hpp (introduced/expanded by #39445, 22524f7a9) calls:

vec_op::FP32Vec16(bf16_b_reg, 0)

where bf16_b_reg is a vec_op::BF16Vec32. This requires a constructor with signature:

explicit FP32Vec16(const BF16Vec32& v, int upper);

That constructor is defined for x86 only — it appears twice in csrc/cpu/cpu_types_x86.hpp (line 519 for AVX-512, line 628 for AVX-2/256), using _mm512_extracti32x8_epi32 / _mm256_extractf128_si256 intrinsics.

csrc/cpu/cpu_types_arm.hpp defines FP32Vec16 with constructors taking, among others, const float* and bool, but no (const BF16Vec32&, int) overload. So when cpu_attn_vec.hpp is compiled on ARM, overload resolution finds no viable candidate and the call fails to compile.

The new code in cpu_attn_vec.hpp is not gated on x86 — there's no #ifdef __x86_64__ / ENABLE_X86_ISA guard around the FP8 KV-cache code path that uses this constructor.

Other backends (cpu_types_neon.hpp, cpu_types_vsx.hpp, cpu_types_s390x.hpp) likely have the same gap, but ARM is the one I can reproduce on.
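The mismatch described above can be reproduced in miniature without any SIMD headers. The following is a minimal sketch, not vLLM code: the two namespaces and the empty structs are hypothetical stand-ins for the x86 and ARM definitions of FP32Vec16, and the trait simply asks the compiler whether the (const BF16Vec32&, int) constructor exists on each.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for vec_op::BF16Vec32; the real type wraps a SIMD register.
struct BF16Vec32 {};

namespace x86_like {
// Mirrors the situation in cpu_types_x86.hpp: the BF16 half-conversion
// constructor is present.
struct FP32Vec16 {
  explicit FP32Vec16(const float*) {}
  explicit FP32Vec16(const BF16Vec32&, int /*upper*/) {}
};
}  // namespace x86_like

namespace arm_like {
// Mirrors the situation reported for cpu_types_arm.hpp: other constructors
// exist, but there is no (const BF16Vec32&, int) overload.
struct FP32Vec16 {
  explicit FP32Vec16(const float*) {}
};
}  // namespace arm_like

// True when Vec can be direct-initialized from (const BF16Vec32&, int).
template <typename Vec>
constexpr bool has_bf16_half_ctor =
    std::is_constructible<Vec, const BF16Vec32&, int>::value;

static_assert(has_bf16_half_ctor<x86_like::FP32Vec16>,
              "x86-like backend provides the constructor");
static_assert(!has_bf16_half_ctor<arm_like::FP32Vec16>,
              "ARM-like backend lacks the constructor");

// Runtime mirror of the static checks, convenient inside a test harness.
inline void check_backends() {
  assert(has_bf16_half_ctor<x86_like::FP32Vec16>);
  assert(!has_bf16_half_ctor<arm_like::FP32Vec16>);
}
```

A check of this shape, placed next to the shared attention code, would turn a missing backend constructor into a single readable static_assert message instead of one error per call site.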

PR fix notes

PR #41387: [Fix] Add missing stubs from cpu fp8 attention changes

Description (problem / solution / changelog)

Purpose

https://github.com/vllm-project/vllm/pull/39445 is missing some stubs for fp8 attention, which cause compilation errors when using clang (see https://github.com/tianmu-li/vllm/actions/runs/25149522905/job/73716695058#logs). This PR adds them.


Changed files

  • csrc/cpu/cpu_types_arm.hpp (modified, +4/-0)
  • csrc/cpu/cpu_types_scalar.hpp (modified, +10/-0)
  • csrc/cpu/cpu_types_vsx.hpp (modified, +10/-0)
  • csrc/cpu/cpu_types_vxe.hpp (modified, +4/-0)


Your current environment

  • OS: macOS 26.3.1 (arm64, Apple Silicon)
  • Compiler: Apple clang 21.0.0
  • Python: 3.12.13 (uv-managed venv)
  • torch: 2.11.0
  • vLLM commit: 9c61864bf (current main HEAD as of filing)
  • VLLM_TARGET_DEVICE: auto-routed to cpu per setup.py:42-44

🐛 Describe the bug

Building from source on macOS arm64 (uv pip install -e .) fails with the following compiler errors in csrc/cpu/cpu_attn_vec.hpp:

csrc/cpu/cpu_attn_vec.hpp:23:13: error: no matching constructor for initialization of 'vec_op::FP32Vec16'
   23 |     return {vec_op::FP32Vec16(bf16_b_reg, 0), vec_op::FP32Vec16(bf16_b_reg, 1)};

csrc/cpu/cpu_attn_vec.hpp:27:13: error: no matching constructor for initialization of 'vec_op::FP32Vec16'
   27 |     return {vec_op::FP32Vec16(bf16_b_reg, 0), vec_op::FP32Vec16(bf16_b_reg, 1)};

(There are 4 errors total — one per FP32Vec16(bf16_b_reg, N) call across lines 23 and 27.)

Reproduction

git clone https://github.com/vllm-project/vllm.git
cd vllm
uv venv --python 3.12
uv pip install -r requirements/cpu.txt --index-strategy unsafe-best-match
uv pip install -r requirements/build/cpu.txt --index-strategy unsafe-best-match
CC=/usr/bin/clang CXX=/usr/bin/clang++ \
  uv pip install -e . --no-build-isolation --index-strategy unsafe-best-match

The build fails during compilation of csrc/cpu/cpu_attn.cpp (which transitively includes cpu_attn_vec.hpp).

Suggested fix

Two reasonable directions:

  1. Add an ARM stub for FP32Vec16(const BF16Vec32&, int) in csrc/cpu/cpu_types_arm.hpp, converting the BF16 register half-by-half (similar to the NEON pattern used for other BF16↔FP32 conversions in that file). Same for any other CPU backends that lack it.

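  2. Gate the FP8 KV-cache path on x86 in cpu_attn_vec.hpp with #if defined(__AVX512F__) (or whatever the existing convention is), so the unsupported branch never compiles on ARM. Note that the AVX2 branch of cpu_types_x86.hpp also defines the constructor, so an AVX-512-only guard would be narrower than necessary. Existing files like cpu_attn_amx.hpp already use compile-time guards for AMX-only code.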

Option 1 is more invasive but keeps the FP8 attention path available on ARM. Option 2 is quicker and matches how AMX is handled today.
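Option 2 can be sketched as follows. This is only an illustration of the gating idea: the macro SKETCH_HAS_BF16_HALF_CTOR, the KvCachePath enum, and select_kv_cache_path() are hypothetical names invented for this example, and the guard that vLLM would actually use may differ.

```cpp
#include <cassert>  // for the check shown in the usage note

// Hypothetical feature macro; vLLM's real guard convention may differ.
// It is set on x86-64 only, where the issue reports that the
// (const BF16Vec32&, int) constructor is defined.
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_HAS_BF16_HALF_CTOR 1
#else
#define SKETCH_HAS_BF16_HALF_CTOR 0
#endif

enum class KvCachePath { Fp8Vectorized, Unsupported };

// Chooses the code path at compile time. On targets without the constructor,
// the branch that would call it is never compiled.
constexpr KvCachePath select_kv_cache_path() {
#if SKETCH_HAS_BF16_HALF_CTOR
  return KvCachePath::Fp8Vectorized;
#else
  return KvCachePath::Unsupported;
#endif
}

// Human-readable label, e.g. for a build-time or startup log line.
inline const char* kv_cache_path_name(KvCachePath path) {
  return path == KvCachePath::Fp8Vectorized ? "fp8-vectorized" : "unsupported";
}
```

A caller could then write assert(select_kv_cache_path() == KvCachePath::Fp8Vectorized) before entering the FP8 path, or fall back to another KV-cache format when the result is Unsupported. The trade-off is the one stated above: the build is fixed quickly, but the FP8 path is unavailable on the gated targets.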

I came across this while rebasing a branch onto main for an unrelated CMake refactor (#41432). Filing in case it's blocking other macOS contributors.

Before submitting a new issue...

  • Make sure I already searched for relevant issues — no existing issue mentions cpu_attn_vec.hpp or this regression.

extent analysis

TL;DR

The most likely fix is to add an ARM stub for FP32Vec16(const BF16Vec32&, int) in csrc/cpu/cpu_types_arm.hpp or gate the FP8 KV-cache path on x86 in cpu_attn_vec.hpp.

Guidance

  • Confirm that csrc/cpu/cpu_types_arm.hpp lacks the FP32Vec16(const BF16Vec32&, int) constructor and consider adding an implementation for ARM.
  • Review cpu_attn_vec.hpp and determine if the FP8 KV-cache path can be gated on x86 using compile-time guards like #if defined(__AVX512F__).
  • Verify that the chosen solution resolves the compiler errors in csrc/cpu/cpu_attn_vec.hpp.
  • Consider implementing similar fixes for other CPU backends that lack the FP32Vec16(const BF16Vec32&, int) constructor.

Example

// Example implementation of FP32Vec16(const BF16Vec32&, int) for ARM
// in csrc/cpu/cpu_types_arm.hpp
explicit FP32Vec16(const BF16Vec32& v, int upper) {
    // Convert BF16 register half-by-half to FP32
    // Implementation details omitted for brevity
}
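To make the "half-by-half" conversion concrete, here is a scalar sketch of what such a constructor computes. It is not the ARM implementation: BF16Vec32Sketch and FP32Vec16Sketch are hypothetical array-backed stand-ins, a real backend would use SIMD registers, and the mapping of upper == 0 to elements 0..15 is an assumption based on the parameter name. The conversion itself relies on the fact that a bfloat16 value is the top 16 bits of an IEEE-754 binary32 value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in for vec_op::BF16Vec32: 32 raw bfloat16 bit patterns.
struct BF16Vec32Sketch {
  std::array<std::uint16_t, 32> bits;
};

// Hypothetical scalar stand-in for vec_op::FP32Vec16: 16 floats.
struct FP32Vec16Sketch {
  std::array<float, 16> values;

  // Assumption: upper == 0 converts elements 0..15, upper == 1 converts 16..31.
  explicit FP32Vec16Sketch(const BF16Vec32Sketch& v, int upper) {
    assert(upper == 0 || upper == 1);
    const std::size_t base = upper ? 16 : 0;
    for (std::size_t i = 0; i < 16; ++i) {
      // Widen to 32 bits and move the bfloat16 payload into the high half;
      // the low 16 mantissa bits become zero.
      const std::uint32_t widened =
          static_cast<std::uint32_t>(v.bits[base + i]) << 16;
      // memcpy reinterprets the bit pattern as a float without aliasing issues.
      std::memcpy(&values[i], &widened, sizeof(float));
    }
  }
};
```

For example, a source vector whose first 16 elements hold 0x3F80 (1.0 in bfloat16) and whose last 16 hold 0x4000 (2.0) yields sixteen 1.0f values for upper == 0 and sixteen 2.0f values for upper == 1. A vectorized version would perform the same shift on whole registers.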

Notes

The provided solutions assume that the missing constructor is the primary cause of the compiler errors. Additional issues may arise during implementation, and thorough testing is recommended to ensure the fix works as expected.

Recommendation

Add an ARM stub for FP32Vec16(const BF16Vec32&, int) in csrc/cpu/cpu_types_arm.hpp. This keeps the FP8 attention path available on ARM and is the direction PR #41387 takes, which adds stubs to cpu_types_arm.hpp and three other backend headers.
