vllm - ✅(Solved) Fix [Bug]: macOS arm64 build broken — `cpu_attn_vec.hpp` calls `FP32Vec16(const BF16Vec32&, int)` constructor that doesn't exist on ARM (regression from #39445) [2 pull requests, 1 participants]

vllm, 2026-05-01 01:20:07

vllm-project/vllm#41437•Fetched 2026-05-01 05:33:38

Author

mcsantiago

Error Message

csrc/cpu/cpu_attn_vec.hpp:23:13: error: no matching constructor for initialization of 'vec_op::FP32Vec16'
   23 |     return {vec_op::FP32Vec16(bf16_b_reg, 0), vec_op::FP32Vec16(bf16_b_reg, 1)};

csrc/cpu/cpu_attn_vec.hpp:27:13: error: no matching constructor for initialization of 'vec_op::FP32Vec16'
   27 |     return {vec_op::FP32Vec16(bf16_b_reg, 0), vec_op::FP32Vec16(bf16_b_reg, 1)};

Root Cause

csrc/cpu/cpu_attn_vec.hpp (introduced/expanded by #39445, 22524f7a9) calls:

vec_op::FP32Vec16(bf16_b_reg, 0)

where bf16_b_reg is a vec_op::BF16Vec32. This requires a constructor with signature:

explicit FP32Vec16(const BF16Vec32& v, int upper);

That constructor is defined for x86 only — it appears twice in csrc/cpu/cpu_types_x86.hpp (line 519 for AVX-512, line 628 for AVX-2/256), using _mm512_extracti32x8_epi32 / _mm256_extractf128_si256 intrinsics.

csrc/cpu/cpu_types_arm.hpp defines FP32Vec16 with constructors taking const float*, bool, and other types, but no (const BF16Vec32&, int) overload. So when cpu_attn_vec.hpp is compiled on ARM, overload resolution fails for that call.

The new code in cpu_attn_vec.hpp is not gated on x86 — there's no #ifdef __x86_64__ / ENABLE_X86_ISA guard around the FP8 KV-cache code path that uses this constructor.

Other non-x86 backends (cpu_types_neon.hpp, cpu_types_vsx.hpp, cpu_types_s390x.hpp) likely have the same gap, but ARM is the only one I can reproduce on.
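The overload gap described above can be illustrated without any SIMD headers. The following is a minimal sketch, assuming two hypothetical stand-in types (FP32Vec16WithHalfCtor and FP32Vec16WithoutHalfCtor) that only mimic the constructor sets of the x86 and ARM headers; none of these names exist in vLLM, and the real classes wrap vector registers rather than arrays.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for vec_op::BF16Vec32 (32 raw bfloat16 lanes).
struct BF16Vec32 {
  std::uint16_t lanes[32];
};

// Mimics the x86 header: it declares the (const BF16Vec32&, int) overload.
// Declarations only; this sketch checks overload availability, not behaviour.
struct FP32Vec16WithHalfCtor {
  float lanes[16];
  explicit FP32Vec16WithHalfCtor(const float* ptr);
  explicit FP32Vec16WithHalfCtor(const BF16Vec32& v, int upper);
};

// Mimics the ARM header as described in the report: no such overload.
struct FP32Vec16WithoutHalfCtor {
  float lanes[16];
  explicit FP32Vec16WithoutHalfCtor(const float* ptr);
};

// True when Vec can be built from (const BF16Vec32&, int), which is what
// the call FP32Vec16(bf16_b_reg, 0) in cpu_attn_vec.hpp requires.
template <typename Vec>
constexpr bool has_bf16_half_ctor() {
  return std::is_constructible<Vec, const BF16Vec32&, int>::value;
}

static_assert(has_bf16_half_ctor<FP32Vec16WithHalfCtor>(),
              "x86-like type accepts (BF16Vec32, int)");
static_assert(!has_bf16_half_ctor<FP32Vec16WithoutHalfCtor>(),
              "ARM-like type rejects (BF16Vec32, int)");
```

A check of this shape could, in principle, be placed next to the call site to turn the long overload-resolution diagnostic into a single readable static_assert message.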

PR fix notes

PR #41387: [Fix] Add missing stubs from cpu fp8 attention changes

Repository: vllm-project/vllm
Author: tianmu-li
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/41387

Description (problem / solution / changelog)

Purpose

https://github.com/vllm-project/vllm/pull/39445 is missing some stubs for fp8 attention, which causes compilation errors when building with clang (see https://github.com/tianmu-li/vllm/actions/runs/25149522905/job/73716695058#logs). This PR adds them.

Test Plan

Test Result

Changed files

csrc/cpu/cpu_types_arm.hpp (modified, +4/-0)
csrc/cpu/cpu_types_scalar.hpp (modified, +10/-0)
csrc/cpu/cpu_types_vsx.hpp (modified, +10/-0)
csrc/cpu/cpu_types_vxe.hpp (modified, +4/-0)

Your current environment

OS: macOS 26.3.1 (arm64, Apple Silicon)
Compiler: Apple clang 21.0.0
Python: 3.12.13 (uv-managed venv)
torch: 2.11.0
vLLM commit: 9c61864bf (current main HEAD as of filing)
VLLM_TARGET_DEVICE: auto-routed to cpu per setup.py:42-44

🐛 Describe the bug

Building from source on macOS arm64 (uv pip install -e .) fails with the following compiler errors in csrc/cpu/cpu_attn_vec.hpp:

csrc/cpu/cpu_attn_vec.hpp:23:13: error: no matching constructor for initialization of 'vec_op::FP32Vec16'
   23 |     return {vec_op::FP32Vec16(bf16_b_reg, 0), vec_op::FP32Vec16(bf16_b_reg, 1)};

csrc/cpu/cpu_attn_vec.hpp:27:13: error: no matching constructor for initialization of 'vec_op::FP32Vec16'
   27 |     return {vec_op::FP32Vec16(bf16_b_reg, 0), vec_op::FP32Vec16(bf16_b_reg, 1)};

(There are 4 errors total — one per FP32Vec16(bf16_b_reg, N) call across lines 23 and 27.)

Reproduction

git clone https://github.com/vllm-project/vllm.git
cd vllm
uv venv --python 3.12
uv pip install -r requirements/cpu.txt --index-strategy unsafe-best-match
uv pip install -r requirements/build/cpu.txt --index-strategy unsafe-best-match
CC=/usr/bin/clang CXX=/usr/bin/clang++ \
  uv pip install -e . --no-build-isolation --index-strategy unsafe-best-match

The build fails during compilation of csrc/cpu/cpu_attn.cpp (which transitively includes cpu_attn_vec.hpp).

Suggested fix

Two reasonable directions:

1. Add an ARM stub for FP32Vec16(const BF16Vec32&, int) in csrc/cpu/cpu_types_arm.hpp, converting the BF16 register half-by-half (similar to the NEON pattern used for other BF16↔FP32 conversions in that file). Do the same for any other CPU backends that lack it.
2. Gate the FP8 KV-cache path on x86 in cpu_attn_vec.hpp with #if defined(__AVX512F__) (or whatever the existing convention is), so the unsupported branch never compiles on ARM. Existing files like cpu_attn_amx.hpp already use compile-time guards for AMX-only code.

Option 1 is more invasive but keeps the FP8 attention path available on ARM. Option 2 is quicker and matches how AMX is handled today.
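The compile-time gating direction can be sketched as follows. This is only an illustration under stated assumptions: the macro SKETCH_HAS_BF16_HALF_CTOR, the enum KvPath, and the function select_kv_path are hypothetical names invented here, and the condition (AVX-512 or AVX2) simply mirrors the two x86 variants the report says define the constructor. vLLM's actual guard convention may differ.

```cpp
#include <cassert>

// Hypothetical feature macro for this sketch only.
#if defined(__AVX512F__) || defined(__AVX2__)
#define SKETCH_HAS_BF16_HALF_CTOR 1
#else
#define SKETCH_HAS_BF16_HALF_CTOR 0
#endif

// Hypothetical tag describing which KV-cache path was compiled in.
enum class KvPath { kFp8, kFallback };

inline KvPath select_kv_path() {
#if SKETCH_HAS_BF16_HALF_CTOR
  // Code that needs FP32Vec16(const BF16Vec32&, int) would live only in
  // this branch, so other targets never see the call.
  return KvPath::kFp8;
#else
  // On ARM and other targets the unsupported branch is not compiled at all.
  return KvPath::kFallback;
#endif
}
```

The trade-off matches the one stated above: the guard is small and local, but targets on the fallback side lose the FP8 path until a stub is added for them.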

I came across this while rebasing a branch onto main for an unrelated CMake refactor (#41432). Filing in case it's blocking other macOS contributors.

Before submitting a new issue...

Make sure I already searched for relevant issues — no existing issue mentions cpu_attn_vec.hpp or this regression.

extent analysis

TL;DR

The most likely fix is to add an ARM stub for FP32Vec16(const BF16Vec32&, int) in csrc/cpu/cpu_types_arm.hpp or gate the FP8 KV-cache path on x86 in cpu_attn_vec.hpp.

Guidance

Confirm that csrc/cpu/cpu_types_arm.hpp lacks the FP32Vec16(const BF16Vec32&, int) constructor and consider adding an ARM implementation.
Review cpu_attn_vec.hpp and determine if the FP8 KV-cache path can be gated on x86 using compile-time guards like #if defined(__AVX512F__).
Verify that the chosen solution resolves the compiler errors in csrc/cpu/cpu_attn_vec.hpp.
Consider implementing similar fixes for other CPU backends that lack the FP32Vec16(const BF16Vec32&, int) constructor.

Example

// Example implementation of FP32Vec16(const BF16Vec32&, int) for ARM
// in csrc/cpu/cpu_types_arm.hpp
explicit FP32Vec16(const BF16Vec32& v, int upper) {
    // Convert BF16 register half-by-half to FP32
    // Implementation details omitted for brevity
}
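The omitted body could look roughly like the portable scalar sketch below. It is not the ARM implementation: the array-backed BF16Vec32 and FP32Vec16 structs are hypothetical stand-ins for the register-backed vLLM types, and the mapping of upper == 0 to lanes 0..15 and upper == 1 to lanes 16..31 is an assumption inferred from the (bf16_b_reg, 0) and (bf16_b_reg, 1) call sites. The only fact relied on is that a bfloat16 value is the upper 16 bits of the corresponding IEEE-754 binary32 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-in: 32 raw bfloat16 bit patterns.
struct BF16Vec32 {
  std::uint16_t lanes[32];
};

// Hypothetical scalar stand-in: 16 float lanes.
struct FP32Vec16 {
  float lanes[16];

  // Assumption: upper == 0 converts lanes 0..15, upper == 1 lanes 16..31.
  explicit FP32Vec16(const BF16Vec32& v, int upper) {
    assert(upper == 0 || upper == 1);
    const int base = upper * 16;
    for (int i = 0; i < 16; ++i) {
      // Widen bfloat16 to binary32 by placing its bits in the top half.
      const std::uint32_t bits =
          static_cast<std::uint32_t>(v.lanes[base + i]) << 16;
      std::memcpy(&lanes[i], &bits, sizeof(bits));
    }
  }
};
```

A vectorized ARM version would perform the same widening with NEON operations on each half of the register; the scalar loop is mainly useful as a reference to test such a version against.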

Notes

The provided solutions assume that the missing constructor is the primary cause of the compiler errors. Additional issues may arise during implementation, and thorough testing is recommended to ensure the fix works as expected.

Recommendation

Add an ARM stub for FP32Vec16(const BF16Vec32&, int) in csrc/cpu/cpu_types_arm.hpp. This keeps the FP8 attention path available on ARM and matches the direction of PR #41387, which modifies that file.
