vllm - ✅(Solved) Fix [Bug]: _C_stable_libtorch fails to build: const& references violate stable ABI trivially_copyable requirement [1 pull requests, 1 comments, 2 participants]

vllm • 2026-03-28 05:27:48

GitHub stats

vllm-project/vllm#38420 • Fetched 2026-04-08 01:41:31

Author

gbanyan

Participants

gbanyan

mikaylagawarecki

Timeline (top)

referenced ×2, commented ×1, cross-referenced ×1

Error Message

torch/csrc/stable/stableivalue_conversions.h:450:24: error: static assertion failed torch/csrc/stable/stableivalue_conversions.h:457:9: error: non-static data member in a union may not have reference type 'const torch::stable::Tensor&'

Root Cause

csrc/libtorch_stable/ops.h uses const torch::stable::Tensor& (pass-by-reference) for all function signatures. The stable ABI requires trivially copyable types and uses memcpy across the C-shim boundary. Reference types are not trivially copyable.
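The failure mode can be reproduced outside PyTorch. The sketch below is a minimal illustration, not the actual code in stableivalue_conversions.h: FakeHandle, pack_into_slot, and unpack_from_slot are hypothetical names, and the 64-bit slot is an assumed simplification of how a value crosses a memcpy-based boundary.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for a small handle type; NOT the real torch::stable::Tensor.
struct FakeHandle {
  void* impl;
};

// Assumed simplification: a generic converter that memcpy's a value into a 64-bit slot.
template <typename T>
std::uint64_t pack_into_slot(T value) {
  static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
  static_assert(sizeof(T) <= sizeof(std::uint64_t), "T must fit in the slot");
  std::uint64_t slot = 0;
  std::memcpy(&slot, &value, sizeof(T));
  return slot;
}

// Inverse of pack_into_slot: copies the bytes back out of the slot.
template <typename T>
T unpack_from_slot(std::uint64_t slot) {
  static_assert(std::is_trivially_copyable_v<T>, "T must be trivially copyable");
  T value;
  std::memcpy(&value, &slot, sizeof(T));
  return value;
}

// A value type passes the check; a reference to the same type does not.
constexpr bool kValueOk = std::is_trivially_copyable_v<FakeHandle>;
constexpr bool kConstRefOk = std::is_trivially_copyable_v<const FakeHandle&>;
static_assert(kValueOk, "value type is accepted");
static_assert(!kConstRefOk, "reference type is rejected");

// pack_into_slot<const FakeHandle&>(h) would fail to compile at the first
// static_assert, which mirrors the "static assertion failed" error above.
```

Deducing T from a by-value argument, as in pack_into_slot(h), always yields a non-reference type, so the assertion only fires when a reference type is spelled out in a signature that the converter inspects.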

Current signatures (all broken):

torch::stable::Tensor permute_cols(torch::stable::Tensor const& A,
                                   torch::stable::Tensor const& perm);
void per_token_group_quant_fp8(const torch::stable::Tensor& input,
                               torch::stable::Tensor& output_q,
                               torch::stable::Tensor& output_s, ...);
// Same for per_token_group_quant_8bit_packed, per_token_group_quant_int8

Fix Action

Fix

Change all const torch::stable::Tensor& to torch::stable::Tensor (pass-by-value), and torch::stable::Tensor& output params to torch::stable::Tensor. Update corresponding torch_bindings.cpp signatures to match.

This was previously identified in PR #37744 but closed prematurely ("Seems like not need anymore"). The bug persists on current main.

PR fix notes

PR #38421: [Bugfix] Fix stable ABI build: pass torch::stable::Tensor by value

Repository: vllm-project/vllm
Author: anantha119
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/38421

Description (problem / solution / changelog)

The stable ABI requires trivially copyable types across the C-shim boundary, but all libtorch_stable function signatures used const references (const torch::stable::Tensor&), which are not trivially copyable and cause static_assert failures during compilation.

Change all torch::stable::Tensor reference parameters to pass-by-value. This is safe because torch::stable::Tensor is a lightweight handle; copying it does not copy the underlying tensor data.
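Why an output parameter still works when passed by value can be shown with a toy handle. This is a sketch under stated assumptions: FakeTensor, fill_with, and make_zeros are hypothetical names, and the shared_ptr-based layout is only an assumed model of a "lightweight handle", not the real torch::stable::Tensor class.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle: holds only a shared pointer to the storage.
// Assumed model for illustration; not the real torch::stable::Tensor.
struct FakeTensor {
  std::shared_ptr<std::vector<float>> storage;
};

// Output parameter taken by value: the handle is copied, the storage is shared,
// so writes through `out` are visible to the caller.
void fill_with(FakeTensor out, float v) {
  for (float& x : *out.storage) {
    x = v;
  }
}

// Helper that allocates a zero-filled buffer behind a handle.
FakeTensor make_zeros(std::size_t n) {
  return FakeTensor{std::make_shared<std::vector<float>>(n, 0.0f)};
}
```

Calling fill_with(t, 2.5f) on a handle t created by make_zeros(4) leaves all four elements of t's storage set to 2.5f, even though t itself was copied into the function.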

Fixes #38420

Purpose

Fix _C_stable_libtorch compilation failure caused by const torch::stable::Tensor& references violating the stable ABI trivially_copyable requirement. Without this fix, permute_cols and per_token_group_fp8_quant ops are unavailable, blocking MTP speculative decoding with NVFP4 quantization on platforms like DGX Spark (SM121). Previously identified in #37744 but closed prematurely.

Test Plan

Build vLLM from source with PyTorch 2.10+ and verify _C_stable_libtorch compiles without errors
Verify existing tests for per_token_group_fp8_quant, permute_cols, and per_token_group_quant_int8 still pass

Test Result

Built successfully from source with:

PyTorch 2.10.0+cu128
CUDA 12.8
GCC 12.2.0
NVIDIA L40S (SM89)
_C_stable_libtorch compiled without errors

Changed files

csrc/libtorch_stable/ops.h (modified, +11/-11)
csrc/libtorch_stable/permute_cols.cu (modified, +2/-2)
csrc/libtorch_stable/quantization/w8a8/fp8/per_token_group_quant.cu (modified, +9/-9)
csrc/libtorch_stable/quantization/w8a8/int8/per_token_group_quant.cu (modified, +3/-3)
csrc/libtorch_stable/quantization/w8a8/per_token_group_quant_8bit.h (modified, +3/-3)


Your current environment

vLLM: main branch (commit 0e9358c11, 2026-03-27)
PyTorch: NGC nvcr.io/nvidia/pytorch:26.01-py3 (PyTorch 2.10.0a0)
Platform: aarch64 (NVIDIA DGX Spark GB10, SM121)
CUDA: 13.1
Build: TORCH_CUDA_ARCH_LIST="12.1a"

Bug description

_C_stable_libtorch fails to compile with:

torch/csrc/stable/stableivalue_conversions.h:450:24: error: static assertion failed
torch/csrc/stable/stableivalue_conversions.h:457:9: error: non-static data member in a union may not have reference type 'const torch::stable::Tensor&'

Root cause

Current signatures (all broken):

torch::stable::Tensor permute_cols(torch::stable::Tensor const& A,
                                   torch::stable::Tensor const& perm);
void per_token_group_quant_fp8(const torch::stable::Tensor& input,
                               torch::stable::Tensor& output_q,
                               torch::stable::Tensor& output_s, ...);
// Same for per_token_group_quant_8bit_packed, per_token_group_quant_int8

Fix

This was previously identified in PR #37744 but closed prematurely ("Seems like not need anymore"). The bug persists on current main.

Impact

Without _C_stable_libtorch, the ops per_token_group_fp8_quant and permute_cols are unavailable. This blocks MTP speculative decoding with NVFP4 quantization on DGX Spark (SM121), since the MTP code path requires per_token_group_fp8_quant.

Workaround: remove _C_stable_libtorch from setup.py entirely, but this disables the MTP + NVFP4 combination.

Tested on

Single DGX Spark GB10 (SM121, 128GB UMA), building vLLM from main + PR #38126 (merged).

extent analysis

Fix Plan

To resolve the compilation error, change the function signatures in csrc/libtorch_stable/ops.h so that both input and output tensors are passed by value instead of by reference.

Here are the steps:

Update the permute_cols function signature:

torch::stable::Tensor permute_cols(torch::stable::Tensor A,
                                   torch::stable::Tensor perm);

Update the per_token_group_quant_fp8 function signature. The return type stays void; the output tensors are handles, so passing them by value still writes to the same underlying data:

void per_token_group_quant_fp8(torch::stable::Tensor input,
                               torch::stable::Tensor output_q,
                               torch::stable::Tensor output_s, ...);

Update the per_token_group_quant_8bit_packed and per_token_group_quant_int8 function signatures similarly.
Update the corresponding torch_bindings.cpp signatures to match the new function signatures in csrc/libtorch_stable/ops.h.

Verification

To verify that the fix worked, rebuild the _C_stable_libtorch module and check that it compiles without errors. Then, test the per_token_group_fp8_quant and permute_cols ops to ensure they are working as expected.
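Beyond rebuilding, a compile-time guard can catch a reference parameter before it reaches the ABI conversion layer. The following is a hypothetical helper, not part of the PR or of PyTorch: takes_params_by_value, FakeTensor, old_style, and new_style are illustrative names, and the sketch assumes C++17.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the handle type; not the real torch::stable::Tensor.
struct FakeTensor {
  void* impl;
};

// Trait: true when no parameter of the function type F is a reference.
template <typename F>
struct takes_params_by_value;

template <typename R, typename... Args>
struct takes_params_by_value<R(Args...)>
    : std::bool_constant<(!std::is_reference_v<Args> && ...)> {};

// Declarations only; decltype is an unevaluated context, so no definitions are needed.
void old_style(const FakeTensor& input, FakeTensor& output);
void new_style(FakeTensor input, FakeTensor output);

static_assert(!takes_params_by_value<decltype(old_style)>::value,
              "reference parameters are detected");
static_assert(takes_params_by_value<decltype(new_style)>::value,
              "by-value parameters are accepted");
```

A static_assert of this kind placed next to each op declaration would turn a regression into a short, local error message rather than a failure deep inside a conversion header.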

Extra Tips

Make sure to update all relevant function signatures and corresponding bindings to maintain consistency and avoid further compilation errors.
After applying the fix, re-enable _C_stable_libtorch in setup.py to restore the MTP + NVFP4 combination functionality.
