vllm - ✅(Solved) Fix [Bug]: _C_stable_libtorch fails to build: const& references violate stable ABI trivially_copyable requirement [1 pull requests, 1 comments, 2 participants]

GitHub stats
vllm-project/vllm#38420 · Fetched 2026-04-08 01:41:31
Comments: 1 · Participants: 2 · Timeline events: 4 · Reactions: 0
Timeline (top): referenced ×2, commented ×1, cross-referenced ×1

Error Message

torch/csrc/stable/stableivalue_conversions.h:450:24: error: static assertion failed
torch/csrc/stable/stableivalue_conversions.h:457:9: error: non-static data member in a union may not have reference type 'const torch::stable::Tensor&'

Root Cause

csrc/libtorch_stable/ops.h uses const torch::stable::Tensor& (pass-by-reference) for all function signatures. The stable ABI requires trivially copyable types and uses memcpy across the C-shim boundary. Reference types are not trivially copyable.

Current signatures (all broken):

torch::stable::Tensor permute_cols(torch::stable::Tensor const& A,
                                   torch::stable::Tensor const& perm);
void per_token_group_quant_fp8(const torch::stable::Tensor& input,
                               torch::stable::Tensor& output_q,
                               torch::stable::Tensor& output_s, ...);
// Same for per_token_group_quant_8bit_packed, per_token_group_quant_int8
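
The two compiler errors can be reproduced in isolation. The sketch below is a simplified model, not PyTorch's actual conversion code: `Handle` and `can_cross_boundary` are hypothetical names, and the 8-byte slot size is an assumption made for illustration.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical opaque handle, standing in for a tensor type. This is an
// assumption for illustration, not the real torch::stable::Tensor.
struct Handle {
  void* ptr;
};

// Simplified model of the check a memcpy-based boundary might perform:
// the type must be trivially copyable and fit in one 64-bit slot
// (the slot size is assumed here).
template <typename T>
constexpr bool can_cross_boundary() {
  return std::is_trivially_copyable<T>::value &&
         sizeof(T) <= sizeof(std::uint64_t);
}

// A by-value handle passes the check.
static_assert(can_cross_boundary<Handle>(), "value type is accepted");

// Reference types are never trivially copyable, so both of these fail.
static_assert(!can_cross_boundary<const Handle&>(), "const& is rejected");
static_assert(!can_cross_boundary<Handle&>(), "& is rejected");

// The second compiler error has the same origin: a union cannot hold a
// reference member, so the following would not compile if uncommented.
// union Slot { const Handle& h; std::uint64_t raw; };
```

Dropping the reference from the parameter type is what turns the failing case into the passing one.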

Fix

Change all const torch::stable::Tensor& to torch::stable::Tensor (pass-by-value), and torch::stable::Tensor& output params to torch::stable::Tensor. Update corresponding torch_bindings.cpp signatures to match.

This was previously identified in PR #37744 but closed prematurely ("Seems like not need anymore"). The bug persists on current main.

PR fix notes

PR #38421: [Bugfix] Fix stable ABI build: pass torch::stable::Tensor by value

Description (problem / solution / changelog)

The stable ABI requires trivially copyable types across the C-shim boundary, but all libtorch_stable function signatures used references (const torch::stable::Tensor& for inputs and torch::stable::Tensor& for outputs). Reference types are not trivially copyable, so these signatures trigger static_assert failures during compilation.

Change all torch::stable::Tensor reference parameters to pass-by-value. This is safe because torch::stable::Tensor is a lightweight handle; copying it does not copy the underlying tensor data.
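
To see why passing an output tensor by value still lets the callee write into the caller's data, consider a minimal sketch of handle semantics. `TensorHandle`, `make_handle`, and `fill` are hypothetical names, and the `std::shared_ptr` storage is an assumption for illustration; this is not the real torch::stable::Tensor implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical handle type: it owns only a pointer to shared storage.
// Assumed for illustration; not the real torch::stable::Tensor.
struct TensorHandle {
  std::shared_ptr<std::vector<float>> storage;
};

// Allocate zero-initialized storage of n elements behind a new handle.
TensorHandle make_handle(std::size_t n) {
  return TensorHandle{std::make_shared<std::vector<float>>(n, 0.0f)};
}

// The output handle is taken by value, mirroring the fixed signatures.
// The copy points at the same storage, so the writes are visible to
// the caller after the function returns.
void fill(TensorHandle out, float value) {
  for (float& x : *out.storage) {
    x = value;
  }
}
```

Calling `fill(t, 2.5f)` on a handle `t` leaves every element of `t`'s storage set to 2.5, even though `fill` only ever saw a copy of the handle.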

Fixes #38420

Purpose

Fix _C_stable_libtorch compilation failure caused by const torch::stable::Tensor& references violating the stable ABI trivially_copyable requirement. Without this fix, permute_cols and per_token_group_fp8_quant ops are unavailable, blocking MTP speculative decoding with NVFP4 quantization on platforms like DGX Spark (SM121). Previously identified in #37744 but closed prematurely.

Test Plan

  • Build vLLM from source with PyTorch 2.10+ and verify _C_stable_libtorch compiles without errors
  • Verify existing tests for per_token_group_fp8_quant, permute_cols, and per_token_group_quant_int8 still pass

Test Result

Built successfully from source with:

  • PyTorch 2.10.0+cu128
  • CUDA 12.8
  • GCC 12.2.0
  • NVIDIA L40S (SM89)
  • _C_stable_libtorch compiled without errors


Changed files

  • csrc/libtorch_stable/ops.h (modified, +11/-11)
  • csrc/libtorch_stable/permute_cols.cu (modified, +2/-2)
  • csrc/libtorch_stable/quantization/w8a8/fp8/per_token_group_quant.cu (modified, +9/-9)
  • csrc/libtorch_stable/quantization/w8a8/int8/per_token_group_quant.cu (modified, +3/-3)
  • csrc/libtorch_stable/quantization/w8a8/per_token_group_quant_8bit.h (modified, +3/-3)


Your current environment

  • vLLM: main branch (commit 0e9358c11, 2026-03-27)
  • PyTorch: NGC nvcr.io/nvidia/pytorch:26.01-py3 (PyTorch 2.10.0a0)
  • Platform: aarch64 (NVIDIA DGX Spark GB10, SM121)
  • CUDA: 13.1
  • Build: TORCH_CUDA_ARCH_LIST="12.1a"

Impact

Without _C_stable_libtorch, the ops per_token_group_fp8_quant and permute_cols are unavailable. This blocks MTP speculative decoding with NVFP4 quantization on DGX Spark (SM121), since the MTP code path requires per_token_group_fp8_quant.

Workaround: remove _C_stable_libtorch from setup.py entirely. This lets the build succeed but disables the MTP + NVFP4 combination.

Tested on

Single DGX Spark GB10 (SM121, 128GB UMA), building vLLM from main + PR #38126 (merged).

extent analysis

Fix Plan

To resolve the compilation error, change the function signatures in csrc/libtorch_stable/ops.h so that every torch::stable::Tensor parameter, input or output, is passed by value. Return types stay as they are; output tensors are still written through the copied handle.

Here are the steps:

  • Update the permute_cols function signature:

torch::stable::Tensor permute_cols(torch::stable::Tensor A, torch::stable::Tensor perm);

  • Update the per_token_group_quant_fp8 function signature (the return type remains void):

void per_token_group_quant_fp8(torch::stable::Tensor input, torch::stable::Tensor output_q, torch::stable::Tensor output_s, ...);

  • Update the per_token_group_quant_8bit_packed and per_token_group_quant_int8 function signatures similarly.
  • Update the corresponding torch_bindings.cpp signatures to match the new function signatures in csrc/libtorch_stable/ops.h.
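
One way to keep reference parameters from creeping back in is a compile-time guard on the declared signatures. The sketch below is only an illustration under stated assumptions: `has_reference_param`, `any_reference`, `Handle`, and the two `*_permute_cols` declarations are hypothetical and are not part of vLLM or PyTorch.

```cpp
#include <type_traits>

// True if any type in the pack is an lvalue or rvalue reference.
template <typename... Args>
struct any_reference : std::false_type {};

template <typename First, typename... Rest>
struct any_reference<First, Rest...>
    : std::integral_constant<bool, std::is_reference<First>::value ||
                                       any_reference<Rest...>::value> {};

// Inspects the parameter list of a function pointer type.
template <typename F>
struct has_reference_param;

template <typename R, typename... Args>
struct has_reference_param<R (*)(Args...)> : any_reference<Args...> {};

// Hypothetical stand-in for a tensor handle (assumed, not torch::stable::Tensor).
struct Handle {
  void* ptr;
};

// Declarations only; the guard needs just the types.
Handle old_permute_cols(Handle const& A, Handle const& perm);  // before the fix
Handle new_permute_cols(Handle A, Handle perm);                // after the fix

static_assert(has_reference_param<decltype(&old_permute_cols)>::value,
              "old signature takes references");
static_assert(!has_reference_param<decltype(&new_permute_cols)>::value,
              "new signature is reference-free");
```

A guard like this fails at the declaration site, which is usually easier to read than an assertion deep inside a conversion header.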

Verification

To verify that the fix worked, rebuild the _C_stable_libtorch module and check that it compiles without errors. Then, test the per_token_group_fp8_quant and permute_cols ops to ensure they are working as expected.

Extra Tips

  • Make sure to update all relevant function signatures and corresponding bindings to maintain consistency and avoid further compilation errors.
  • If you applied the workaround of removing _C_stable_libtorch from setup.py, re-enable it after applying the fix to restore the MTP + NVFP4 combination.
