vllm - ✅(Solved) Fix [Bug]: torch.opcheck fails for `_C.rms_norm_per_block_quant` [2 pull requests, 3 comments, 2 participants]

ProExpertProg · 2026-03-10T17:47:07Z



GitHub stats

vllm-project/vllm#36688•Fetched 2026-04-08 00:35:25


Timeline (top)

referenced ×80, commented ×3, cross-referenced ×2, labeled ×2

Error Message

torch.testing._internal.optests.generate_tests.OpCheckError: opcheck(op, ...): test_schema failed with Argument weight is not defined as mutable but was mutated (scroll up for stack trace)

Root Cause

Reported symptom: in the unit test for the custom kernel torch.ops._C.rms_norm_per_block_quant, opcheck fails because it detects a mutation of the weight tensor. On closer inspection, the tensor that gets modified is opcheck's cloned weight argument; the original weight argument stays intact. The reporter found no memory issue, confirmed manually that the original weight is unchanged when opcheck is not used, and saw good E2E evals. The merged PR #36779 later traced the failure to an undersized scales buffer in the test, whose out-of-bounds writes landed in the cloned weight tensor.

Fix Action

Fixed

Fixed by PR #36779 (merged): [Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#36688) (https://github.com/vllm-project/vllm/pull/36779)
Also proposed, closed without merging: PR #36766, fix(test): skip test_schema in opcheck for rms_norm_per_block_quant (… (https://github.com/vllm-project/vllm/pull/36766)

PR fix notes

PR #36766: fix(test): skip test_schema in opcheck for rms_norm_per_block_quant (…

Repository: vllm-project/vllm
Author: mahendrarathore1742
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/36766

Description (problem / solution / changelog)

…#36688)

opcheck's test_schema falsely reports the immutable weight tensor as mutated due to CUDA memory-allocator reuse when it internally clones the arguments. The kernel only reads weight through const pointers, and the original tensor stays intact.

Fix:

Add opcheck call for rms_norm_per_block_quant (block-quant path) with correctly shaped scales tensor.
Exclude test_schema from that opcheck to work around the false positive; test_autograd_registration and test_faketensor still run.
Move the existing rms_norm_dynamic_per_token_quant opcheck into the else branch so it only runs for the per-token path.
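For reference, the workaround described in this PR amounts to running opcheck with a reduced list of checks. The sketch below only builds that list. The helper name `opcheck_utils_without` is hypothetical, and the three check names are the ones mentioned in the PR description. Note that PR #36779 found the test_schema failure pointed at a real out-of-bounds write, so skipping the check would have hidden the bug.

```python
# Check names mentioned in the PR description above.
CHECKS = ("test_schema", "test_autograd_registration", "test_faketensor")


def opcheck_utils_without(skipped: str, checks: tuple = CHECKS) -> tuple:
    """Return the checks to run, leaving out the one named `skipped`.

    Hypothetical helper for illustration; not part of vLLM or PyTorch.
    """
    if skipped not in checks:
        raise ValueError(f"unknown check: {skipped!r}")
    return tuple(name for name in checks if name != skipped)


selected = opcheck_utils_without("test_schema")
print(selected)
```

The resulting tuple would be passed as the `test_utils` argument of `torch.library.opcheck`, assuming a PyTorch version that provides that function; the op arguments themselves are not shown here.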

Changed files

tests/kernels/core/test_fused_quant_layernorm.py (modified, +59/-8)

PR #36779: [Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#36688)

Repository: vllm-project/vllm
Author: KrxGu
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/36779

Description (problem / solution / changelog)

Fixes #36688.

The opcheck call in test_rms_norm was allocating scales with shape (num_tokens, 1) and passing it to rms_norm_per_block_quant. The blockwise kernel writes hidden_size / group_size scale values per token, so with 8 groups the buffer was 8× too small. The out-of-bounds writes landed in the adjacent cloned weight tensor under opcheck's memory layout, which is why opcheck reported weight as mutated even though nothing in the kernel intentionally touches it.

The schema and C++ declaration are correct - weight is and should remain immutable.
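The arithmetic behind this explanation can be sketched with a toy model. It assumes, purely for illustration, that the undersized scales buffer sits directly before the weight buffer in one flat arena; the real layout under opcheck is decided by the CUDA allocator. Plain Python lists stand in for tensors, and both function names are hypothetical.

```python
def scales_written(num_tokens: int, hidden_size: int, group_size: int) -> int:
    """Scale values a blockwise kernel writes: one per (token, group)."""
    if hidden_size % group_size != 0:
        raise ValueError("hidden_size must be a multiple of group_size")
    return num_tokens * (hidden_size // group_size)


def clobbered_weight_elements(num_tokens: int, hidden_size: int, group_size: int) -> int:
    """Toy model: a scales buffer of shape (num_tokens, 1) placed directly
    before a weight buffer of hidden_size elements in one flat arena."""
    allocated = num_tokens  # shape (num_tokens, 1) holds num_tokens elements
    arena = [0.0] * allocated + [1.0] * hidden_size  # [scales | weight]
    writes = scales_written(num_tokens, hidden_size, group_size)
    # The simulated kernel writes every scale without a bounds check.
    for i in range(min(writes, len(arena))):
        arena[i] = -1.0
    weight_after = arena[allocated:]
    return sum(1 for w in weight_after if w != 1.0)


print(scales_written(4, 1024, 128))             # 32 needed, only 4 allocated
print(clobbered_weight_elements(4, 1024, 128))  # 28 weight elements overwritten
```

With 8 groups per token, the kernel writes 8 times as many scales as the `(num_tokens, 1)` buffer holds, and in this model the excess lands in weight.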

Changes:

Allocate block_scales with the correct shape (num_tokens, hidden_size // group_size) in the test, and add the missing opcheck call for the blockwise path
Add a TORCH_CHECK in rms_norm_per_block_quant validating scales.numel() >= num_tokens * num_groups so callers get a clear error instead of silent OOB writes
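The guard added in the second change can be mirrored in Python to show the condition it enforces. This is a minimal sketch, not the C++ code from the PR: the function name `check_scales_capacity` and the error text are hypothetical, and only the inequality `scales.numel() >= num_tokens * num_groups` comes from the PR description.

```python
def check_scales_capacity(scales_numel: int, num_tokens: int,
                          hidden_size: int, group_size: int) -> int:
    """Raise if the scales buffer cannot hold one value per (token, group).

    Returns the required element count when the buffer is large enough.
    """
    num_groups = hidden_size // group_size
    required = num_tokens * num_groups
    if scales_numel < required:
        raise ValueError(
            f"scales has {scales_numel} elements but needs at least "
            f"{required} ({num_tokens} tokens x {num_groups} groups)"
        )
    return required


# Correctly sized buffer: (num_tokens, hidden_size // group_size) = (4, 8)
print(check_scales_capacity(4 * 8, 4, 1024, 128))

# Undersized buffer of shape (num_tokens, 1) is rejected up front
try:
    check_scales_capacity(4, 4, 1024, 128)
except ValueError as err:
    print("rejected:", err)
```

Failing early like this turns a silent out-of-bounds write into an error that names the buffer at fault.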

Testing: Ran 16 blockwise int8 opcheck cases (group_size 64 and 128, with and without residual, 1 and 2048 tokens); all pass. The pre-existing FP8 failures in this file reproduce identically on upstream main before any edits (unrelated binary/source API mismatch).

Changed files

csrc/quantization/fused_kernels/fused_layernorm_dynamic_per_token_quant.cu (modified, +9/-0)
tests/kernels/core/test_fused_quant_layernorm.py (modified, +10/-9)


Failing test referenced in the issue: https://github.com/vllm-project/vllm/blob/0ebf4e969b43d99c240fd085703ea1ed97897499/tests/kernels/core/test_fused_quant_layernorm.py#L291-L304


extent analysis

Fix Plan

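The issue is fixed by giving the kernel a scales buffer of the right size, not by changing how weight is handled. Per PR #36779, the kernel only reads weight, and the reported mutation came from out-of-bounds scale writes.

In the test, allocate the scales tensor for the blockwise path with shape (num_tokens, hidden_size // group_size) instead of (num_tokens, 1).
In rms_norm_per_block_quant, check that scales.numel() >= num_tokens * num_groups so that an undersized buffer produces a clear error.

Example code (sizes are illustrative; the remaining op arguments are elided):

import torch

# Illustrative sizes; take the real values from the test parametrization
num_tokens, hidden_size, group_size = 4, 1024, 128
num_groups = hidden_size // group_size  # 8 scale values per token

# One scale per (token, group), the shape used by the fix in PR #36779
# (assumes a CUDA device is available)
block_scales = torch.empty((num_tokens, num_groups), device="cuda")

# Pass block_scales as the scales argument of the custom kernel
torch.ops._C.rms_norm_per_block_quant(..., block_scales, ...)

Cloning weight before the call does not help: opcheck already clones its arguments, and the clone is the tensor that the out-of-bounds writes corrupted.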

Verification

To verify that the fix worked, run the unit test again and check that the OpCheckError is no longer raised. You can also add additional assertions to ensure that the original weight tensor remains unchanged after the custom kernel operation.
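One way to add such an assertion is to snapshot the read-only arguments, run the operation, and compare afterwards. The sketch below shows the pattern with plain Python lists standing in for tensors. The helper `assert_inputs_unchanged` and the two toy ops are hypothetical; with real tensors the comparison would use something like `torch.equal` on cloned copies.

```python
import copy


def assert_inputs_unchanged(fn, inputs: dict, readonly: tuple):
    """Run fn(**inputs) and fail if any argument named in `readonly` changed."""
    before = {name: copy.deepcopy(inputs[name]) for name in readonly}
    result = fn(**inputs)
    for name in readonly:
        assert inputs[name] == before[name], f"argument {name!r} was mutated"
    return result


def good_op(weight, out):
    """Writes only to its output buffer."""
    out[:] = [2.0 * w for w in weight]


def buggy_op(weight, out):
    """Writes past its intended target and corrupts weight."""
    weight[0] = -1.0


assert_inputs_unchanged(good_op, {"weight": [1.0, 2.0], "out": [0.0, 0.0]}, ("weight",))

try:
    assert_inputs_unchanged(buggy_op, {"weight": [1.0, 2.0], "out": [0.0, 0.0]}, ("weight",))
except AssertionError as err:
    print("caught:", err)
```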

Extra Tips

When working with custom kernels, size every output buffer for what the kernel actually writes; here that is hidden_size // group_size scale values per token.
Validate buffer sizes inside the op (for example with TORCH_CHECK) so that an undersized buffer fails with a clear error instead of writing out of bounds silently.
Treat an opcheck mutation report on a read-only argument as a possible out-of-bounds write before dismissing it as a false positive.
