pytorch - 💡(How to fix) Fix [vllm] [2.12 regression][Inductor] prims.convert_element_type receives MetaProxy instead of Tensor in FP8 rms_norm fusion

StepCodex · 2026-04-20T20:33:59Z

## Summary

Under torch 2.12.0 with `backend='inductor'`, a graph produced by vLLM's FP8 rms_norm fusion pass fails during compilation with:

> `RuntimeError: prims::convert_element_type() Expected a value of type 'Tensor' for argument 'a' but instead found type 'MetaProxy'. Value: MetaProxy(arg4_1). Cast error details: Unable to cast MetaProxy(arg4_1) to Tensor`

Same graph compiles on torch 2.11.0. This is one of the regressions blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).

## Environment

- `torch`: 2.12.0+cu130 (test channel)
- `triton`: 3.7.0
- CUDA: 13.0
- Python: 3.12
- GPU: reproduces on both H100 and B200
- Affected attention backends: `FLASHINFER`, `TRITON_ATTN`

## Repro

- Model: `nvidia/Llama-4-Scout-17B-16E-Instruct-FP8`
- Fusion pass: `quant_fp8 + rms_norm`
- Compile mode: `inductor_partition`
- Reproduces with and without TP.
Failing vLLM tests (all same root error):

| test | GPU |
|---|---|
| `tests/compile/fusions_e2e/test_tp1_quant.py::test_tp1_fp8_fusions[inductor_partition--quant_fp8,-rms_norm-6-FLASHINFER-...]` | B200 |
| `tests/compile/fusions_e2e/test_tp1_quant.py::test_tp1_fp8_fusions[inductor_partition--quant_fp8,-rms_norm-6-TRITON_ATTN-...]` | H100 |
| `tests/compile/fusions_e2e/test_tp2_ar_rms.py::test_tp2_ar_rms_fp8_fusions[inductor_partition--quant_fp8,-rms_norm-4-FLASHINFER-...]` | B200 |
| `tests/compile/fusions_e2e/test_tp2_ar_rms.py::test_tp2_ar_rms_fp8_fusions[inductor_partition--quant_fp8,-rms_norm-4-TRITON_ATTN-...]` | H100 |

## Error + abbreviated traceback

```
RuntimeError: Worker failed with error 'backend='inductor' raised:
RuntimeError: prims::convert_element_type() Expected a value of type 'Tensor' for argument 'a' but instead found type 'MetaProxy'.
Value: MetaProxy(arg4_1)
Cast error details: Unable to cast MetaProxy(arg4_1) to Tensor
```

The error originates during inductor lowering when calling `prims.convert_element_type` on a fx node whose value is wrapped in a `MetaProxy` rather than unwrapped to the underlying `Tensor`.

## Question / diagnosis

Did torch 2.12 change how `MetaProxy` is unwrapped (or when it's preserved) during inductor graph lowering? A `MetaProxy` leaking into a `prims.*` call that expects a `Tensor` suggests either:

- a missing unwrap in the fusion pass path, or
- stricter type-checking in `prims.convert_element_type` on 2.12 that rejects `MetaProxy` where 2.11 quietly accepted it.

Happy to provide a minimal inductor-only repro extracted from vLLM's fusion graph if helpful.
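The message has the shape of PyTorch's operator argument-binding check, which can be triggered without vLLM at all. A minimal sketch, assuming only stock `torch.ops.prims`: calling the prim directly with an arbitrary non-Tensor object fails the same schema check, while a real tensor passes. The `NotATensor` class and `schema_error_for` helper are hypothetical names; `NotATensor` merely stands in for the leaked `MetaProxy`. An FX `Proxy` normally never reaches this check because it intercepts the call through `__torch_function__`, so seeing the error suggests the object arrived at the op without that interception. This sketch does not reproduce the regression itself.

```python
import torch
import torch._prims  # noqa: F401  # makes sure the prims ops are registered


class NotATensor:
    """Hypothetical stand-in for a proxy object that reached the op unwrapped."""

    def __repr__(self) -> str:
        return "NotATensor(arg4_1)"


def schema_error_for(value: object) -> str:
    """Call prims.convert_element_type directly; return the error text or ''."""
    try:
        torch.ops.prims.convert_element_type.default(value, torch.float16)
    except RuntimeError as exc:  # same exception type as in the vLLM log
        return str(exc)
    return ""


# A real tensor binds to argument 'a' and is converted normally.
assert schema_error_for(torch.ones(2)) == ""

# Any other object is rejected by the schema check before any kernel runs.
print(schema_error_for(NotATensor()))
```

If this script behaves the same on 2.11 and 2.12, that weakens the "stricter type-checking" hypothesis and shifts attention to how the `MetaProxy` reaches the op in the first place.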
## Links

- vLLM PR: https://github.com/vllm-project/vllm/pull/40077
- Failing Buildkite build: https://buildkite.com/vllm/ci/builds/62138
- Example failed job (Fusion E2E TP2 B200): https://buildkite.com/vllm/ci/builds/62138#019dab36-870f-4089-9996-6c430835bdf8
- Umbrella: #180899

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

## Extent analysis

### TL;DR

The root cause is not confirmed, but the two most likely fixes for this `RuntimeError` (a `MetaProxy` reaching an op that expects a `Tensor`) are either to make the fusion pass unwrap `MetaProxy` values before they reach `prims.convert_element_type`, or to change `prims.convert_element_type` so that it handles `MetaProxy` inputs.

### Guidance

- Investigate what changed in torch 2.12 around `MetaProxy` unwrapping during Inductor graph lowering, to determine whether stricter type-checking causes the error.
- Review the fusion pass code to confirm that `MetaProxy` values are unwrapped before they are passed to `prims.convert_element_type`.
- Build a minimal Inductor-only repro to isolate the issue and test candidate fixes.
- Check the torch 2.12 documentation and release notes for changes to `MetaProxy` handling or to `prims.convert_element_type` behavior.

### Example

The issue does not include enough code context for a verified fix, so no example of the fix itself is given here.

### Notes

The fix may require changes to the fusion pass code or to `prims.convert_element_type`, depending on what actually changed in torch 2.12.

### Recommendation

As a workaround, modify the fusion pass so that it unwraps `MetaProxy` values properly. This is more targeted, and potentially quicker, than changing `prims.convert_element_type` in PyTorch itself.
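Before editing the pass, it may help to locate where a non-tensor value enters. A minimal sketch, assuming you can run a function over the `torch.fx.Graph` that the fusion pass hands to Inductor: the hypothetical `find_suspect_convert_inputs` flags every `prims.convert_element_type.default` call whose input node lacks a tensor in `meta["val"]`, and the hypothetical `build_demo_graph` builds a toy graph to exercise it. This only catches a missing or non-tensor `val` recorded in the graph; a `MetaProxy` that exists only as a live Python object during tracing would not show up this way.

```python
import torch
import torch.fx

# Only the .default overload is matched; extend this if a graph uses others.
CONVERT_ELEMENT_TYPE = torch.ops.prims.convert_element_type.default


def find_suspect_convert_inputs(graph: torch.fx.Graph) -> list[tuple[str, str, str]]:
    """Report convert_element_type calls whose input lacks a tensor value.

    Returns (cast_node_name, input_name, value_type_name) tuples. FakeTensor
    subclasses torch.Tensor, so normally traced inputs are not reported.
    """
    suspects = []
    for node in graph.nodes:
        if node.op != "call_function" or node.target != CONVERT_ELEMENT_TYPE:
            continue
        src = node.args[0]
        if not isinstance(src, torch.fx.Node):
            # A raw Python object was baked into the graph as the input.
            suspects.append((node.name, repr(src), type(src).__name__))
            continue
        val = src.meta.get("val")
        if not isinstance(val, torch.Tensor):
            suspects.append((node.name, src.name, type(val).__name__))
    return suspects


def build_demo_graph() -> torch.fx.Graph:
    """Hypothetical two-cast graph: one input with a tensor value, one without."""
    g = torch.fx.Graph()
    good = g.placeholder("good_input")
    good.meta["val"] = torch.empty(4, dtype=torch.bfloat16)
    bad = g.placeholder("arg4_1")
    bad.meta["val"] = None  # stands in for a value that was never materialized
    cast_ok = g.create_node("call_function", CONVERT_ELEMENT_TYPE,
                            (good, torch.float32), name="cast_ok")
    cast_bad = g.create_node("call_function", CONVERT_ELEMENT_TYPE,
                             (bad, torch.float32), name="cast_bad")
    g.output((cast_ok, cast_bad))
    return g


for cast_name, input_name, type_name in find_suspect_convert_inputs(build_demo_graph()):
    print(f"{cast_name}: input {input_name} carries {type_name}, not a Tensor")
```

Running the checker at the start and end of the fusion pass would show whether the pass itself introduces the bad input or receives it from an earlier stage.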
