pytorch: [vllm] [2.12 regression][Inductor] prims.convert_element_type receives MetaProxy instead of Tensor in FP8 rms_norm fusion

Under torch 2.12.0 with backend='inductor', a graph produced by vLLM's FP8 rms_norm fusion pass fails during compilation with:

RuntimeError: prims::convert_element_type() Expected a value of type 'Tensor' for argument 'a' but instead found type 'MetaProxy'. Value: MetaProxy(arg4_1). Cast error details: Unable to cast MetaProxy(arg4_1) to Tensor

Same graph compiles on torch 2.11.0. This is one of the regressions blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).

Error Message

RuntimeError: Worker failed with error 'backend='inductor' raised: RuntimeError: prims::convert_element_type() Expected a value of type 'Tensor' for argument 'a' but instead found type 'MetaProxy'. Value: MetaProxy(arg4_1) Cast error details: Unable to cast MetaProxy(arg4_1) to Tensor

Environment

  • torch: 2.12.0+cu130 (test channel)
  • triton: 3.7.0
  • CUDA: 13.0
  • Python: 3.12
  • GPU: reproduces on both H100 and B200
  • Affected attention backends: FLASHINFER, TRITON_ATTN
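When comparing a passing and a failing setup, it can help to capture the installed package versions in the same shape as the list above. The following is a minimal sketch using only the standard library; the package names passed to `collect_versions` are an assumption about what is installed under those distribution names, and the CUDA version and GPU model are not covered.

```python
import platform
from importlib import metadata


def collect_versions(packages=("torch", "triton", "vllm")):
    """Return {package: version or None} without importing the packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def format_report(versions):
    """Render the versions as a bullet list like the Environment section."""
    lines = [f"  • python: {platform.python_version()}"]
    for name, version in versions.items():
        lines.append(f"  • {name}: {version or 'not installed'}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(collect_versions()))
```

Reading versions from package metadata avoids importing torch itself, so the report still works in an environment where the import fails.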

Repro

  • Model: nvidia/Llama-4-Scout-17B-16E-Instruct-FP8
  • Fusion pass: quant_fp8 + rms_norm
  • Compile mode: inductor_partition
  • Reproduces with and without TP.

Failing vLLM tests (all same root error):

Test | GPU
tests/compile/fusions_e2e/test_tp1_quant.py::test_tp1_fp8_fusions[inductor_partition--quant_fp8,-rms_norm-6-FLASHINFER-...] | B200
tests/compile/fusions_e2e/test_tp1_quant.py::test_tp1_fp8_fusions[inductor_partition--quant_fp8,-rms_norm-6-TRITON_ATTN-...] | H100
tests/compile/fusions_e2e/test_tp2_ar_rms.py::test_tp2_ar_rms_fp8_fusions[inductor_partition--quant_fp8,-rms_norm-4-FLASHINFER-...] | B200
tests/compile/fusions_e2e/test_tp2_ar_rms.py::test_tp2_ar_rms_fp8_fusions[inductor_partition--quant_fp8,-rms_norm-4-TRITON_ATTN-...] | H100
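The parametrization ids listed above are truncated, so the exact test ids cannot be passed to pytest verbatim. One way to select the same families is a `-k` keyword expression built from the visible fragments. This is a sketch, not a command taken from the issue; the helper name `pytest_command` is hypothetical, and the expression may select more parametrizations than the four that fail.

```python
import shlex

# File and function names are copied from the failing-test list; the ids
# there end in "...", so we match on visible fragments instead of guessing.
FAILING = [
    ("tests/compile/fusions_e2e/test_tp1_quant.py", "test_tp1_fp8_fusions"),
    ("tests/compile/fusions_e2e/test_tp2_ar_rms.py", "test_tp2_ar_rms_fp8_fusions"),
]


def pytest_command(path, func, backend):
    """Build a pytest argv selecting one test function and attention backend."""
    keyword = f"{func} and inductor_partition and {backend}"
    # -x stops at the first failure, which is enough to see the error.
    return ["pytest", path, "-k", keyword, "-x"]


if __name__ == "__main__":
    for path, func in FAILING:
        for backend in ("FLASHINFER", "TRITON_ATTN"):
            print(shlex.join(pytest_command(path, func, backend)))
```

The script only prints the commands, so they can be reviewed before being run from a vLLM checkout.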

Error + abbreviated traceback

RuntimeError: Worker failed with error 'backend='inductor' raised:
RuntimeError: prims::convert_element_type() Expected a value of type 'Tensor' for argument 'a' but instead found type 'MetaProxy'.
Value: MetaProxy(arg4_1)
Cast error details: Unable to cast MetaProxy(arg4_1) to Tensor

The error originates during inductor lowering, when prims.convert_element_type is called on an FX node whose value is still wrapped in a MetaProxy instead of being unwrapped to the underlying Tensor.

Question / diagnosis

Did torch 2.12 change how MetaProxy is unwrapped (or when it's preserved) during inductor graph lowering? A MetaProxy leaking into a prims.* call that expects a Tensor suggests either:

  • a missing unwrap in the fusion pass path, or
  • stricter type-checking in prims.convert_element_type on 2.12 that rejects MetaProxy where 2.11 quietly accepted it.

Happy to provide a minimal inductor-only repro extracted from vLLM's fusion graph if helpful.
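The two hypotheses can be illustrated with a toy model in plain Python. Everything here is hypothetical: `ToyTensor`, `ToyProxy`, `strict_convert`, and `unwrap` are stand-ins invented for illustration and are not torch classes or Inductor internals. The sketch only shows the general pattern: an op with a strict argument type rejects a proxy wrapper, and unwrapping at the call site avoids the error.

```python
class ToyTensor:
    """Stand-in for a tensor; NOT torch.Tensor."""

    def __init__(self, dtype):
        self.dtype = dtype


class ToyProxy:
    """Stand-in for a tracing proxy that carries a value in its metadata."""

    def __init__(self, name, val):
        self.name = name
        self.meta = {"val": val}

    def __repr__(self):
        return f"ToyProxy({self.name})"


def strict_convert(a, dtype):
    """Mimic an op with a strict schema: only ToyTensor is accepted."""
    if not isinstance(a, ToyTensor):
        raise TypeError(
            "Expected a value of type 'ToyTensor' for argument 'a' "
            f"but instead found type '{type(a).__name__}'. Value: {a!r}"
        )
    return ToyTensor(dtype)


def unwrap(value):
    """Recursively replace proxies with the value they carry."""
    if isinstance(value, ToyProxy):
        return value.meta["val"]
    if isinstance(value, (list, tuple)):
        return type(value)(unwrap(v) for v in value)
    if isinstance(value, dict):
        return {k: unwrap(v) for k, v in value.items()}
    return value


if __name__ == "__main__":
    arg = ToyProxy("arg4_1", ToyTensor("bfloat16"))
    try:
        strict_convert(arg, "float32")  # the proxy leaks into the strict op
    except TypeError as exc:
        print(exc)
    print(strict_convert(unwrap(arg), "float32").dtype)  # unwrapped first
```

In this model, the first hypothesis corresponds to a caller that forgets `unwrap`, and the second to `strict_convert` gaining an `isinstance` check it previously lacked. Which of the two applies to torch 2.12 is exactly what the issue asks.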

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

The RuntimeError arises because a MetaProxy reaches prims.convert_element_type without being unwrapped to a Tensor. Two candidate fixes follow from that: unwrap MetaProxy values in the fusion pass, or make prims.convert_element_type handle MetaProxy inputs. The issue does not yet establish which side regressed.

Guidance

  • Investigate the changes in torch 2.12 regarding MetaProxy unwrapping during inductor graph lowering to determine if stricter type-checking is the cause of the error.
  • Review the fusion pass code to ensure that MetaProxy values are properly unwrapped before being passed to prims.convert_element_type.
  • Consider creating a minimal inductor-only repro to isolate the issue and test potential fixes.
  • Check the torch 2.12 documentation and release notes for any information on changes to MetaProxy handling or prims.convert_element_type behavior.
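While the investigation is open, any workaround would probably be gated on the torch version. The sketch below parses a version string such as the `2.12.0+cu130` reported in the issue and compares it against a threshold. The function names are hypothetical, and treating every release from 2.12.0 onward as affected is an assumption: the issue only reports that 2.12.0 fails and 2.11.0 compiles.

```python
import re


def parse_release(version):
    """Parse '2.12.0+cu130' into (2, 12, 0), ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def needs_workaround(version, first_bad=(2, 12, 0)):
    """Assumption: releases at or after first_bad stay affected until fixed."""
    return parse_release(version) >= first_bad


if __name__ == "__main__":
    for v in ("2.11.0", "2.12.0+cu130"):
        print(v, needs_workaround(v))
```

Keeping `first_bad` as a parameter makes it easy to narrow the gate once a fixed release is known.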

Example

No code example is provided as the issue does not include sufficient code context.

Notes

The fix may require updates to the fusion pass code or the prims.convert_element_type function, and may depend on the specific changes made in torch 2.12.

Recommendation

Apply a workaround by modifying the fusion pass to properly unwrap MetaProxy values, as this is a more targeted and potentially quicker fix than updating the prims.convert_element_type function.
