pytorch: [vllm] [2.12 regression][AOTAutograd] "Cannot access data pointer of Tensor (FakeTensor/FunctionalTensor)" during cache save

Under torch 2.12.0, vLLM's Quantization suite fails to compile with torch.compile because AOTAutogradCache.save trips:

RuntimeError: Cannot access data pointer of Tensor (e.g. FakeTensor, FunctionalTensor). If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op.

Preceded by:

W torch/_functorch/_aot_autograd/autograd_cache.py:1224] AOTAutograd cache unable to serialize compiled graph: Cannot access data pointer of Tensor (...)

Same tests pass on torch 2.11.0. Blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).

Environment

  • torch: 2.12.0+cu130
  • triton: 3.7.0
  • CUDA: 13.0
  • Python: 3.12
  • GPU: H100

Question / diagnosis

  • Did 2.12 start routing more graphs through AOTAutogradCache.save, or tighten what's pickleable?
  • The warning says the cache write fails, but the resulting RuntimeError propagates out and breaks EngineCore init — is that intentional? In 2.11 a cache-save failure was a silent warning.
  • Is there a supported opt-out (env var) to treat cache-save failures as bypass rather than hard fail?
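Pending an answer to the opt-out question, one blunt mitigation can be sketched: turning the AOTAutograd cache off entirely through the TORCHINDUCTOR_AUTOGRAD_CACHE environment variable described in PyTorch's compile-caching documentation. This is an assumption-laden sketch, not a confirmed fix: it disables the cache wholesale rather than turning save failures into bypasses, it has not been verified against the 2.12 failure reported here, and the helper name disable_aot_autograd_cache is made up for illustration.

```python
import os


def disable_aot_autograd_cache(env=None):
    """Set TORCHINDUCTOR_AUTOGRAD_CACHE=0 in the given mapping.

    Hypothetical helper for illustration. Defaults to os.environ so the
    setting is inherited by torch and by any worker processes spawned later.
    """
    env = os.environ if env is None else env
    env["TORCHINDUCTOR_AUTOGRAD_CACHE"] = "0"
    return env


# Call this before torch is imported, so the setting is in place when
# torch reads its configuration.
disable_aot_autograd_cache()

# import torch  # import torch (and vllm) only after the variable is set
```

If this makes the error disappear, that points at the cache save path rather than at compilation itself, which would help narrow the first two questions.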

Affected tests

Quantization suite (1 failed, 261 passed, 27 skipped; 2017s). The top of the trace is engine core init during the suite's compile warm-up.

(A secondary failure in the same log — AssertionError: Hidden size mismatch 2048 != 1024 — appears later and may be unrelated; will split if so.)

Links

cc @chauhang @penguinwu @bdhirsh @bobrenjc93 @aorenste

extent analysis

TL;DR

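The error message points to wrapping the offending custom kernel in an opaque custom op so that the compiler does not trace into it. Because the same tests pass on torch 2.11.0, a regression in AOTAutogradCache.save is also possible.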

Guidance

  • Investigate the change in behavior between torch 2.11.0 and 2.12.0 to determine if more graphs are being routed through AOTAutogradCache.save or if the pickleability checks have been tightened.
  • Check if there is a supported opt-out (env var) to treat cache-save failures as bypass rather than hard fail, which could potentially mitigate the issue.
  • Review the Quantization suite tests to identify the specific custom kernel causing the issue and consider wrapping it into an opaque custom op as suggested by the error message.
  • Verify if the secondary failure (AssertionError: Hidden size mismatch 2048 != 1024) is unrelated to the primary issue and split the issues if necessary.

Example

The issue does not identify the offending kernel, so no issue-specific snippet can be given.

Notes

The failure is specific to torch 2.12.0; the same tests pass on torch 2.11.0. Wrapping the custom kernel in an opaque custom op may resolve it, but the root cause has not yet been determined.

Recommendation

Apply workaround: wrap the custom kernel in an opaque custom op, as the error message suggests, before pursuing other changes.
