pytorch: [vllm] [2.12 regression][AOTAutograd] "Cannot access data pointer of Tensor (FakeTensor/FunctionalTensor)" during cache save

pytorch · 2026-04-20 20:40:59


Summary

Under torch 2.12.0, vLLM's Quantization suite fails to compile with torch.compile because AOTAutogradCache.save trips:

RuntimeError: Cannot access data pointer of Tensor (e.g. FakeTensor, FunctionalTensor). If you're using torch.compile/export/fx, it is likely that we are erroneously tracing into a custom kernel. To fix this, please wrap the custom kernel into an opaque custom op.

Preceded by:

W torch/_functorch/_aot_autograd/autograd_cache.py:1224] AOTAutograd cache unable to serialize compiled graph: Cannot access data pointer of Tensor (...)

Same tests pass on torch 2.11.0. Blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).

Environment

torch: 2.12.0+cu130
triton: 3.7.0
CUDA: 13.0
Python: 3.12
GPU: H100

Question / diagnosis

Did 2.12 start routing more graphs through AOTAutogradCache.save, or tighten what's pickleable?
The warning says the cache write fails, but the resulting RuntimeError propagates out and breaks EngineCore init — is that intentional? In 2.11 a cache-save failure was a silent warning.
Is there a supported opt-out (env var) to treat cache-save failures as bypass rather than hard fail?
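While the opt-out question is open, one stopgap is to switch the AOTAutograd cache off entirely so that no cache save is attempted. The sketch below assumes the `TORCHINDUCTOR_AUTOGRAD_CACHE` and `TORCHINDUCTOR_FORCE_DISABLE_CACHES` environment variables described in the PyTorch compile-caching documentation still behave the same way in 2.12; that has not been verified against this failure. The helper name `disable_aot_autograd_cache` is hypothetical. Note that this disables caching rather than turning a save failure into a bypass, so it does not answer the question above.

```python
import os


def disable_aot_autograd_cache(env=os.environ, disable_all_caches=False):
    """Set environment flags that turn off compile caching.

    Assumption: these variables are read when torch is imported, so call
    this before `import torch` (or export them in the shell instead).
    """
    # Turns off only the AOTAutograd cache.
    env["TORCHINDUCTOR_AUTOGRAD_CACHE"] = "0"
    if disable_all_caches:
        # Broader switch: turns off all torch.compile caches.
        env["TORCHINDUCTOR_FORCE_DISABLE_CACHES"] = "1"
    return env


if __name__ == "__main__":
    disable_aot_autograd_cache()
    # import torch / start vLLM only after this point
```

Expect longer warm-up times with caching disabled, since every process recompiles from scratch.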

Affected tests

Quantization suite: 1 failed, 261 passed, 27 skipped (2017s). The top of the trace is EngineCore init during the suite's compile warm-up.

(A secondary failure in the same log — AssertionError: Hidden size mismatch 2048 != 1024 — appears later and may be unrelated; will split if so.)

extent analysis

TL;DR

The error message points to the most likely fix: wrap the custom kernel in an opaque custom op so that torch.compile no longer traces into it.

Guidance

Investigate the change in behavior between torch 2.11.0 and 2.12.0 to determine if more graphs are being routed through AOTAutogradCache.save or if the pickleability checks have been tightened.
Check if there is a supported opt-out (env var) to treat cache-save failures as bypass rather than hard fail, which could potentially mitigate the issue.
Review the Quantization suite tests to identify the specific custom kernel causing the issue and consider wrapping it into an opaque custom op as suggested by the error message.
Verify if the secondary failure (AssertionError: Hidden size mismatch 2048 != 1024) is unrelated to the primary issue and split the issues if necessary.
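To support the triage steps above, a small log scanner can record where each quoted message first appears. That helps confirm that the cache-save warning precedes the RuntimeError and shows whether the hidden-size assertion comes later. This is a minimal sketch: the function name `first_occurrences` is hypothetical, and only the message fragments are taken from the issue.

```python
import re
from typing import Dict, Optional

# Message fragments copied from the errors quoted in the issue.
PATTERNS = {
    "cache_save_warning": re.compile(
        r"AOTAutograd cache unable to serialize compiled graph"
    ),
    "data_ptr_error": re.compile(
        r"RuntimeError: Cannot access data pointer of Tensor"
    ),
    "hidden_size_assert": re.compile(r"AssertionError: Hidden size mismatch"),
}


def first_occurrences(log_text: str) -> Dict[str, Optional[int]]:
    """Return the 1-based line number of the first match per pattern, or None."""
    found: Dict[str, Optional[int]] = {name: None for name in PATTERNS}
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if found[name] is None and pattern.search(line):
                found[name] = lineno
    return found
```

Running it over the full test log for both torch 2.11.0 and 2.12.0 gives a quick side-by-side view of which messages appear in each version.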

Example

The issue does not identify the offending kernel, so no issue-specific snippet can be given.

Notes

The failure appears to be specific to torch 2.12.0; the same tests pass on torch 2.11.0. Wrapping the custom kernel in an opaque custom op may resolve it, but the root cause still needs investigation.

Recommendation

Apply workaround: try wrapping the custom kernel in an opaque custom op, as the error message suggests, before considering other changes.
