pytorch: `torch.compile(torch.vmap(torch.func.hessian(f)))` crashes with TorchRuntimeError

Error Message

TorchRuntimeError: RuntimeError when making fake tensor call
Explanation: Dynamo failed to run FX node with fake tensors:
  call_function <function _autograd_grad at 0x...>(
    *([GradTrackingTensor(lvl=-2, value=
        GradTrackingTensor(lvl=3, value=
          BatchedTensor(lvl=1, bdim=0, value=
            FakeTensor(..., device='cuda:0', size=(4,), dtype=torch.float64))))],
      ...))

Root Cause

The root cause appears to be that jacfwd as the outer transform, combined with vmap and compile, fails during fake tensor propagation. Specifically, hessian(f) = jacfwd(jacrev(f)), and the jacfwd outer layer triggers the crash.
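The identity hessian(f) = jacfwd(jacrev(f)) can be checked in eager mode, without vmap or torch.compile. The following is a minimal sketch, assuming a CPU tensor and the same f used in the reproduction; the variable names are illustrative and not part of the original report. It also compares against jacrev(jacrev(f)) and the closed-form Hessian diag(12 x^2).

```python
import torch
from torch.func import hessian, jacfwd, jacrev


def f(x):
    return (x ** 4).sum()


# Assumption: a single unbatched CPU vector (the report used CUDA tensors).
x = torch.randn(8, dtype=torch.float64)

h_ref = hessian(f)(x)              # what torch.func.hessian returns
h_fwd_rev = jacfwd(jacrev(f))(x)   # forward-over-reverse composition
h_rev_rev = jacrev(jacrev(f))(x)   # reverse-over-reverse (the workaround)
h_exact = torch.diag(12 * x ** 2)  # closed form for f(x) = sum(x_i ** 4)

same_as_definition = torch.allclose(h_ref, h_fwd_rev)
workaround_matches = torch.allclose(h_ref, h_rev_rev)
matches_closed_form = torch.allclose(h_ref, h_exact)

print(same_as_definition, workaround_matches, matches_closed_form)
```

In eager mode the three computations agree numerically, so any difference under compilation comes from tracing rather than from the math.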

Fix / Workaround

Key observations:

  • vmap(jacfwd(jacrev(f))) (= vmap(hessian(f))): CRASHES
  • vmap(jacfwd(jacfwd(f))): CRASHES
  • vmap(jacrev(jacrev(f))): WORKS (workaround)
  • vmap(jacrev(jacfwd(f))): WORKS
  • jacfwd(jacrev(f)) without vmap: WORKS
  • vmap(jacfwd(f)) without nesting: WORKS
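The matrix of observations above can be re-checked with a small harness that compiles each composition and records whether the first call succeeds. This is a sketch under stated assumptions: try_compile is a hypothetical helper, the tensors live on CPU rather than CUDA, and the aot_eager backend is used (the report notes that it crashes too). Outcomes depend on the installed PyTorch build, so none are hard-coded here.

```python
import torch
from torch.func import jacfwd, jacrev, vmap


def f(x):
    return (x ** 4).sum()


def try_compile(transformed, arg, backend="aot_eager"):
    """Hypothetical helper: compile, call once, and report the outcome."""
    try:
        torch._dynamo.reset()  # start each case from a clean compile cache
        torch.compile(transformed, backend=backend, fullgraph=True)(arg)
        return "WORKS"
    except Exception as exc:  # broad on purpose: we only classify the result
        return f"CRASHES ({type(exc).__name__})"


# Assumption: CPU inputs; the report used device='cuda'.
x = torch.randn(4, 8, dtype=torch.float64)
x_single = x[0]

cases = {
    "vmap(jacfwd(jacrev(f)))": (vmap(jacfwd(jacrev(f))), x),
    "vmap(jacfwd(jacfwd(f)))": (vmap(jacfwd(jacfwd(f))), x),
    "vmap(jacrev(jacrev(f)))": (vmap(jacrev(jacrev(f))), x),
    "vmap(jacrev(jacfwd(f)))": (vmap(jacrev(jacfwd(f))), x),
    "jacfwd(jacrev(f))": (jacfwd(jacrev(f)), x_single),
    "vmap(jacfwd(f))": (vmap(jacfwd(f)), x),
}

outcomes = {name: try_compile(fn, arg) for name, (fn, arg) in cases.items()}
for name, outcome in outcomes.items():
    print(f"{name:28s} {outcome}")
```

Running it on a build where the bug has been fixed should show every row as WORKS, which makes the harness usable as a quick regression check.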

WORKAROUND (works correctly)

compiled_workaround = torch.compile(
    torch.vmap(torch.func.jacrev(torch.func.jacrev(f))),
    fullgraph=True
)
result = compiled_workaround(x)

Code Example

import torch

torch.manual_seed(42)
x = torch.randn(4, 8, device='cuda', dtype=torch.float64)

def f(x):
    return (x ** 4).sum()

# CRASHES
compiled = torch.compile(torch.vmap(torch.func.hessian(f)), fullgraph=True)
result = compiled(x)

# WORKAROUND (works correctly)
compiled_workaround = torch.compile(
    torch.vmap(torch.func.jacrev(torch.func.jacrev(f))),
    fullgraph=True
)
result = compiled_workaround(x)
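Code that has to run on both affected and fixed builds could pick a variant at runtime. The sketch below is one possible pattern, not something proposed in the issue: batched_hessian is a hypothetical helper that tries the compiled hessian first, then the compiled jacrev(jacrev(f)) workaround, then plain eager mode. It assumes the aot_eager backend and CPU tensors for portability.

```python
import torch
from torch.func import hessian, jacrev, vmap


def batched_hessian(f, example, backend="aot_eager"):
    """Hypothetical helper: return (label, fn) for the first variant that runs."""
    builders = [
        ("compiled hessian",
         lambda: torch.compile(vmap(hessian(f)), backend=backend, fullgraph=True)),
        ("compiled jacrev(jacrev)",
         lambda: torch.compile(vmap(jacrev(jacrev(f))), backend=backend, fullgraph=True)),
        ("eager hessian", lambda: vmap(hessian(f))),
    ]
    for label, build in builders:
        try:
            fn = build()
            fn(example)  # compilation is lazy, so errors surface on the first call
            return label, fn
        except Exception:  # broad on purpose in this sketch
            continue
    raise RuntimeError("no Hessian variant succeeded")


def f(x):
    return (x ** 4).sum()


x = torch.randn(4, 8, dtype=torch.float64)
label, hess_fn = batched_hessian(f, x)
H = hess_fn(x)

# Closed form: each sample's Hessian is diag(12 * x_i ** 2).
expected = torch.diag_embed(12 * x ** 2)
print(label, torch.allclose(H, expected))
```

The trial call costs one extra compilation up front, which is usually acceptable when the function is reused many times afterwards.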

Versions

PyTorch: 2.13.0.dev20260501+cu126
Python: 3.11
CUDA: 12.6
GPU: Tesla T4

🐛 Describe the bug

torch.compile crashes when compiling a vmapped Hessian computation (torch.vmap(torch.func.hessian(f))). The crash occurs at the Dynamo level and is not Inductor-specific: the aot_eager backend crashes as well.

Taken together, the observations indicate that the crash requires all four ingredients at once: jacfwd as the outer transform, nested differentiation, vmap, and torch.compile.

The crash also occurs with f(x) = sin(x).sum(), f(x) = exp(x).sum(), f(x) = (x**2).sum(), but NOT with f(x) = tanh(x).sum().
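The dependence on the choice of f can be probed the same way. This sketch sweeps the functions named above through torch.compile(vmap(hessian(f))) and records each outcome. The dictionary keys, the CPU input, and the aot_eager backend are assumptions made for illustration; results will vary with the PyTorch build.

```python
import torch
from torch.func import hessian, vmap

# Functions mentioned in the report, keyed by a readable label.
candidates = {
    "(x ** 4).sum()": lambda x: (x ** 4).sum(),
    "sin(x).sum()": lambda x: torch.sin(x).sum(),
    "exp(x).sum()": lambda x: torch.exp(x).sum(),
    "(x ** 2).sum()": lambda x: (x ** 2).sum(),
    "tanh(x).sum()": lambda x: torch.tanh(x).sum(),
}

# Assumption: CPU input; the report used device='cuda'.
x = torch.randn(4, 8, dtype=torch.float64)

report = {}
for name, fn in candidates.items():
    try:
        torch.compile(vmap(hessian(fn)), backend="aot_eager", fullgraph=True)(x)
        report[name] = "WORKS"
    except Exception as exc:  # broad on purpose: we only classify the result
        report[name] = f"CRASHES ({type(exc).__name__})"

for name, outcome in report.items():
    print(f"{name:16s} {outcome}")
```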

cc @chauhang @penguinwu @Chillee @samdow @kshitij12345
