pytorch: `torch.compile` fails on `Tensor.to_dense()` for strided CUDA tensor with `AssertionError: expected non-functional tensor`



🐛 Describe the bug

torch.compile fails when compiling a function that calls Tensor.to_dense() on a regular strided CUDA tensor.

In eager mode, x.to_dense() succeeds and returns x itself. However, the compiled version fails inside AOTAutograd / functionalization metadata collection with:

torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
AssertionError: expected non-functional tensor

The minimal function is:

def f(x):
    return x.to_dense()
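One possible workaround, sketched under the assumption that the failure comes from the no-op `to_dense()` call on an already-strided input: skip the call when the layout is `torch.strided`. The helper name `to_dense_if_needed` is hypothetical, and this has not been checked against the build in this report.

```python
import torch


def to_dense_if_needed(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: call to_dense() only for non-strided layouts."""
    if x.layout == torch.strided:
        # Already dense; eager to_dense() would return x unchanged anyway.
        return x
    return x.to_dense()


# CPU tensors are enough to show the helper's behavior.
strided = torch.rand(3, 4)
sparse = torch.eye(3).to_sparse()

dense_from_strided = to_dense_if_needed(strided)
dense_from_sparse = to_dense_if_needed(sparse)

print("strided input returned as-is:", dense_from_strided is strided)
print("sparse input layout after helper:", dense_from_sparse.layout)
```

Replacing `x.to_dense()` in `f` with this helper keeps eager semantics for both sparse and strided inputs. Whether it avoids the assertion under `torch.compile` is an assumption to verify locally.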

Minimal repro

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import platform
import traceback

import torch


def print_env():
    print("Python:", platform.python_version())
    print("Platform:", platform.platform())
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA device count:", torch.cuda.device_count())
    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", ""))
    if torch.cuda.is_available():
        print("Current CUDA device:", torch.cuda.current_device())
        print("CUDA device name:", torch.cuda.get_device_name(0))


def f(x):
    return x.to_dense()


def main():
    print_env()

    if not torch.cuda.is_available():
        raise RuntimeError("This repro expects CUDA.")

    x = torch.rand(3, 4, device="cuda")

    print("\nRunning eager...")
    eager_out = f(x)
    print("Eager succeeded.")
    print("eager is x:", eager_out is x)
    print("eager output:", eager_out)

    print("\nRunning compiled...")
    compiled_f = torch.compile(
        f,
        backend="inductor",
        fullgraph=True,
        dynamic=True,
    )

    try:
        compiled_out = compiled_f(x)
        print("Compiled succeeded.")
        print("compiled output:", compiled_out)
        torch.testing.assert_close(eager_out, compiled_out)
        print("Compiled output matches eager.")
    except Exception:
        print("Compiled failed:")
        traceback.print_exc()
        raise


if __name__ == "__main__":
    main()

Actual behavior

Eager succeeds:

Running eager...
Eager succeeded.
eager is x: True
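The eager aliasing reported above can also be checked without a GPU. A minimal sketch, assuming the CPU path treats strided tensors the same way as the CUDA path in the report:

```python
import torch

x = torch.rand(3, 4)  # regular strided tensor, CPU for portability
y = x.to_dense()      # no layout conversion is needed for a strided input

# Compare object identity and the underlying storage pointer.
same_object = y is x
shares_storage = y.data_ptr() == x.data_ptr()

print("layout:", x.layout)
print("to_dense() returned the same object:", same_object)
print("to_dense() result shares storage:", shares_storage)
```

Because the eager result aliases the input, the compiled graph has to return one of its inputs as an output, which is the case the functionalization metadata pass has to handle.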

Compiled execution fails:

torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
AssertionError: expected non-functional tensor

Relevant stack trace:

File "torch/_functorch/_aot_autograd/collect_metadata_analysis.py", line 845, in inner
    fw_graph_outs = pytree.tree_map(from_fun, f_fw_graph_outs)

File "torch/_functorch/_aot_autograd/functional_utils.py", line 82, in from_fun
    raise AssertionError("expected non-functional tensor")
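The trace ends in AOTAutograd's metadata collection rather than in Inductor code generation, so it may help to check which backends and `dynamic` settings reproduce the failure. A minimal sketch, assuming the Dynamo debug backends `eager` and `aot_eager` are available in the build; `try_compile` is a hypothetical helper, and no results are claimed here.

```python
import torch
import torch._dynamo


def f(x):
    return x.to_dense()


def try_compile(backend, dynamic, device="cpu"):
    """Return None on success, or the exception type name on failure."""
    torch._dynamo.reset()  # drop cached graphs so each configuration recompiles
    x = torch.rand(3, 4, device=device)
    compiled_f = torch.compile(f, backend=backend, fullgraph=True, dynamic=dynamic)
    try:
        compiled_f(x)
        return None
    except Exception as exc:
        return type(exc).__name__


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # "eager" runs Dynamo only, "aot_eager" adds AOTAutograd, "inductor" adds codegen.
    for backend in ("eager", "aot_eager", "inductor"):
        for dynamic in (False, True):
            outcome = try_compile(backend, dynamic, device)
            print(f"backend={backend} dynamic={dynamic} -> {outcome or 'ok'}")
```

If `aot_eager` fails while `eager` succeeds, that would be consistent with the problem sitting in AOTAutograd rather than in Inductor itself.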

Versions

PyTorch version:  2.13.0a0+git059c270
Is debug build: True
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Is CUDA available: True

cc @nikitaved @pearu @cpuhrsch @amjames @bhosmer @jcaip @bdhirsh @ezyang @chauhang @penguinwu @bobrenjc93 @aorenste
