pytorch: torch.compile Inductor crashes on onnx2torch-converted ASPP-style Conv-Add-Reshape model with FakeTensor shape input

GitHub stats
pytorch/pytorch#182651 · Fetched 2026-05-07 03:30:54
Comments: 0 · Participants: 1 · Timeline events: 37 · Reactions: 0
Timeline (top): mentioned ×16, subscribed ×16, labeled ×5




🐛 Describe the bug

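torch.compile (default mode, Inductor backend) crashes on an ONNX model converted to PyTorch by onnx2torch. The model is an ASPP-style graph: four dilated 3x3 Conv branches summed by Add nodes, followed by a Reshape whose target shape comes from a constant initializer.

ONNX Runtime and eager onnx2torch both run correctly and produce matching outputs, but torch.compile fails during tracing with an InternalTorchDynamoError wrapping a TypeError about a FakeTensor passed to torch.Size().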

#!/usr/bin/env python3
import sys
import numpy as np
import onnx
import onnxruntime as ort
import onnx2torch
import torch
from onnx import helper as oh, TensorProto as TP, numpy_helper as onh

np.random.seed(42)

B, C_in, H, W = 1, 8, 16, 16
C_out = 4
x = np.random.randn(B, C_in, H, W).astype(np.float32)

nodes = []
branch_outs = []
inits = []

for i, d in enumerate([1, 6, 12, 18]):
    w = (np.random.randn(C_out, C_in, 3, 3) * 0.05).astype(np.float32)
    inits.append(onh.from_array(w, f"W{i}"))
    nodes.append(
        oh.make_node(
            "Conv",
            ["X", f"W{i}"],
            [f"b{i}"],
            kernel_shape=[3, 3],
            strides=[1, 1],
            pads=[d, d, d, d],
            dilations=[d, d],
        )
    )
    branch_outs.append(f"b{i}")

nodes += [
    oh.make_node("Add", [branch_outs[0], branch_outs[1]], ["s01"]),
    oh.make_node("Add", [branch_outs[2], branch_outs[3]], ["s23"]),
    oh.make_node("Add", ["s01", "s23"], ["fused"]),
]

s = np.array([B, C_out, H * W], dtype=np.int64)
inits.append(onh.from_array(s, "s"))
nodes.append(oh.make_node("Reshape", ["fused", "s"], ["Y"]))

graph = oh.make_graph(
    nodes,
    "aspp_dilated_branch",
    [oh.make_tensor_value_info("X", TP.FLOAT, [B, C_in, H, W])],
    [oh.make_tensor_value_info("Y", TP.FLOAT, [B, C_out, H * W])],
    initializer=inits,
)

model = oh.make_model(graph, opset_imports=[oh.make_opsetid("", 13)])
model.ir_version = 8
mb = model.SerializeToString()

so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

ort_out = ort.InferenceSession(
    mb,
    sess_options=so,
    providers=["CPUExecutionProvider"],
).run(None, {"X": x})[0]

print(f"ORT: shape={ort_out.shape}  first4={ort_out.ravel()[:4]}")

net = onnx2torch.convert(onnx.load_from_string(mb)).eval()
x_t = torch.from_numpy(x)

with torch.no_grad():
    eager_out = net(x_t).numpy()

print(f"eager onnx2torch: shape={eager_out.shape}  first4={eager_out.ravel()[:4]}")

try:
    with torch.no_grad():
        got = torch.compile(net, mode="default")(x_t).numpy()
    diff = float(np.abs(got - eager_out).max())
    print(f"torch.compile: shape={got.shape}  max_diff_vs_eager={diff:.2e}")
except Exception as e:
    print("BUG REPRODUCED: torch.compile crashes while eager succeeds")
    print(f"  {type(e).__name__}: {str(e)[:160]}")
    sys.exit(0)

print("NOT REPRODUCED")
sys.exit(1)
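
The script above only exercises the default backend. To narrow down which stage fails, the same module could be compiled with several backends in turn. The following is a minimal sketch, not part of the original report; `probe_backends` is a hypothetical helper name, and the backend strings `"eager"`, `"aot_eager"`, and `"inductor"` are the standard names accepted by `torch.compile`.

```python
import torch
import torch._dynamo


def probe_backends(model, example, backends=("eager", "aot_eager", "inductor")):
    """Hypothetical helper: compile `model` per backend and record the first-call outcome."""
    results = {}
    for backend in backends:
        # Drop cached graphs so every backend triggers a fresh trace.
        torch._dynamo.reset()
        try:
            with torch.no_grad():
                torch.compile(model, backend=backend)(example)
            results[backend] = "ok"
        except Exception as e:  # keep going so every backend is reported
            results[backend] = f"{type(e).__name__}: {str(e)[:120]}"
    return results
```

In the repro this would be called as `probe_backends(net, x_t)`. If the `"eager"` backend already fails with the same error, the problem most likely sits in Dynamo tracing rather than in Inductor code generation.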

Error logs

ORT: shape=(1, 4, 256)  first4=[0.18806624 0.22530769 0.06371912 0.435764  ]
eager onnx2torch: shape=(1, 4, 256)  first4=[0.18806624 0.22530769 0.06371912 0.435764  ]

BUG REPRODUCED: torch.compile crashes while eager succeeds
  InternalTorchDynamoError: TypeError: torch.Size() takes an iterable of 'int' (item 0 is 'FakeTensor')
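
The TypeError suggests that a `torch.Size` is being built from the elements of a tensor. In eager mode those elements appear to convert to Python ints, while under tracing they are FakeTensors with no concrete value. Assuming (this is not confirmed in the report) that the converted Reshape reads its constant shape initializer `s` this way, one possible workaround is to hold the target shape as plain Python ints. `StaticReshape` below is a hypothetical module for illustration, not part of onnx2torch, and wiring it into the converted graph is left out.

```python
import torch
from torch import nn


class StaticReshape(nn.Module):
    """Hypothetical helper (not part of onnx2torch): reshape to a shape fixed at construction."""

    def __init__(self, shape):
        super().__init__()
        # Store plain Python ints so tracing never has to read values out of a tensor.
        self.shape = tuple(int(d) for d in shape)

    def forward(self, x):
        return torch.reshape(x, self.shape)


# Same constant as the `s` initializer in the repro: [B, C_out, H * W].
shape_tensor = torch.tensor([1, 4, 16 * 16], dtype=torch.int64)
fused = torch.randn(1, 4, 16, 16)

# Reference result using literal ints.
eager_ref = fused.reshape(1, 4, 256)

# Convert the constant shape once, outside forward(), then reshape with Python ints.
static_out = StaticReshape(shape_tensor.tolist())(fused)
print(static_out.shape, torch.equal(static_out, eager_ref))
```

This only applies when the shape is a compile-time constant, as it is in the repro. A shape computed at runtime from other tensors would need a different approach.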

Versions

PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.10.20 (main, Mar 11 2026, 17:46:40) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU
Nvidia driver version: 545.92
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        39 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               16
On-line CPU(s) list:                  0-15
Vendor ID:                            GenuineIntel
Model name:                           11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
CPU family:                           6
Model:                                141
Thread(s) per core:                   2
Core(s) per socket:                   8
Socket(s):                            1
Stepping:                             1
BogoMIPS:                             4607.99
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512vbmi umip avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear flush_l1d arch_capabilities
Virtualization:                       VT-x
Hypervisor vendor:                    Microsoft
Virtualization type:                  full
L1d cache:                            384 KiB (8 instances)
L1i cache:                            256 KiB (8 instances)
L2 cache:                             10 MiB (8 instances)
L3 cache:                             24 MiB (1 instance)
Vulnerability Gather data sampling:   Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] nvidia-cublas==13.1.0.3
[pip3] nvidia-cuda-cupti==13.0.85
[pip3] nvidia-cuda-nvrtc==13.0.88
[pip3] nvidia-cuda-runtime==13.0.96
[pip3] nvidia-cudnn-cu13==9.19.0.56
[pip3] nvidia-cufft==12.0.0.61
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.4.66
[pip3] nvidia-cusparse==12.6.3.3
[pip3] nvidia-cusparselt-cu13==0.8.0
[pip3] nvidia-nccl-cu13==2.28.9
[pip3] nvidia-nvjitlink==13.0.88
[pip3] nvidia-nvtx==13.0.85
[pip3] onnx==1.21.0
[pip3] onnx2torch==1.5.15
[pip3] onnxruntime==1.23.2
[pip3] optree==0.19.1
[pip3] torch==2.11.0
[pip3] torchvision==0.26.0
[pip3] triton==3.6.0
[conda] numpy                      2.2.6            pypi_0           pypi
[conda] nvidia-cublas              13.1.0.3         pypi_0           pypi
[conda] nvidia-cuda-cupti          13.0.85          pypi_0           pypi
[conda] nvidia-cuda-nvrtc          13.0.88          pypi_0           pypi
[conda] nvidia-cuda-runtime        13.0.96          pypi_0           pypi
[conda] nvidia-cudnn-cu13          9.19.0.56        pypi_0           pypi
[conda] nvidia-cufft               12.0.0.61        pypi_0           pypi
[conda] nvidia-curand              10.4.0.35        pypi_0           pypi
[conda] nvidia-cusolver            12.0.4.66        pypi_0           pypi
[conda] nvidia-cusparse            12.6.3.3         pypi_0           pypi
[conda] nvidia-cusparselt-cu13     0.8.0            pypi_0           pypi
[conda] nvidia-nccl-cu13           2.28.9           pypi_0           pypi
[conda] nvidia-nvjitlink           13.0.88          pypi_0           pypi
[conda] nvidia-nvtx                13.0.85          pypi_0           pypi
[conda] onnx2torch                 1.5.15           pypi_0           pypi
[conda] optree                     0.19.1           pypi_0           pypi
[conda] torch                      2.11.0           pypi_0           pypi
[conda] torchvision                0.26.0           pypi_0           pypi
[conda] triton                     3.6.0            pypi_0           pypi

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo @azahed98
