pytorch - ✅ (Solved) AOTInductor fails with pip-installed ROCm: _find_rocm_home() points to venv root, missing HIP headers [1 pull request, 1 comment, 2 participants]

GitHub stats
pytorch/pytorch#180211, fetched 2026-04-15 06:19:24
Comments: 1
Participants: 2
Timeline events: 89
Reactions: 0
Timeline (top): mentioned ×39, subscribed ×39, labeled ×9, added_to_project_v2 ×1

torch._inductor.aoti_compile_and_package() fails on ROCm when ROCm is installed via pip (uv pip install torch). The C++ compilation step can't find hip/hip_runtime.h because _find_rocm_home() resolves ROCM_HOME to the venv root (via the hipcc wrapper in $VENV/bin/), but the HIP headers are inside _rocm_sdk_core/include/hip/ in site-packages.

torch.compile() is unaffected — only AOTInductor's C++ wrapper compilation breaks.

Error Message

ROCM_HOME: /tmp/aoti-repro
/tmp/aoti-repro/include/hip/hip_runtime.h exists: False
/tmp/aoti-repro/lib/python3.12/site-packages/_rocm_sdk_core/include/hip/hip_runtime.h exists: True

torch._inductor.exc.InductorError: CppCompileError: C++ compile error

Command:
g++ .../header.hpp ... -I/tmp/aoti-repro/include ... -E -P -o .../header.i

Output:
.../torch/include/torch/csrc/inductor/aoti_runtime/device_utils.h:15:10:
  fatal error: hip/hip_runtime.h: No such file or directory
   15 | #include <hip/hip_runtime.h>
      |          ^~~~~~~~~~~~~~~~~~~
compilation terminated.

Root Cause

_find_rocm_home() discovers ROCM_HOME by finding hipcc on PATH and taking its grandparent directory. With pip-installed ROCm, hipcc is a Python wrapper script at $VENV/bin/hipcc, so ROCM_HOME resolves to the venv root ($VENV/). But the actual HIP headers, libraries, and tools live inside $VENV/lib/python3.12/site-packages/_rocm_sdk_core/.

The result:

  • include_paths('cuda') adds $VENV/include to -I flags
  • $VENV/include/hip/hip_runtime.h doesn't exist
  • The actual header is at $VENV/lib/.../site-packages/_rocm_sdk_core/include/hip/hip_runtime.h
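The mismatch described above can be reproduced without ROCm or PyTorch installed. The following is a minimal sketch, not the PyTorch source: `guess_rocm_home_from_hipcc`, `has_hip_header`, and `build_fake_venv` are hypothetical helpers, and the directory layout is copied from the paths in the report. It only models the "grandparent of hipcc" heuristic and the header check.

```python
import tempfile
from pathlib import Path

# Relative location of the header that the AOTInductor compile step needs.
HIP_HEADER = Path("include") / "hip" / "hip_runtime.h"


def guess_rocm_home_from_hipcc(hipcc_path: str) -> Path:
    """Hypothetical stand-in for the hipcc heuristic: grandparent of hipcc."""
    return Path(hipcc_path).parent.parent


def has_hip_header(root: Path) -> bool:
    """Return True if <root>/include/hip/hip_runtime.h exists."""
    return (root / HIP_HEADER).is_file()


def build_fake_venv(venv: Path) -> Path:
    """Create the layout from the report; return the _rocm_sdk_core directory."""
    (venv / "bin").mkdir(parents=True)
    (venv / "bin" / "hipcc").write_text("#!/usr/bin/env python3\n")
    core = venv / "lib" / "python3.12" / "site-packages" / "_rocm_sdk_core"
    (core / HIP_HEADER).parent.mkdir(parents=True)
    (core / HIP_HEADER).write_text("// placeholder header\n")
    return core


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        venv = Path(tmp) / "venv"
        core = build_fake_venv(venv)
        guessed = guess_rocm_home_from_hipcc(str(venv / "bin" / "hipcc"))
        print("guessed ROCM_HOME is the venv root:", guessed == venv)
        print("header under guessed ROCM_HOME:", has_hip_header(guessed))
        print("header under _rocm_sdk_core:", has_hip_header(core))
```

The same `has_hip_header` check can be pointed at a real `torch.utils.cpp_extension.ROCM_HOME` value to confirm whether an environment is affected.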

The analogous _find_sycl_home() already handles pip-installed Intel SDKs by using importlib.metadata to locate the package. _find_rocm_home() should add an equivalent lookup for rocm-sdk-core, for example through the package's own path helper:

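# In _find_rocm_home(), after the hipcc fallback:
try:
    # rocm-sdk-core is the pip package that holds the HIP headers, libraries, and tools
    from rocm_sdk_core._cli import _get_core_module_path
    rocm_home = str(_get_core_module_path())
except Exception:
    # ImportError is already a subclass of Exception, so one clause is enough.
    # rocm-sdk-core is missing or the lookup failed: keep the earlier guess.
    pass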
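For comparison, here is a hedged sketch of a lookup that avoids importing a private helper. It swaps in `importlib.util.find_spec` rather than the `importlib.metadata` approach mentioned above, and it is not the code merged in the PR. The function names are hypothetical, and the default module name `_rocm_sdk_core` is an assumption taken from the directory name in the report.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(module_name: str) -> Optional[Path]:
    """Return the directory of an importable package, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    # Plain modules (single .py files) have no submodule search locations.
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def find_pip_rocm_home(module_name: str = "_rocm_sdk_core") -> Optional[Path]:
    """Return the package directory only if it really holds the HIP headers.

    The default module name is an assumption based on the path in the report.
    """
    root = find_package_dir(module_name)
    if root is not None and (root / "include" / "hip" / "hip_runtime.h").is_file():
        return root
    return None
```

Checking for `hip_runtime.h` before accepting the directory keeps the lookup from returning a location that would fail in the same way as the venv root.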

Fix Action

Workaround

Symlink the HIP headers into ROCM_HOME/include/:

ROCM_HOME=$(python3 -c "import torch.utils.cpp_extension as e; print(e.ROCM_HOME)")
CORE=$(python3 -c "from rocm_sdk_core._cli import _get_core_module_path; print(_get_core_module_path())")
mkdir -p "$ROCM_HOME/include"
ln -s "$CORE/include/hip" "$ROCM_HOME/include/hip"
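The shell commands above fail on a second run because `ln -s` refuses to overwrite an existing link. A Python version of the same workaround that is safe to repeat might look like the sketch below. `link_hip_headers` is a hypothetical helper, not part of PyTorch or rocm-sdk-core, and it assumes a platform where creating symlinks is permitted.

```python
from pathlib import Path


def link_hip_headers(rocm_home: Path, core_dir: Path) -> Path:
    """Symlink <core_dir>/include/hip to <rocm_home>/include/hip if absent."""
    source = core_dir / "include" / "hip"
    if not source.is_dir():
        raise FileNotFoundError(f"no HIP headers at {source}")
    target = rocm_home / "include" / "hip"
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink() or target.exists():
        # Something is already there: leave it untouched.
        return target
    target.symlink_to(source, target_is_directory=True)
    return target
```

In the environment from the report, the arguments would be `Path(torch.utils.cpp_extension.ROCM_HOME)` and `Path(_get_core_module_path())`, the same two values the shell version stores in `ROCM_HOME` and `CORE`.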

PR fix notes

PR #180723: [ROCm] Support ROCm distribution from TheRock in _find_rocm_home

Description (problem / solution / changelog)

Fixes https://github.com/pytorch/pytorch/issues/180211

Add a new guess to _find_rocm_home() that locates the rocm-sdk-core install directory when PyTorch is used alongside the TheRock pip ROCm distribution. Without this, hipcc resolves to the venv's wrapper script and ROCM_HOME ends up pointing at the venv root, so AOTInductor's C++ compile fails with fatal error: hip/hip_runtime.h: No such file or directory.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @pragupta @jerrymannil @xinyazhang

Changed files

  • torch/utils/cpp_extension.py (modified, +11/-2)

Code Example

uv venv /tmp/aoti-repro --python 3.12 && source /tmp/aoti-repro/bin/activate
uv pip install \
  "torch==2.12.0a0+rocm7.13.0a20260411" \
  --extra-index-url https://rocm.nightlies.amd.com/v2/gfx1151 \
  --index-strategy unsafe-first-match \
  --prerelease=allow

python3 -c "
import torch, torch._inductor, torch.utils.cpp_extension as ext, os

print('ROCM_HOME:', ext.ROCM_HOME)
expected = os.path.join(ext.ROCM_HOME, 'include', 'hip', 'hip_runtime.h')
print(f'{expected} exists:', os.path.isfile(expected))

from rocm_sdk_core._cli import _get_core_module_path
actual = os.path.join(_get_core_module_path(), 'include', 'hip', 'hip_runtime.h')
print(f'{actual} exists:', os.path.isfile(actual))

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)
    def forward(self, x):
        return self.linear(x)

m = M().cuda().eval()
x = torch.randn(1, 16, device='cuda')
with torch.no_grad():
    exported = torch.export.export(m, (x,))
    torch._inductor.aoti_compile_and_package(exported, package_path='/tmp/test_aot.pt2')
"

Environment

  • Ubuntu 24.04
  • AMD Ryzen AI MAX+ 395 (gfx1151, Strix Halo)
  • torch==2.12.0a0+rocm7.13.0a20260411
  • rocm-sdk-core==7.13.0a20260411
  • ROCm installed via pip (no system /opt/rocm)

cc @janeyx99 @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @chauhang @penguinwu @avikchaudhuri @zhxchen17 @tugsbayasgalan @angelayi @ydwu4 @desertfire @yushangdi @benjaminglass1 @iupaikov-amd

extent analysis

TL;DR

The fix is to make _find_rocm_home() locate the rocm-sdk-core install directory when ROCm is installed via pip, which is what PR #180723 does.

Guidance

  • Modify the _find_rocm_home() function to use importlib.metadata to locate the rocm-sdk-core package, similar to how _find_sycl_home() handles pip-installed Intel SDKs.
  • As a workaround, create a symlink to the HIP headers in the ROCM_HOME/include/ directory using the provided bash commands.
  • Verify that torch.utils.cpp_extension.ROCM_HOME resolves to a directory that actually contains include/hip/hip_runtime.h.
  • Test the torch._inductor.aoti_compile_and_package() function again after applying the fix or workaround to ensure that it compiles successfully.

Example

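try:
    # rocm-sdk-core is the pip package that holds the HIP headers, libraries, and tools
    from rocm_sdk_core._cli import _get_core_module_path
    rocm_home = str(_get_core_module_path())
except Exception:
    # rocm-sdk-core is missing or the lookup failed: keep the earlier guess.
    pass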

Notes

  • The provided workaround assumes that the rocm-sdk-core package is installed and accessible.
  • The modification to _find_rocm_home() is a change to PyTorch itself (torch/utils/cpp_extension.py in PR #180723), so it only takes effect in a build that includes it.

Recommendation

If your PyTorch build does not yet include PR #180723, apply the workaround by symlinking the HIP headers into ROCM_HOME/include/. It needs no changes to PyTorch and can be removed once the fix is available.
