pytorch: [ROCm] Nightly wheel 2.12.0.dev20260408+rocm7.2 crashes on gfx900 with rocBLAS "TensileLibrary.dat: Illegal seek"

Error Message

rocBLAS error: Cannot read /media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx900
List of available TensileLibrary Files :
".../TensileLibrary_lazy_gfx908.dat"
".../TensileLibrary_lazy_gfx1030.dat"
".../TensileLibrary_lazy_gfx90a.dat"
".../TensileLibrary_lazy_gfx1100.dat"
".../TensileLibrary_lazy_gfx1201.dat"
".../TensileLibrary_lazy_gfx1101.dat"
".../TensileLibrary_lazy_gfx1151.dat"
".../TensileLibrary_lazy_gfx942.dat"
".../TensileLibrary_lazy_gfx950.dat"
".../TensileLibrary_lazy_gfx1200.dat"
".../TensileLibrary_lazy_gfx1150.dat"
".../TensileLibrary_lazy_gfx1102.dat"
Aborted (core dumped)
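
The message above names the requested architecture and lists the libraries rocBLAS found. A minimal sketch for pulling both out of a captured log is shown below; the function name `summarize_rocblas_error` and the two regular expressions are illustrative assumptions based only on the text of this error, not part of rocBLAS or PyTorch.

```python
import re

# Patterns derived from the wording of the error message in this report.
ARCH_RE = re.compile(r"for GPU arch\s*:\s*(gfx[0-9a-f]+)")
LIB_RE = re.compile(r"TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat")


def summarize_rocblas_error(log_text):
    """Return (requested_arch, sorted list of archs that have a library listed)."""
    match = ARCH_RE.search(log_text)
    requested = match.group(1) if match else None
    available = sorted(set(LIB_RE.findall(log_text)))
    return requested, available


# Shortened sample copied from the error message in this report.
sample = """rocBLAS error: Cannot read .../TensileLibrary.dat: Illegal seek for GPU arch : gfx900
List of available TensileLibrary Files :
".../TensileLibrary_lazy_gfx908.dat"
".../TensileLibrary_lazy_gfx90a.dat"
".../TensileLibrary_lazy_gfx1030.dat"
"""

requested, available = summarize_rocblas_error(sample)
print("requested:", requested)
print("listed:", available)
print("requested arch is listed:", requested in available)
```

If the requested architecture is absent from the listed ones, as it is here, the problem is in what the wheel ships rather than in device detection.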

Root Cause

An AI assistant told me I should open an issue in this repo rather than in ROCm. I thought it would work with my Vega 56 because of this PR, but that does not appear to be the case.

Code Example

HSA_ENABLE_SDMA=0 HSA_OVERRIDE_GFX_VERSION=9.0.0 python main.py --disable-dynamic-vram
Successfully registered DSL: cutedsl
Successfully registered DSL: triton
Found comfy_kitchen backend eager: {'available': True, 'disabled': False, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_mxfp8', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_mxfp8', 'scaled_mm_nvfp4']}
Found comfy_kitchen backend cuda: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8', 'scaled_mm_nvfp4']}
Found comfy_kitchen backend triton: {'available': True, 'disabled': True, 'unavailable_reason': None, 'capabilities': ['apply_rope', 'apply_rope1', 'dequantize_nvfp4', 'dequantize_per_tensor_fp8', 'quantize_mxfp8', 'quantize_nvfp4', 'quantize_per_tensor_fp8']}
Checkpoint files will always be loaded safely.
Total VRAM 8176 MB, total RAM 32004 MB
pytorch version: 2.12.0.dev20260408+rocm7.2
AMD arch: gfx900
ROCm version: (7, 2)
Set vram state to: NORMAL_VRAM
Device: cuda:0 AMD Radeon RX Vega : native
Using async weight offloading with 2 streams
Enabled pinned memory 30404.0
Using sub quadratic optimization for attention, if you have memory or speed issues try using: --use-split-cross-attention
/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/requests/__init__.py:113: RequestsDependencyWarning: urllib3 (2.6.3) or chardet (7.3.0)/charset_normalizer (3.4.6) doesn't match a supported version!
  warnings.warn(
Python version: 3.12.3 (main, Mar  3 2026, 12:15:18) [GCC 13.3.0]
ComfyUI version: 0.18.1
comfy-aimdo version: 0.2.12
comfy-kitchen version: 0.2.8
Dynamic vram disabled with argument. If you have any issues with dynamic vram enabled please give us a detailed reports as this argument will be removed soon.
ComfyUI frontend version: 1.42.8
[Prompt Server] web root: /media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/comfyui_frontend_package/static
Asset seeder disabled
ComfyUI-GGUF: Allowing full torch compile
For direct API calls, use token=$2b$12$ehHmxB6L4y6XpmRZeZDxu.AQLTpC9woteTAz7SZWFUatcyMbHI9aa

Import times for custom nodes:
   0.0 seconds: /media/HDD/programs/AI/ComfyUI/custom_nodes/websocket_image_save.py
   0.0 seconds: /media/HDD/programs/AI/ComfyUI/custom_nodes/ComfyUI-Login
   0.0 seconds: /media/HDD/programs/AI/ComfyUI/custom_nodes/ComfyUI-GGUF

Context impl SQLiteImpl.
Will assume non-transactional DDL.
Starting server

To see the GUI go to: http://127.0.0.1:8188
got prompt
model weight dtype torch.float16, manual cast: None
model_type EPS
Using split attention in VAE
Using split attention in VAE
VAE load device: cuda:0, offload device: cpu, dtype: torch.float32
CLIP/text encoder model load device: cuda:0, offload device: cpu, current: cpu, dtype: torch.float16
Requested to load SD1ClipModel
loaded completely; 6892.80 MB usable, 235.84 MB loaded, full load: True

rocBLAS error: Cannot read /media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek for GPU arch : gfx900
 List of available TensileLibrary Files : 
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1201.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1151.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx942.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx950.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1200.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1150.dat"
"/media/HDD/programs/AI/ComfyUI/venv-linux/lib/python3.12/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat"
Aborted (core dumped)

---

pip install --force-reinstall --no-cache-dir --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.2

---

curl -sL https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py | python
Collecting environment information...
PyTorch version: 2.12.0.dev20260408+rocm7.2
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 7.2.53211

OS: Linux Mint 22.3 (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: version 3.28.3
Libc version: glibc-2.39

Python version: 3.12.3 (main, Mar  3 2026, 12:15:18) [GCC 13.3.0] (64-bit runtime)
Python platform: Linux-6.8.0-107-generic-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: 
GPU models and configuration: AMD Radeon RX Vega (gfx900:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: 7.2.53211
MIOpen runtime version: 3.5.1
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           43 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  12
On-line CPU(s) list:                     0-11
Vendor ID:                               AuthenticAMD
Model name:                              AMD Ryzen 5 2600 Six-Core Processor
CPU family:                              23
Model:                                   8
Thread(s) per core:                      2
Core(s) per socket:                      6
Socket(s):                               1
Stepping:                                2
Frequency boost:                         enabled
CPU(s) scaling MHz:                      108%
CPU max MHz:                             3400.0000
CPU min MHz:                             1550.0000
BogoMIPS:                                6800.15
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca sev sev_es ibpb_exit_to_user
Virtualization:                          AMD-V
L1d cache:                               192 KiB (6 instances)
L1i cache:                               384 KiB (6 instances)
L2 cache:                                3 MiB (6 instances)
L3 cache:                                16 MiB (2 instances)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-11
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Mitigation; untrained return thunk; SMT vulnerable
Vulnerability Spec rstack overflow:      Mitigation; Safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace

Versions of relevant libraries:
[pip3] numpy==2.4.4
[pip3] nvidia-cublas==13.1.0.3
[pip3] nvidia-cuda-cupti==13.0.85
[pip3] nvidia-cuda-nvrtc==13.0.88
[pip3] nvidia-cuda-runtime==13.0.96
[pip3] nvidia-cudnn-cu13==9.19.0.56
[pip3] nvidia-cufft==12.0.0.61
[pip3] nvidia-curand==10.4.0.35
[pip3] nvidia-cusolver==12.0.4.66
[pip3] nvidia-cusparse==12.6.3.3
[pip3] nvidia-cusparselt-cu13==0.8.0
[pip3] nvidia-nccl-cu13==2.28.9
[pip3] nvidia-nvjitlink==13.0.88
[pip3] nvidia-nvtx==13.0.85
[pip3] torch==2.12.0.dev20260408+rocm7.2
[pip3] torchaudio==2.11.0.dev20260409+rocm7.2
[pip3] torchsde==0.2.6
[pip3] torchvision==0.27.0.dev20260409+rocm7.2
[pip3] triton-rocm==3.7.0+git282c8251
[conda] Could not collect
(venv-linux) jokergermany@huppyryzen:/media/HDD/programs/AI/ComfyUI$
This is the information the AI provided:

Describe the bug

The latest PyTorch nightly ROCm 7.2 wheel imports and starts correctly on an AMD Radeon RX Vega 56 (gfx900), but crashes during model execution with a rocBLAS error.

Environment

  • OS: Linux Mint / Ubuntu-based Linux
  • Python: 3.12.3
  • GPU: AMD Radeon RX Vega 56
  • GPU arch: gfx900
  • PyTorch: 2.12.0.dev20260408+rocm7.2
  • ROCm reported by torch: 7.2

Reproduction

  1. Create a fresh virtual environment.
  2. Install nightly ROCm wheel:
    pip install --force-reinstall --no-cache-dir --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.2
  3. Start workload on GPU.
  4. Model load begins successfully, then rocBLAS aborts with: TensileLibrary.dat: Illegal seek for GPU arch : gfx900
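
Because the failure is a process abort rather than a Python exception, one way to reproduce it without taking down the calling program is to run the GPU work in a child interpreter and inspect its exit status. The sketch below assumes a POSIX system; `run_isolated` and `GEMM_PROBE` are hypothetical names, and the probe is only expected (not confirmed by this report) to reach rocBLAS through a small matrix multiply.

```python
import signal
import subprocess
import sys

# Hypothetical probe: a small matmul on the GPU, which is expected to go
# through rocBLAS on a ROCm build. ROCm devices are addressed as "cuda".
GEMM_PROBE = """
import torch
x = torch.ones(64, 64, device="cuda")
print((x @ x).sum().item())
"""


def run_isolated(code, timeout=300):
    """Run `code` in a child Python process and classify how it ended.

    Returns (status, stderr) where status is "ok", "aborted" (the child was
    killed by SIGABRT, as in "Aborted (core dumped)"), or "failed".
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        status = "ok"
    elif proc.returncode == -signal.SIGABRT:
        status = "aborted"
    else:
        status = "failed"
    return status, proc.stderr
```

Calling `run_isolated(GEMM_PROBE)` inside the affected virtual environment would be expected to return `"aborted"` with the rocBLAS message in the captured stderr, while the parent process keeps running.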

Notes

  • torch detects the device successfully.
  • ComfyUI starts and sees the Vega 56 correctly.
  • The failure happens when rocBLAS tries to load Tensile libraries.
  • The wheel contains libraries for newer architectures (gfx908, gfx90a, gfx1030, gfx11xx, gfx12xx) but not gfx900.
  • This looks like a packaging/regression issue in the nightly wheel rather than a basic detection problem.
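
The claim that the wheel has no gfx900 library can be checked directly on disk. The following is a minimal sketch that assumes the directory layout shown in the error paths (`torch/lib/rocblas/library/`) and the `TensileLibrary_lazy_<arch>.dat` naming seen in the log; `shipped_archs` and `wheel_library_dir` are hypothetical helper names.

```python
import re
from pathlib import Path

# File naming taken from the list printed in the rocBLAS error.
LIB_RE = re.compile(r"^TensileLibrary_lazy_(gfx[0-9a-f]+)\.dat$")


def shipped_archs(library_dir):
    """Return the sorted gfx targets that have a lazy Tensile library in library_dir."""
    names = (entry.name for entry in Path(library_dir).iterdir())
    return sorted({m.group(1) for m in map(LIB_RE.match, names) if m})


def wheel_library_dir(torch_file):
    """Map the path of torch/__init__.py to the rocBLAS library directory.

    The layout is assumed from the paths in the error message.
    """
    return Path(torch_file).parent / "lib" / "rocblas" / "library"
```

With torch installed, `shipped_archs(wheel_library_dir(torch.__file__))` would list the architectures the wheel covers; for the wheel in this report, gfx900 would be expected to be missing from that list.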

Expected behavior

If gfx900 is still intended to work in this wheel, the required Tensile library files should be present. If gfx900 is no longer supported, the wheel should fail gracefully with a clear unsupported-architecture message instead of aborting.
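
A graceful failure of the kind described here could look roughly like the sketch below, written as an application-side preflight check rather than as anything PyTorch or rocBLAS currently does. `UnsupportedArchError`, `base_arch`, and `require_supported_arch` are hypothetical names; the arch string format (`gfx900:xnack-`) is taken from the collect_env output in this report.

```python
class UnsupportedArchError(RuntimeError):
    """Raised when the GPU architecture has no Tensile library in the build."""


def base_arch(arch_name):
    """Drop feature suffixes such as ':xnack-' from an arch string."""
    return arch_name.split(":", 1)[0]


def require_supported_arch(arch_name, available):
    """Return the base arch if it is in `available`, else raise a clear error."""
    arch = base_arch(arch_name)
    if arch not in available:
        raise UnsupportedArchError(
            f"GPU arch {arch} has no Tensile library in this build; "
            f"available: {', '.join(sorted(available))}"
        )
    return arch
```

On a ROCm build the arch string is assumed to be obtainable from something like `torch.cuda.get_device_properties(0).gcnArchName`, and the set of available architectures from the file names in the wheel's rocBLAS library directory. Running the check before the first GPU operation would turn the abort into an ordinary exception with an actionable message.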

cc @seemethere @malfet @atalman @tinglvv @nWEIdia @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

extent analysis

TL;DR

The nightly rocm7.2 wheel ships no rocBLAS Tensile library for gfx900, so the most likely workaround is to use a PyTorch build that still includes gfx900 support, or a GPU whose architecture appears in the wheel's library list.

Guidance

  • The error shows that rocBLAS found no Tensile library for gfx900 and then failed to read the generic TensileLibrary.dat, which points to a packaging change or regression in the nightly wheel.
  • The list of available Tensile library files covers gfx908, gfx90a, gfx942, gfx950, gfx1030, gfx11xx, and gfx12xx, but not gfx900, the architecture of the AMD Radeon RX Vega 56.
  • To work around the issue, try an older PyTorch ROCm build that still includes gfx900 support, or use a GPU that the current wheel supports.
  • Manually adding the required gfx900 Tensile library files to the PyTorch installation may also be possible, but it requires technical expertise and is not recommended.

Example

No code snippet is provided as the issue is related to a packaging or regression problem in the PyTorch nightly wheel.

Notes

  • Whether gfx900 was dropped from the nightly wheel intentionally or by regression is not established in the issue.
  • The run already set HSA_OVERRIDE_GFX_VERSION=9.0.0, which corresponds to the card's native gfx900, so the override does not steer rocBLAS to one of the shipped libraries.

Recommendation

Apply workaround: use an older PyTorch build that still includes gfx900 support, or a GPU supported by the current wheel, because the current nightly wheel does not ship the gfx900 Tensile libraries that the AMD Radeon RX Vega 56 requires.
