pytorch - 💡(How to fix) Fix [Windows] torch.save triggers 0xC0000005 Access Violation on RTX 4090 Laptop (WDDM Driver Conflict) [4 comments, 3 participants]

Fix Action

Fix / Workaround

Thread 0x0000754c (most recent call first):
  File "...\torch\storage.py", line 828 in __init__
  File "...\torch\_tensor.py", line 287 in _typed_storage
  File "...\torch\_tensor.py", line 511 in _reduce_ex_internal
  File "...\torch\serialization.py", line 1190 in _save
  File "...\torch\serialization.py", line 944 in save
  File "...\ultralytics\utils\patches.py", line 197 in torch_save
  File "...\ultralytics\engine\trainer.py", line 633 in save_model
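The "Thread 0x... (most recent call first):" frames above use the format printed by Python's standard faulthandler module when the process receives a fatal native exception. A minimal sketch, assuming you want that dump written to a log file so it survives the crash (enable_crash_traceback and the log file name are hypothetical):

```python
import faulthandler
import os
import tempfile


def enable_crash_traceback(log_path):
    """Send faulthandler output to log_path and return the open file.

    The caller must keep the returned file object alive: faulthandler writes
    to its file descriptor when the process crashes, so the file must not be
    closed or garbage-collected while the handler is active.
    """
    log_file = open(log_path, "w", encoding="utf-8")
    # all_threads=True dumps every Python thread, like the "Thread 0x..." block above
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Hypothetical log location; any writable path works
    path = os.path.join(tempfile.gettempdir(), "torch_save_crash.log")
    crash_log = enable_crash_traceback(path)
    print("faulthandler enabled:", faulthandler.is_enabled())
    # ... run training / torch.save here ...
```

Alternatively, running the script with python -X faulthandler (or setting PYTHONFAULTHANDLER=1) enables the same handler on stderr without code changes.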

Additional Context
Workaround: The issue disappears completely if torch.save is not called (e.g., by setting save=False in the training arguments).
Hypothesis: This appears to be a conflict between PyTorch's memory allocation during serialization and the way the Windows WDDM driver manages GPU memory (Timeout Detection and Recovery (TDR) or memory paging) on high-end laptop GPUs. The access violation suggests that a pointer is invalidated during the GPU-to-CPU copy.
Observation: The issue seems more prevalent on smaller models (e.g., YOLOv8s), where VRAM usage fluctuates, than on larger models that keep VRAM saturated.

🐛 Describe the bug

Bug Description

I am encountering a fatal "Windows fatal exception: access violation" (0xC0000005) when calling torch.save() during the training loop of a YOLO model (Ultralytics).

The crash occurs specifically during the serialization step, when tensors are moved from GPU to CPU for saving. Interestingly, if I disable saving (setting save=False in Ultralytics), training proceeds without any issues. This suggests that the computation graph is stable and that the failure lies in the memory transfer during serialization.
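The serialization step described above can be split in two so that the GPU-to-CPU copy happens explicitly, before torch.save runs. This would at least show whether the crash follows the copy or the serialization itself. A minimal sketch, assuming a plain nn.Module rather than a full Ultralytics checkpoint; save_state_dict_cpu_first is a hypothetical helper, and whether it avoids the crash on this WDDM setup is untested:

```python
import io

import torch
import torch.nn as nn


def save_state_dict_cpu_first(model, f):
    """Copy every tensor in the state dict to host memory, then torch.save it.

    f can be a file path or a writable binary file object.
    Returns the CPU copy that was serialized.
    """
    if torch.cuda.is_available():
        # Wait for all queued kernels to finish writing the parameters
        torch.cuda.synchronize()
    # Explicit device-to-host copy, done before serialization starts
    cpu_state = {k: v.detach().to("cpu", copy=True) for k, v in model.state_dict().items()}
    torch.save(cpu_state, f)
    return cpu_state


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(4, 2).to(device)
    buffer = io.BytesIO()  # in-memory file; a path such as "last.pt" also works
    save_state_dict_cpu_first(model, buffer)
    buffer.seek(0)
    restored = torch.load(buffer)
    print({k: v.device for k, v in restored.items()})
```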

To Reproduce
Hardware: Laptop with NVIDIA GeForce RTX 4090 (16GB VRAM).
OS: Windows 11 (WDDM 3.1 driver model).
Code: A standard training script (e.g., Ultralytics YOLO) that calls torch.save(model.state_dict(), ...) at the end of each epoch.
Trigger: The crash happens consistently when torch.save is invoked after a few epochs.
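To check whether the crash depends on Ultralytics at all, the reproduction can be reduced to a plain PyTorch loop with the same train-then-save pattern. A minimal sketch, assuming a small synthetic model and random data (stress_save and its parameters are hypothetical); if it also crashes on the affected laptop, that would point at torch.save or the driver rather than at Ultralytics:

```python
import os
import tempfile

import torch
import torch.nn as nn


def stress_save(epochs=5, steps_per_epoch=20, out_dir=None):
    """Train a tiny model and torch.save its state dict after every epoch.

    Runs on CUDA when available, otherwise on CPU.
    Returns the number of checkpoints written.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    out_dir = out_dir or tempfile.mkdtemp(prefix="save_repro_")
    model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    saved = 0
    for epoch in range(epochs):
        for _ in range(steps_per_epoch):
            x = torch.randn(32, 64, device=device)
            y = torch.randint(0, 10, (32,), device=device)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
        # Same call pattern as the report: save the state dict at the end of each epoch
        torch.save(model.state_dict(), os.path.join(out_dir, f"epoch_{epoch}.pt"))
        saved += 1
    return saved


if __name__ == "__main__":
    print("checkpoints written:", stress_save())
```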

Expected Behavior
torch.save should successfully serialize the model state to disk without crashing the interpreter.

Error Log (Traceback)
The crash stack trace points directly to storage initialization during serialization:
Windows fatal exception: access violation

Epoch    GPU_mem   box_loss   cls_loss   dfl_loss   Instances   Size
6/300    2.67G     0.8642     0.5596     0.01547    2           640: 100%
Class    Images    Instances  Box(P      R          mAP50       mAP50-95): 100%
all      36        36         0.992      0.861      0.948       0.669

Process finished with exit code -1073741819 (0xC0000005)
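The exit code -1073741819 is the signed 32-bit reading of the Windows status code 0xC0000005 (STATUS_ACCESS_VIOLATION). A small sketch for normalizing such codes, for example in a launcher script that checks return codes; describe_exit_code is a hypothetical helper:

```python
def describe_exit_code(code):
    """Map a process exit code to its 32-bit NTSTATUS hex form.

    Some tools report the code as a signed 32-bit integer (-1073741819),
    others as unsigned (3221225477); both are 0xC0000005.
    """
    unsigned = code & 0xFFFFFFFF  # reinterpret as an unsigned 32-bit value
    names = {0xC0000005: "STATUS_ACCESS_VIOLATION"}
    return f"0x{unsigned:08X}", names.get(unsigned, "unknown")


if __name__ == "__main__":
    print(describe_exit_code(-1073741819))  # ('0xC0000005', 'STATUS_ACCESS_VIOLATION')
```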

Possible Solution
Could the PyTorch team investigate whether there is a race condition or pointer invalidation in torch.save when running under the Windows WDDM driver model? Perhaps adding a synchronization barrier (torch.cuda.synchronize()) before the copy, or handling WDDM memory paging differently, could resolve this.

Versions
PyTorch version: 2.6.0+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home (10.0.26200 64-bit)
Python version: 3.10.20 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 12.4.131
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090 Laptop GPU
Nvidia driver version: 595.97
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\cudnn_ops64_9.dll
CPU: Intel(R) Core(TM) i9-14900HX

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] torch==2.6.0+cu124
[pip3] torchaudio==2.6.0+cu124
[pip3] torchvision==0.21.0+cu124
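The report above appears to be the output of python -m torch.utils.collect_env, which remains the preferred way to gather it for an issue. When only the fields relevant to this crash are needed, for example to log them at the start of every training run, a minimal sketch (gpu_environment is a hypothetical name) could be:

```python
import platform

import torch


def gpu_environment():
    """Collect the version fields from the report that matter for this crash."""
    info = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in gpu_environment().items():
        print(f"{key}: {value}")
```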

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @mruberry @mikaylagawarecki

extent analysis

TL;DR

A low-cost first thing to try for the Windows fatal exception (access violation) raised inside torch.save() is adding a synchronization barrier with torch.cuda.synchronize() before the GPU-to-CPU copy. This is an unconfirmed workaround, not a verified fix.

Guidance

Investigate the possibility of a race condition or pointer invalidation in torch.save() when running under the Windows WDDM driver model.
Consider adding a synchronization barrier using torch.cuda.synchronize() before the copy to ensure that all GPU operations are completed before transferring data to the CPU.
Verify that the issue is resolved by testing the training loop with the modified torch.save() call.
If the issue persists, try updating the NVIDIA driver to the latest version or experimenting with different CUDA and cuDNN versions.
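Before and after updating the driver, it helps to record which version is actually loaded. A minimal sketch using nvidia-smi's query interface (nvidia_driver_version is a hypothetical helper; it assumes nvidia-smi is installed with the driver and available on PATH):

```python
import shutil
import subprocess


def nvidia_driver_version():
    """Return the NVIDIA driver version reported by nvidia-smi, or None.

    Returns None when nvidia-smi is not on PATH or the query fails,
    e.g. on machines without an NVIDIA GPU.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    # One line per GPU; report the first one
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    print("NVIDIA driver:", nvidia_driver_version())
```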

Example

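import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # placeholder for the model being trained
if torch.cuda.is_available():
    model = model.cuda()

# ... (training loop)

# Wait for all queued CUDA work to finish before saving the model
if torch.cuda.is_available():
    torch.cuda.synchronize()
torch.save(model.state_dict(), 'model.pth')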

Notes

The issue seems to be specific to the Windows WDDM driver model and high-end laptop GPUs, so the solution may not be applicable to all environments. Further investigation and testing are necessary to confirm the root cause and effectiveness of the proposed fix.

Recommendation

Apply the workaround by adding a synchronization barrier using torch.cuda.synchronize() before the GPU-to-CPU copy, as it is a relatively simple and non-invasive change that may resolve the issue.
