pytorch: [Windows] torch.save triggers 0xC0000005 Access Violation on RTX 4090 Laptop (WDDM Driver Conflict) [4 comments, 3 participants]

GitHub stats
pytorch/pytorch#178892 (fetched 2026-04-08 01:57:06)
Comments: 4 · Participants: 3 · Timeline events: 98 · Reactions: 0
Timeline (top): mentioned ×43, subscribed ×43, labeled ×8, commented ×4

Error Message

I am encountering a Windows fatal exception: access violation (0xC0000005) when calling torch.save() during the training loop of a YOLO model (Ultralytics).



🐛 Describe the bug

Bug Description

I am encountering a Windows fatal exception: access violation (0xC0000005) when calling torch.save() during the training loop of a YOLO model (Ultralytics).

The crash occurs specifically during the serialization step (moving tensors from GPU to CPU for saving). Interestingly, if I disable the saving logic (using --no-save in Ultralytics), the training proceeds without any issues, suggesting the computation graph is stable, but the memory transfer during serialization is failing.

To Reproduce

  • Hardware: Laptop with NVIDIA GeForce RTX 4090 (16GB VRAM).
  • OS: Windows 11 (WDDM 3.1 driver model).
  • Code: A standard training script (e.g., Ultralytics YOLO) that calls torch.save(model.state_dict(), ...) at the end of each epoch.
  • Trigger: The crash happens consistently when torch.save is invoked after a few epochs.
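The reproduction steps above can be approximated without Ultralytics. The following is a minimal sketch, not the reporter's script: the tiny model, the function name `repro_save_loop`, and the output file name are hypothetical stand-ins, and whether it triggers the crash on the affected machine is untested. It runs a few training steps and calls torch.save on the state dict at the end of each epoch, falling back to CPU when CUDA is unavailable.

```python
import os
import tempfile
from typing import Optional

import torch
import torch.nn as nn


def repro_save_loop(epochs: int = 3, out_dir: Optional[str] = None) -> str:
    """Train a tiny model and save its state dict after every epoch."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    out_dir = out_dir or tempfile.mkdtemp()
    path = os.path.join(out_dir, "last.pt")  # hypothetical file name

    # Hypothetical stand-in for the YOLO model in the report.
    model = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 1),
    ).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(epochs):
        x = torch.randn(4, 3, 32, 32, device=device)
        loss = model(x).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Same call shape as in the report: save the state dict each epoch.
        torch.save(model.state_dict(), path)
    return path
```

Calling `repro_save_loop(epochs=10)` on the affected laptop would show whether the crash depends on the Ultralytics code path or can be reached with plain PyTorch.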

Expected Behavior

torch.save should serialize the model state to disk without crashing the interpreter.

Error Log (Traceback)

The crash stack trace points directly to the storage initialization during serialization:

Windows fatal exception: access violation

  Epoch  GPU_mem  box_loss  cls_loss  dfl_loss  Instances  Size
  6/300  2.67G    0.8642    0.5596    0.01547   2          640: 100%
  Class  Images  Instances  Box(P  R      mAP50  mAP50-95): 100%
  all    36      36         0.992  0.861  0.948  0.669

Thread 0x0000754c (most recent call first):
  File "...\torch\storage.py", line 828 in __init__
  File "...\torch\_tensor.py", line 287 in _typed_storage
  File "...\torch\_tensor.py", line 511 in _reduce_ex_internal
  File "...\torch\serialization.py", line 1190 in _save
  File "...\torch\serialization.py", line 944 in save
  File "...\ultralytics\utils\patches.py", line 197 in torch_save
  File "...\ultralytics\engine\trainer.py", line 633 in save_model

Process finished with exit code -1073741819 (0xC0000005)
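The exit code and the hexadecimal status in the log are the same value: the process exit code is reported as a signed 32-bit integer, and masking it to 32 bits gives the unsigned NTSTATUS code. The "most recent call first" dump matches the format written by Python's standard faulthandler module. The sketch below shows both; the helper names `to_ntstatus` and `enable_crash_log` and the log file name are hypothetical.

```python
import faulthandler


def to_ntstatus(exit_code: int) -> str:
    """Format a signed 32-bit exit code as an unsigned NTSTATUS hex string."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


def enable_crash_log(path: str = "crash_traceback.log"):
    """Dump the Python traceback of all threads to `path` on a fatal error.

    The returned file object must stay open for as long as the handler is
    enabled, so the caller should keep a reference to it.
    """
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    print(to_ntstatus(-1073741819))  # 0xC0000005
```

Calling `enable_crash_log()` at the top of the training script keeps a copy of the traceback on disk even when the console output is lost with the process.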

Additional Context

  • Workaround: The issue disappears completely if torch.save is not called (e.g., setting save=False in the training arguments).
  • Hypothesis: This appears to be a conflict between PyTorch's memory allocation during serialization and the Windows WDDM driver's memory management (TDR or memory paging) on high-end laptop GPUs. The "Access Violation" suggests a pointer invalidation during the GPU-to-CPU copy.
  • Observation: This issue seems more prevalent on smaller models (e.g., YOLOv8s) where VRAM usage fluctuates, compared to larger models that saturate VRAM constantly.
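One way to probe the GPU-to-CPU copy hypothesis is to perform that copy explicitly before serialization, so that torch.save only ever sees host tensors. This is a diagnostic sketch that does not come from the issue thread and has not been tested against this crash; `cpu_state_dict` is a hypothetical helper name.

```python
import torch
import torch.nn as nn


def cpu_state_dict(module: nn.Module) -> dict:
    """Return a copy of the module's state dict with every tensor on the CPU."""
    out = {}
    for name, value in module.state_dict().items():
        # Tensors are detached and copied to host memory; any non-tensor
        # entries are passed through unchanged.
        out[name] = value.detach().cpu() if torch.is_tensor(value) else value
    return out
```

With `torch.save(cpu_state_dict(model), "model_cpu.pth")`, a crash inside `cpu_state_dict` would point at the device-to-host copy, while a crash that still occurs inside torch.save would point at the serialization step itself.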

Possible Solution

Could the PyTorch team investigate whether there is a race condition or pointer invalidation in torch.save when running under the Windows WDDM driver model? Perhaps adding a synchronization barrier (torch.cuda.synchronize()) before the copy, or handling WDDM memory paging differently, could resolve this.

Versions

PyTorch version: 2.6.0+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Home (10.0.26200 64-bit)
Python version: 3.10.20 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 12.4.131
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090 Laptop GPU
Nvidia driver version: 595.97
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin\cudnn_ops64_9.dll
CPU: Intel(R) Core(TM) i9-14900HX

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] torch==2.6.0+cu124
[pip3] torchaudio==2.6.0+cu124
[pip3] torchvision==0.21.0+cu124

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @mruberry @mikaylagawarecki

extent analysis

TL;DR

A candidate workaround for the Windows access violation (0xC0000005) in torch.save() is to call torch.cuda.synchronize() before the GPU-to-CPU copy. This is the reporter's own suggestion and has not been confirmed as a fix.

Guidance

  • Investigate the possibility of a race condition or pointer invalidation in torch.save() when running under the Windows WDDM driver model.
  • Consider adding a synchronization barrier using torch.cuda.synchronize() before the copy to ensure that all GPU operations are completed before transferring data to the CPU.
  • Verify that the issue is resolved by testing the training loop with the modified torch.save() call.
  • If the issue persists, try updating the NVIDIA driver to the latest version or experimenting with different CUDA and cuDNN versions.
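The verification step in the list above can be sketched as a save followed by a round-trip check. This is a minimal sketch under assumptions: `save_and_verify` is a hypothetical helper, and it only confirms that a save completed and reloads correctly, not that the underlying crash is fixed.

```python
import torch
import torch.nn as nn


def save_and_verify(module: nn.Module, path: str) -> bool:
    """Synchronize, save the state dict, reload it on CPU, and compare."""
    if torch.cuda.is_available():
        # Wait for all queued GPU work to finish before serializing.
        torch.cuda.synchronize()
    state = module.state_dict()
    torch.save(state, path)

    # weights_only=True restricts unpickling to tensors and plain containers.
    reloaded = torch.load(path, map_location="cpu", weights_only=True)
    if set(state) != set(reloaded):
        return False
    # Note: torch.equal treats NaN as unequal, so NaN weights report False.
    return all(torch.equal(state[k].cpu(), reloaded[k]) for k in state)
```

Running this at the end of each epoch for many epochs gives a rough signal on whether the crash still occurs with the barrier in place.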

Example

import torch

# ... (training loop that defines `model`)

# Wait for pending GPU work to finish before saving (skipped on CPU-only runs)
if torch.cuda.is_available():
    torch.cuda.synchronize()
torch.save(model.state_dict(), 'model.pth')

Notes

The issue seems to be specific to the Windows WDDM driver model and high-end laptop GPUs, so the solution may not be applicable to all environments. Further investigation and testing are necessary to confirm the root cause and effectiveness of the proposed fix.

Recommendation

Try adding torch.cuda.synchronize() before the GPU-to-CPU copy first: it is a simple, non-invasive change that may resolve the issue.
