pytorch - [Bug] ValueError: I/O operation on closed file during torch.save() on Windows (PyTorch Nightly + CUDA 13.2)

GitHub stats: pytorch/pytorch#180458, fetched 2026-04-17 08:22:25
Comments: 0 · Participants: 1 · Timeline events: 30 · Reactions: 0
Timeline (top): mentioned ×12, subscribed ×12, labeled ×6

Error Message

ValueError: I/O operation on closed file. The error is raised twice:

1. It first occurs in torch/serialization.py, line 1260, where _save calls zip_file.write_record("data.pkl", data_value, len(data_value)).
2. While that exception propagates, the with _open_zipfile_writer(f) block exits (line 855, in __exit__). It calls self.file_like.write_end_of_file() on the same closed file and raises the second ValueError.

The call reaches torch.save() from the Ultralytics trainer.save_model() (ultralytics/engine/trainer.py, line 643) through the torch_save patch in ultralytics/utils/patches.py. The full traceback is reproduced in the issue body below.
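The phrase "During handling of the above exception, another exception occurred" is Python's implicit exception chaining. A context manager's __exit__ runs even when the with body raised. If __exit__ then fails too, the new exception carries the first one as __context__.

The stdlib-only sketch below reproduces that shape. It uses a hypothetical ToyArchiveWriter whose write_record() and end-of-archive write mirror the write_record() / write_end_of_file() frames in the traceback. It is not PyTorch's actual writer, and it does not explain why the file was closed in the first place.

```python
import io


class ToyArchiveWriter:
    """Hypothetical stand-in for torch's zip writer (not the real implementation).

    write_record() and the end-of-archive write in __exit__ mirror the
    write_record() / write_end_of_file() frames seen in the traceback.
    """

    def __init__(self, file_like):
        self.file_like = file_like

    def __enter__(self):
        return self

    def write_record(self, name, data):
        self.file_like.write(name.encode() + b"\0" + data)

    def __exit__(self, exc_type, exc, tb):
        # Runs even when the with-body raised. If the file is already closed,
        # this write raises a second ValueError whose __context__ is the first.
        self.file_like.write(b"END")
        return False  # only reached if the write succeeded; do not suppress errors


def capture_chained_error():
    """Return (error raised by __exit__, error raised by write_record)."""
    buf = io.BytesIO()
    buf.close()  # simulate the handle being closed before saving finishes
    try:
        with ToyArchiveWriter(buf) as writer:
            writer.write_record("data.pkl", b"payload")
    except ValueError as outer:
        return outer, outer.__context__
    return None, None


if __name__ == "__main__":
    outer, inner = capture_chained_error()
    print("raised by __exit__:", outer)
    print("raised by write_record:", inner)
```

Running it prints two "closed file" errors. The one caught by the caller comes from __exit__, matching the last frame of the real traceback.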

Fix Action

Fix / Workaround

Pass workers=0 to model.train() so the Ultralytics DataLoader loads data in the main process. The reporter says this seems to mitigate or bypass the crash. It is a workaround, not a confirmed root-cause fix.


🐛 Describe the bug

When saving model checkpoints with torch.save() during a training loop on Windows with PyTorch 2.12 Nightly (CUDA 13.2), the process intermittently crashes with ValueError: I/O operation on closed file inside torch/serialization.py.

The first failure occurs when zip_file.write_record() tries to write data.pkl into the .pt archive. A second ValueError follows when the zip writer's __exit__ calls write_end_of_file() on the already-closed file. This looks like a race condition or a file-lock issue related to the new ZipFile writer implementation in the nightly build, possibly made worse by Windows multiprocessing or I/O handling.

To Reproduce

Steps to reproduce the behavior:

  1. Install the PyTorch Nightly build with CUDA 13.2 support on a Windows machine: pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu132
  2. Start a training loop using the ultralytics package (which calls torch.save() at the end of each epoch to save last.pt).
  3. The training loop runs normally for the first few epochs, then crashes while serializing the checkpoint.

Minimal Code Context:

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('yolo26n.pt') 
    model.train(
        data="data.yaml",
        epochs=100,
        imgsz=640,
        device=0,
        workers=4, # Issue might be related to Windows DataLoader workers
    )

Expected behavior

torch.save() should write the .pt archive successfully, without raising an I/O exception or dropping the file lock prematurely.
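torch.save() writes a zip archive by default, which is what the zip_file.write_record() frame in the traceback refers to. A quick stdlib check can therefore tell whether a checkpoint written to disk, such as the last.pt that Ultralytics saves each epoch, is a complete archive.

This is a hedged sketch. looks_like_complete_checkpoint is a hypothetical helper. Requiring a member whose name ends in data.pkl is an assumption based on the write_record("data.pkl", ...) call in the traceback. The check validates only the zip container; it does not prove that torch.load() will succeed.

```python
import io
import zipfile


def looks_like_complete_checkpoint(source):
    """Return True if `source` (a path or raw bytes) is an intact zip containing data.pkl.

    Hypothetical helper: it checks the container only, not the pickled contents.
    """
    fileobj = io.BytesIO(source) if isinstance(source, (bytes, bytearray)) else source
    if not zipfile.is_zipfile(fileobj):
        return False  # no end-of-central-directory record, e.g. a truncated file
    if hasattr(fileobj, "seek"):
        fileobj.seek(0)  # is_zipfile() may move the position of file-like objects
    try:
        with zipfile.ZipFile(fileobj) as archive:
            # testzip() returns the first member with a bad CRC, or None if all are fine.
            if archive.testzip() is not None:
                return False
            return any(name.endswith("data.pkl") for name in archive.namelist())
    except zipfile.BadZipFile:
        return False
```

Pass it the path of the saved last.pt, or the bytes of an in-memory buffer. A False result after a crash suggests the archive was left incomplete.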

Error logs / Traceback

Traceback (most recent call last):
  File "[...]\.env\Lib\site-packages\torch\serialization.py", line 1004, in save
    _save(
        obj,
    ...<3 lines>...
        _disable_byteorder_record,
    )
  File "[...]\.env\Lib\site-packages\torch\serialization.py", line 1260, in _save
    zip_file.write_record("data.pkl", data_value, len(data_value))
ValueError: I/O operation on closed file.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "[...]\main.py", line 16, in <module>
    model.train(
        data="./[...]/data.yaml",
    ...<12 lines>...
        name="RTX3080_Cuda13_2"
    )
  File "[...]\.env\Lib\site-packages\ultralytics\engine\model.py", line 787, in train
    self.trainer.train()
  File "[...]\.env\Lib\site-packages\ultralytics\engine\trainer.py", line 246, in train
    self._do_train()
  File "[...]\.env\Lib\site-packages\ultralytics\engine\trainer.py", line 535, in _do_train
    if (self.args.save or final_epoch) and self.save_model():
  File "[...]\.env\Lib\site-packages\ultralytics\engine\trainer.py", line 643, in save_model
    torch.save(
        {
    ...<21 lines>...
        buffer,
    )
  File "[...]\.env\Lib\site-packages\ultralytics\utils\patches.py", line 197, in torch_save
    return _torch_save(*args, **kwargs)
  File "[...]\.env\Lib\site-packages\torch\serialization.py", line 1003, in save
    with _open_zipfile_writer(f) as opened_zipfile:
  File "[...]\.env\Lib\site-packages\torch\serialization.py", line 855, in __exit__
    self.file_like.write_end_of_file()
ValueError: I/O operation on closed file.

Additional context

Passing workers=0 to model.train(), which sets the number of Ultralytics DataLoader worker processes, seems to mitigate or bypass the issue. This suggests a potential clash between Windows multiprocessing and the new ZipFile writer implementation when the file handle is acquired or released.

Versions

PyTorch version: 2.12.0.dev20260415+cu132
Is debug build: False
CUDA used to build PyTorch: 13.2
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 11 Pro (10.0.26200 64 bit)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.14.3 (tags/v3.14.3:323c59a, Feb 3 2026, 16:04:56) [MSC v.1944 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-11-10.0.26200-SP0
Is CUDA available: True
CUDA runtime version: 13.2.78
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080
Nvidia driver version: 595.97
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Name: AMD Ryzen 7 5800X 8-Core Processor
Manufacturer: AuthenticAMD
Family: 107
Architecture: 9
ProcessorType: 3
DeviceID: CPU0
CurrentClockSpeed: 3801
MaxClockSpeed: 3801
L2CacheSize: 4096
L2CacheSpeed: None
Revision: 8448

Versions of relevant libraries:
[pip3] numpy==2.4.4
[pip3] torch==2.12.0.dev20260415+cu132
[pip3] torchaudio==2.11.0
[pip3] torchvision==0.27.0.dev20260414+cu132
[conda] Could not collect

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @VitalyFedyunin @albanD @pragupta @ppwwyyxx @mruberry @mikaylagawarecki

extent analysis

TL;DR

Passing workers=0 to model.train() may mitigate the issue. The reporter's explanation is that it avoids a suspected clash between Windows multiprocessing and the new ZipFile writer implementation.

Guidance

  • Confirm that the issue depends on DataLoader workers by running model.train() with several workers values (for example 0, 1, 2, and 4).
  • Consider switching to a stable PyTorch release or waiting for a fix in a later nightly build, since the reporter suspects the new ZipFile writer implementation.
  • If the issue persists, try to reproduce it with a minimal PyTorch-only script, without Ultralytics, to isolate the problem.
  • Check the PyTorch and Ultralytics GitHub repositories for related open issues or discussions.
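The first guidance step can be scripted. The sketch below is a hypothetical harness, sweep_workers. It calls any training function once per workers value and records whether that call raised the reported ValueError.

The Ultralytics wiring in the trailing comment is an assumption modeled on the issue's snippet. Its epochs=3 is an arbitrary short run, and because the failure is intermittent, a short run may not trigger it.

```python
from typing import Callable, Dict, Iterable


def sweep_workers(train_fn: Callable[[int], object], values: Iterable[int]) -> Dict[int, str]:
    """Run train_fn(workers) for each value and record "ok" or the error text.

    Only ValueError is caught, since that is the failure reported in the issue;
    any other exception propagates so unrelated bugs are not hidden.
    """
    results: Dict[int, str] = {}
    for workers in values:
        try:
            train_fn(workers)
            results[workers] = "ok"
        except ValueError as exc:
            results[workers] = f"ValueError: {exc}"
    return results


# Possible wiring (assumption; mirrors the issue's snippet with a short run):
# from ultralytics import YOLO
# if __name__ == "__main__":
#     print(sweep_workers(
#         lambda w: YOLO("yolo26n.pt").train(data="data.yaml", epochs=3, device=0, workers=w),
#         [0, 1, 2, 4],
#     ))
```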

Example

The issue does not include a code-level fix. The failure occurs inside torch.serialization, so the only user-side change identified so far is passing workers=0 to model.train().

Notes

The issue might be specific to the combination of PyTorch 2.12 Nightly, CUDA 13.2, and Windows, so the solution may not apply to other environments.

Recommendation

Apply the workaround by passing workers=0 to model.train(), which seems to mitigate the issue. Treat it as temporary until a fix is available in a future PyTorch version, since single-process data loading is usually slower.
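A minimal way to apply the workaround only where the crash was reported is to choose the workers value based on the platform. dataloader_workers is a hypothetical helper. Limiting the workaround to Windows is an assumption drawn from this report, which covers only Windows 11 with the PyTorch nightly.

```python
import sys


def dataloader_workers(requested: int, platform: str = sys.platform) -> int:
    """Return 0 on Windows (the workaround from the issue), else the requested count."""
    # sys.platform is "win32" on Windows, including 64-bit builds.
    if platform == "win32":
        return 0
    return requested


# Usage with the snippet from the issue (assumption: all other arguments unchanged):
# model.train(data="data.yaml", epochs=100, imgsz=640, device=0,
#             workers=dataloader_workers(4))
```

Keeping the requested value on other platforms avoids slowing data loading where no crash has been reported.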
