pytorch - How to fix: Can't allocate 15+GB on 24GB GPU (RTX 3090) (reason: fragmented usage from other apps) [3 comments, 2 participants]

GitHub stats
pytorch/pytorch#178057 (fetched 2026-04-08 01:12:24)
Comments: 3
Participants: 2
Timeline: 36
Reactions: 0
Timeline (top): mentioned ×12, subscribed ×12, labeled ×6, commented ×3

Error Message

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.90 GiB. GPU 0 has a total capacity of 24.00 GiB of which 22.79 GiB is free. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)


🐛 Describe the bug

torch.OutOfMemoryError: CUDA out of memory.
Tried to allocate 14.90 GiB. GPU 0 has a total capacity of 24.00 GiB of which 22.79 GiB is free.
Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated.
If reserved but unallocated memory is large try setting PYTORCH_ALLOC_CONF=expandable_segments:True to avoid fragmentation. 
See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Code:

import time
import torch


def main():
    device = torch.accelerator.current_accelerator(True)
    matrix_8GB = torch.randn((round(14.9 * 1024) * (1024 // 4) * 1024), device=device, dtype=torch.float32)
    # CRASHING
    # Get the current GPU memory usage
    current_memory = torch.accelerator.memory_allocated()
    print(f"Current GPU memory allocated: {current_memory / (1024 ** 3):.2f} GB")
    
    # Get the maximum GPU memory allocated during the program's execution
    max_memory = torch.accelerator.max_memory_allocated()
    print(f"Maximum GPU memory allocated: {max_memory / (1024 ** 3):.2f} GB")
    time.sleep(10)


if __name__ == "__main__":
    main()
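
As a sanity check on the numbers, the element count in the script above can be converted to bytes by hand. This is a minimal sketch that uses only the shape expression from the report and the 4-byte size of float32; the helper name `requested_gib` is hypothetical and not part of PyTorch.

```python
# Reproduce the size arithmetic from the report without touching the GPU.
BYTES_PER_FLOAT32 = 4
GIB = 1024 ** 3


def requested_gib(num_elements: int, itemsize: int = BYTES_PER_FLOAT32) -> float:
    """Return the size in GiB of a dense tensor with the given element count."""
    return num_elements * itemsize / GIB


# Same shape expression as in the reproduction script.
num_elements = round(14.9 * 1024) * (1024 // 4) * 1024

print(f"elements:  {num_elements}")
print(f"requested: {requested_gib(num_elements):.2f} GiB")
```

The result agrees with the 14.90 GiB in the error message, so the variable name `matrix_8GB` understates the request, and the request is still well below the 22.79 GiB reported as free.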

Versions

PyTorch version: 2.10.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: D:\Users\Sergey\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1
{
    "Caption":  "Microsoft Windows 11 Enterprise",
    "OSArchitecture":  "64-bit",
    "Version":  "10.0.26200"
}
Expecting value: line 1 column 1 (char 0)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.11.15 (main, Mar  3 2026, 14:55:34) [MSC v.1944 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.26200-SP0
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: 
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 595.79
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
D:\Users\Sergey\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1
{
    "Name":  "AMD Ryzen 9 9900X 12-Core Processor            ",
    "Manufacturer":  "AuthenticAMD",
    "Family":  107,
    "Architecture":  9,
    "ProcessorType":  3,
    "DeviceID":  "CPU0",
    "CurrentClockSpeed":  4400,
    "MaxClockSpeed":  4400,
    "L2CacheSize":  12288,
    "L2CacheSpeed":  null,
    "Revision":  17408
}
Expecting value: line 1 column 1 (char 0)

Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

Fix Plan

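The error message reports 22.79 GiB free, with 0 bytes allocated and 0 bytes reserved by PyTorch, so the failure does not come from fragmentation inside PyTorch's own caching allocator. As the issue title notes, the likely cause is GPU memory fragmented by other applications, so the steps below are mitigations rather than a guaranteed fix.

Here are the steps:

  • Close or restart other applications that use the GPU, then retry the allocation.
  • Consider reducing the size of the tensor or using a more memory-efficient data type.
  • Set the PYTORCH_ALLOC_CONF environment variable to expandable_segments:True if PyTorch reports a large amount of reserved but unallocated memory. The error message suggests this setting, but with 0 bytes reserved it may not change the outcome in this case.
  • Use torch.cuda.empty_cache() to return cached blocks that PyTorch has reserved but is not using. With 0 bytes reserved, it has nothing to release at the point of the crash.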
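
One way to act on the "reduce the size of the tensor" step is to split the buffer into several smaller allocations instead of one 14.90 GiB block. The sketch below is an illustration, not something proposed in the issue: `plan_chunks` and `allocate_chunks` are hypothetical helper names, the 1 GiB chunk size is an arbitrary choice, and whether smaller allocations succeed on the reporter's machine is an assumption that the issue does not confirm.

```python
def plan_chunks(total_elements: int, max_chunk_elements: int) -> list[int]:
    """Split an element count into chunk sizes no larger than max_chunk_elements."""
    if total_elements < 0 or max_chunk_elements <= 0:
        raise ValueError("total_elements must be >= 0 and max_chunk_elements > 0")
    full, rest = divmod(total_elements, max_chunk_elements)
    return [max_chunk_elements] * full + ([rest] if rest else [])


def allocate_chunks(sizes: list[int], device):
    """Allocate one float32 tensor per chunk (hypothetical helper, needs a GPU)."""
    import torch  # imported lazily so the planning code runs without PyTorch

    return [torch.randn(n, device=device, dtype=torch.float32) for n in sizes]


# Element count from the reproduction script (14.90 GiB of float32).
total_elements = round(14.9 * 1024) * (1024 // 4) * 1024
# Arbitrary chunk size for illustration: 1 GiB of float32 values.
one_gib_of_float32 = (1024 ** 3) // 4

sizes = plan_chunks(total_elements, one_gib_of_float32)
print(f"{len(sizes)} chunks, largest {max(sizes)} elements")
```

Calling `allocate_chunks(sizes, device)` would then request the same total amount of memory in pieces. This only helps if the code that consumes the data can work on separate tensors rather than one contiguous buffer.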

Code Changes

import os
import time

# Set the allocator option before importing torch so it is already in place
# when the CUDA allocator initializes. The error above reports 0 bytes
# reserved by PyTorch, so this option may not change the outcome here.
os.environ['PYTORCH_ALLOC_CONF'] = 'expandable_segments:True'

import torch


def main():
    device = torch.accelerator.current_accelerator(True)
    # Request 10 GiB instead of 14.90 GiB to reduce the size of the allocation
    matrix = torch.randn((round(10 * 1024) * (1024 // 4) * 1024), device=device, dtype=torch.float32)

    # Get the current GPU memory usage
    current_memory = torch.accelerator.memory_allocated()
    print(f"Current GPU memory allocated: {current_memory / (1024 ** 3):.2f} GiB")

    # Get the maximum GPU memory allocated during the program's execution
    max_memory = torch.accelerator.max_memory_allocated()
    print(f"Maximum GPU memory allocated: {max_memory / (1024 ** 3):.2f} GiB")

    # Drop the tensor first; empty_cache() only returns blocks that are no longer in use
    del matrix
    torch.cuda.empty_cache()

    time.sleep(10)


if __name__ == "__main__":
    main()

Verification

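To verify the change, run the modified code and confirm that it no longer raises torch.OutOfMemoryError. Because the modified code requests 10 GiB instead of 14.90 GiB, a successful run shows only that the smaller allocation fits, not that the original request would now succeed. You can also monitor GPU memory with nvidia-smi while the script runs to see how much memory PyTorch and other processes are holding.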
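
The same check can be made from inside Python before allocating. This is a minimal sketch: `torch.cuda.mem_get_info()` is the PyTorch call that returns free and total device memory in bytes, while `fits`, `report_cuda_memory`, and the 512 MiB headroom are hypothetical choices made for illustration.

```python
GIB = 1024 ** 3


def fits(request_bytes: int, free_bytes: int, headroom_bytes: int = 512 * 1024 ** 2) -> bool:
    """Return True if the request plus some headroom is within the free memory."""
    return request_bytes + headroom_bytes <= free_bytes


def report_cuda_memory(request_bytes: int) -> None:
    """Print free/total memory for the current CUDA device (needs a GPU)."""
    import torch  # imported lazily so fits() can be used without PyTorch

    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"free:  {free_bytes / GIB:.2f} GiB of {total_bytes / GIB:.2f} GiB")
    print(f"request of {request_bytes / GIB:.2f} GiB fits: {fits(request_bytes, free_bytes)}")


# Numbers taken from the error message in this issue.
print(fits(int(14.90 * GIB), int(22.79 * GIB)))
```

With the figures from the error message the check passes, yet the allocation still failed. A free-memory check is therefore a necessary condition at best: it cannot detect the case where the free memory is not available as one sufficiently large block.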

Extra Tips

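  • Set the PYTORCH_ALLOC_CONF environment variable to expandable_segments:True when the error message shows a large amount of memory reserved by PyTorch but unallocated. It targets fragmentation inside PyTorch's allocator, not memory held by other applications.
  • torch.cuda.empty_cache() returns unused cached memory to the driver. It does not free tensors that are still referenced and does not prevent memory leaks, so calling it regularly is rarely needed.
  • Consider using more memory-efficient data types, such as torch.float16 or torch.bfloat16, to reduce memory usage.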
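
To make the data type tip concrete, the arithmetic below compares the footprint of the tensor from the issue at different precisions. It is a sketch that assumes the standard element sizes (4 bytes for float32, 2 bytes for float16 and bfloat16); the `ITEMSIZE` table and `size_gib` helper are hypothetical names, and whether reduced precision is acceptable depends on the workload.

```python
GIB = 1024 ** 3

# Bytes per element for the data types mentioned in the tips.
ITEMSIZE = {"float32": 4, "float16": 2, "bfloat16": 2}

# Element count from the reproduction script.
num_elements = round(14.9 * 1024) * (1024 // 4) * 1024


def size_gib(dtype_name: str) -> float:
    """Return the size in GiB of the tensor if stored with the given dtype."""
    return num_elements * ITEMSIZE[dtype_name] / GIB


for name in ITEMSIZE:
    print(f"{name:>8}: {size_gib(name):.2f} GiB")
```

Halving the element size halves the request, which brings the single allocation from 14.90 GiB down to about 7.45 GiB.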
