pytorch - 💡(How to fix) Fix Memory management, memory not returned after CPU -> GPU [1 participants]

pytorch • 2026-03-16 14:16:40

pytorch/pytorch#177520 • Fetched 2026-04-08 00:47:35

Author

MartinPerry

I have a simple C++ libtorch (2.10.0) program:

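#include <windows.h>
#include <psapi.h>
#include <stdio.h>

// Prints the process's committed private memory (Windows only), tagged with the given label.
void PrintMemory(const char* label)
{
    PROCESS_MEMORY_COUNTERS_EX info{};
    GetProcessMemoryInfo(GetCurrentProcess(), reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&info), sizeof(info));
    printf("[%s]\n  PrivateUsage       (committed pages): %.3f GB\n", label, static_cast<double>(info.PrivateUsage) / 1024 / 1024 / 1024);
}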

PrintMemory("Before");
auto tensor = torch::zeros({ 16384, 16384, 4 });  // 16384 * 16384 * 4 float32 values = 4 GiB
PrintMemory("After Init");        
tensor = tensor.to(torch::kCUDA);
PrintMemory("After Cuda");
c10::cuda::CUDACachingAllocator::emptyCache();
PrintMemory("After emptyCache");


[Before]
  PrivateUsage       (committed pages): 0.700 GB
[After Init]
  PrivateUsage       (committed pages): 4.716 GB
[After Cuda]
  PrivateUsage       (committed pages): 4.952 GB
[After emptyCache]
  PrivateUsage       (committed pages): 4.952 GB

But the CPU memory is not returned.

However, when I do this:

PrintMemory("Before");
auto tensor = torch::zeros({ 16384, 16384, 4 });  // 16384 * 16384 * 4 float32 values = 4 GiB
PrintMemory("After Init");        
tensor.reset();
PrintMemory("After reset");


[Before]
  PrivateUsage       (committed pages): 0.701 GB
[After Init]
  PrivateUsage       (committed pages): 4.716 GB
[After Reset]
  PrivateUsage       (committed pages): 0.701 GB

memory is returned.
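
As a sanity check on the numbers above, the expected size of the tensor can be worked out from its shape. The sketch below is plain C++ with no libtorch dependency; `TensorBytes` and `ToGiB` are hypothetical helper names, and the 4-byte element size assumes the default float32 dtype.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: bytes needed for a dense tensor of the given shape.
// element_size is 4 for float32 (assumed here to be the dtype in use).
std::uint64_t TensorBytes(std::initializer_list<std::uint64_t> shape,
                          std::uint64_t element_size)
{
    assert(element_size > 0);
    std::uint64_t count = 1;
    for (std::uint64_t dim : shape) {
        count *= dim;
    }
    return count * element_size;
}

// Converts bytes to GiB, using the same /1024/1024/1024 scaling as PrintMemory.
double ToGiB(std::uint64_t bytes)
{
    return static_cast<double>(bytes) / 1024.0 / 1024.0 / 1024.0;
}
```

`TensorBytes({16384, 16384, 4}, 4)` gives 4294967296 bytes, which is exactly 4 GiB. That is consistent with the jump of roughly 4.016 GB between "Before" and "After Init" (4.716 - 0.700), so the CPU-side growth is accounted for by the tensor itself.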

Why does the memory stay allocated (or appear to) after the CUDA transfer? When I use the same transfer to CUDA for large models, rather than a single tensor as in this example, RAM stays full (or appears to), and I cannot allocate more. The GPU memory is allocated as expected.

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm @iremyux @Blackhex @jbschlosser @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

Fix Plan

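c10::cuda::CUDACachingAllocator::emptyCache() only returns unused, cached GPU blocks to the CUDA driver. It never frees CPU memory, and it cannot free the block behind a tensor that is still alive. In the first snippet the CUDA tensor is still alive when emptyCache() is called, so the call has nothing to release.

The assignment tensor = tensor.to(torch::kCUDA) also rebinds the only handle to the CPU tensor, so its storage is released by reference counting at that point. The PrivateUsage that remains may therefore belong to the live CUDA allocation rather than to the old CPU buffer: on Windows, driver-managed GPU allocations can count toward a process's commit charge. This is a plausible explanation, not one confirmed in the issue. To test it, release the CUDA tensor, empty the cache, and measure again.

Steps to Fix

Move the tensor to CUDA.
Reset the tensor once it is no longer needed. This releases the CUDA tensor, not a CPU buffer.
Call emptyCache() so the now-unused GPU block goes back to the driver, then measure again.

PrintMemory("Before");
auto tensor = torch::zeros({ 16384, 16384, 4 });
PrintMemory("After Init");
tensor = tensor.to(torch::kCUDA);  // the CPU tensor loses its last handle here
PrintMemory("After Cuda");
tensor.reset();  // releases the CUDA tensor; its block returns to the caching allocator
c10::cuda::CUDACachingAllocator::emptyCache();  // hands the unused block back to the driver
PrintMemory("After Reset");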

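Calling reset() before the transfer is not a useful alternative, because it discards the data and the tensor would have to be created again on the CUDA device. If the initial contents do not have to come from the CPU, the tensor can instead be created on the device directly, for example with torch::zeros({ 16384, 16384, 4 }, torch::kCUDA), which avoids the CPU allocation altogether.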
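
The ownership behaviour behind a reassignment such as tensor = tensor.to(torch::kCUDA) can be modelled without libtorch. A torch::Tensor is a reference-counted handle, and the sketch below uses std::shared_ptr as a stand-in for it. `Buffer` and `LiveBuffersAfterReassign` are hypothetical names, and this is an analogy rather than libtorch's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a tensor's storage; counts live instances.
struct Buffer {
    static int live;
    std::vector<char> bytes;
    explicit Buffer(std::size_t n) : bytes(n) { ++live; }
    ~Buffer() { --live; }
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
};
int Buffer::live = 0;

// Models `tensor = tensor.to(device)`: the only handle is rebound to a new buffer.
// Returns how many buffers are alive right after the reassignment.
int LiveBuffersAfterReassign(std::size_t n)
{
    assert(Buffer::live == 0);
    auto handle = std::make_shared<Buffer>(n);  // plays the role of the CPU tensor
    handle = std::make_shared<Buffer>(n);       // plays the role of the device copy;
                                                // the first buffer is destroyed here
    return Buffer::live;
}
```

`LiveBuffersAfterReassign(1024)` returns 1: once the handle is rebound, the first buffer has no owner and is destroyed immediately, with no explicit reset() required.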

Verification

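Run the modified code and compare PrivateUsage before and after the reset() call. If the value drops back toward the "Before" figure, the remaining usage was tied to the live CUDA tensor rather than to a leaked CPU buffer. If it does not drop, the memory is held elsewhere and needs further investigation.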
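
Comparing samples is easier when the deltas are computed rather than read off by eye. The following is a minimal sketch of such a helper; `MemoryLog` is a hypothetical class, and the sampler is injected so that on Windows it could wrap the PrivateUsage read from PrintMemory, while here it stays platform-independent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper that records labeled memory samples and reports deltas.
class MemoryLog {
public:
    // sampler returns the current usage in bytes (supplied by the caller).
    explicit MemoryLog(std::function<std::uint64_t()> sampler)
        : sampler_(std::move(sampler))
    {
        assert(sampler_);
    }

    // Takes one sample and stores it under the given label.
    void Sample(const std::string& label)
    {
        samples_.emplace_back(label, sampler_());
    }

    // Signed change in bytes between two recorded samples, addressed by index.
    std::int64_t Delta(std::size_t from, std::size_t to) const
    {
        assert(from < samples_.size() && to < samples_.size());
        return static_cast<std::int64_t>(samples_[to].second) -
               static_cast<std::int64_t>(samples_[from].second);
    }

private:
    std::function<std::uint64_t()> sampler_;
    std::vector<std::pair<std::string, std::uint64_t>> samples_;
};
```

A negative delta between the "After Cuda" and "After Reset" samples would indicate that the memory was released; a delta near zero would indicate that it is still held.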

Extra Tips

Release large tensors as soon as they are no longer needed; a tensor that is still referenced keeps its memory in use.
reset() drops only the handle it is called on. The storage is freed once the last tensor sharing it, including any view, is gone.
emptyCache() affects only unused GPU blocks held by the CUDA caching allocator; it does not change CPU memory usage.
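
The effect of shared handles on reset() can be illustrated with std::shared_ptr standing in for the tensor handle. This is a sketch of the general reference-counting behaviour, not libtorch code, and `BufferSurvivesReset` is a hypothetical name.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Returns true if the buffer is still allocated after the first handle is reset.
// keep_second_handle models another tensor (for example a view) sharing the storage.
bool BufferSurvivesReset(bool keep_second_handle)
{
    auto first = std::make_shared<std::vector<float>>(1024);
    assert(first.use_count() == 1);

    std::weak_ptr<std::vector<float>> observer = first;  // observes without owning
    std::shared_ptr<std::vector<float>> second;
    if (keep_second_handle) {
        second = first;  // a second owner of the same buffer
    }

    first.reset();  // analogous to calling reset() on one tensor handle
    return !observer.expired();
}
```

With a second handle the buffer survives the reset; without one it is freed immediately. When a reset() does not seem to release memory, it is worth checking for other tensors that still share the same storage.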
