pytorch - Expose public API for clearing cuBLAS workspaces (currently only private torch._C._cuda_clearCublasWorkspaces)



🚀 The feature, motivation and pitch

torch._C._cuda_clearCublasWorkspaces() is the only way to release cuBLAS workspace allocations, but it's a private API. torch.cuda.empty_cache() explicitly does not free them (as documented in docs/source/notes/cuda.rst). There should be a public API for this.

Motivation

NVIDIA's cuda-checkpoint tool requires ALL GPU-side allocations to be released before checkpointing a process. After calling torch.cuda.empty_cache() and gc.collect(), cuBLAS workspaces still hold GPU memory, causing checkpoint failures. The only workaround is calling the private torch._C._cuda_clearCublasWorkspaces().

This is becoming a practical blocker as cuda-checkpoint adoption grows for GPU process migration and cold start optimization (see vLLM RFC #34303, cuda-checkpoint #4).

Beyond checkpoint/restore, any workflow that needs to fully reclaim GPU memory between models (multi-model serving, benchmarking, testing) hits this gap.
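Both use cases need a way to confirm that memory was actually reclaimed before moving on. The following is a minimal pre-flight check to run before handing the process to cuda-checkpoint or loading the next model. It is only a sketch and not part of PyTorch. The helper names remaining_gpu_memory and require_gpu_memory_released are hypothetical, and the optional torch_mod parameter exists only so that a stand-in module can be passed in tests. It assumes that cuBLAS workspaces are served by PyTorch's caching allocator, which we believe is true of recent releases, so that they appear in torch.cuda.memory_reserved(). Memory outside that allocator (the CUDA context itself, or allocations made by other libraries) is not visible to this check.

```python
from typing import Dict


def remaining_gpu_memory(torch_mod=None) -> Dict[str, int]:
    """Return bytes still tracked by PyTorch's CUDA caching allocator.

    Hypothetical helper. Only memory managed by the caching allocator is
    reported; the CUDA context and non-PyTorch allocations are not.
    """
    if torch_mod is None:
        import torch as torch_mod
    return {
        # Bytes occupied by live tensors (and, by assumption, cuBLAS workspaces).
        "allocated": int(torch_mod.cuda.memory_allocated()),
        # Bytes the allocator still holds from the driver, used or cached.
        "reserved": int(torch_mod.cuda.memory_reserved()),
    }


def require_gpu_memory_released(torch_mod=None) -> None:
    """Raise RuntimeError if the caching allocator still holds any memory."""
    left = remaining_gpu_memory(torch_mod)
    if any(left.values()):
        raise RuntimeError(f"GPU memory still held by PyTorch: {left}")
```

In practice you would run the release sequence first and then call require_gpu_memory_released() immediately before invoking cuda-checkpoint. A non-zero "reserved" value after empty_cache() is the symptom this issue describes.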

Current workaround

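import gc
import torch

# Private API: works, but undocumented and could break in any release
torch._C._cuda_clearCublasWorkspaces()  # release cuBLAS workspace buffers
gc.collect()                             # free unreachable tensors
torch.cuda.empty_cache()                 # return unused cached blocks to the driver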
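Until a public API exists, a slightly more defensive version of this workaround can soften the "could break" risk. The following is a sketch under stated assumptions. It first looks for torch.cuda.clear_cublas_workspaces, which is only the name proposed in this issue and does not exist in current releases. If that is missing, it falls back to the private binding. If neither is present, it skips that step and reports the fact. As far as we know, the private binding is only compiled into CUDA builds, so it may be missing on CPU-only installs. The names release_gpu_memory and _find_cublas_clear are hypothetical, and torch_mod exists only so that a stand-in module can be passed in tests.

```python
import gc
from typing import Callable, Optional


def _find_cublas_clear(torch_mod) -> Optional[Callable[[], None]]:
    """Locate a function that clears cuBLAS workspaces, or return None."""
    # Hypothetical public name (Option A in this proposal); absent today.
    public = getattr(getattr(torch_mod, "cuda", None), "clear_cublas_workspaces", None)
    if callable(public):
        return public
    # Private binding used by the current workaround; may vanish or be absent.
    private = getattr(getattr(torch_mod, "_C", None), "_cuda_clearCublasWorkspaces", None)
    return private if callable(private) else None


def release_gpu_memory(torch_mod=None) -> bool:
    """Run the workaround sequence; return True if workspaces were cleared."""
    if torch_mod is None:
        import torch as torch_mod
    clear = _find_cublas_clear(torch_mod)
    if clear is not None:
        clear()                    # drop cuBLAS workspace buffers first
    gc.collect()                   # then free unreachable tensors
    torch_mod.cuda.empty_cache()   # finally return unused cached blocks
    return clear is not None
```

Returning a bool lets callers such as a cuda-checkpoint wrapper log or abort when the workspaces could not be cleared, instead of failing later at checkpoint time.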

Proposed API

Either of these would work:

Option A: Dedicated public function

torch.cuda.clear_cublas_workspaces()

Option B: Flag on existing empty_cache()

torch.cuda.empty_cache(include_cublas_workspaces=True)

Option B has the advantage of giving users a single call for "release everything", which is what most people expect empty_cache() to do already.

Alternatives

Continue using the private torch._C._cuda_clearCublasWorkspaces(), but this is fragile and not discoverable. Users who need a full GPU memory release (for cuda-checkpoint, multi-model serving, etc.) have to find it by reading the source code or searching Stack Overflow.

Additional context

  • PyTorch's own test utilities (torch/testing/_internal/common_utils.py) already use clearCublasWorkspaces() for clean test state
  • The function is documented in docs/source/notes/cuda.rst but only as a private API
  • Related issues about incomplete memory release: #17157, #46602, #173382, #20837

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @csarofeen
