pytorch - 💡(How to fix) Fix DISABLED test_memory_snapshot (__main__.TestCudaAllocator) [1 comment, 1 participant]

GitHub stats
pytorch/pytorch#179740 (fetched 2026-04-09 07:50:10)
Comments: 1 · Participants: 1 · Timeline: 30 · Reactions: 0
Timeline (top): mentioned ×12, subscribed ×12, labeled ×4, closed ×1

Error Message

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpaqz3rvfr.pl'. It is raised in format_flamegraph (torch/cuda/_memory_viz.py, line 100), which test_memory_snapshot reaches through torch.cuda.memory._save_segment_usage. When the with block around tempfile.NamedTemporaryFile(mode="wb", suffix=".pl") exits, closing the file tries to unlink it, but the file no longer exists.


Traceback

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/test_cuda.py", line 4233, in test_memory_snapshot
    torch.cuda.memory._save_segment_usage(f.name)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/memory.py", line 1166, in _save_segment_usage
    f.write(_segments(snapshot))
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/_memory_viz.py", line 161, in segments
    return format_flamegraph(f.getvalue())
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/cuda/_memory_viz.py", line 100, in format_flamegraph
    with tempfile.NamedTemporaryFile(mode="wb", suffix=".pl") as f:
  File "/opt/conda/envs/py_3.10/lib/python3.10/tempfile.py", line 518, in __exit__
    self.close()
  File "/opt/conda/envs/py_3.10/lib/python3.10/tempfile.py", line 525, in close
    self._closer.close()
  File "/opt/conda/envs/py_3.10/lib/python3.10/tempfile.py", line 462, in close
    unlink(self.name)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpaqz3rvfr.pl'
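The failing step can be reproduced in isolation with only the standard library. The sketch below (close_after_external_removal is a made-up name for illustration) opens a NamedTemporaryFile with the same mode and suffix as format_flamegraph. It then removes the file inside the with block, standing in for whatever deletes it in CI, and reports what happens on exit. On the Python 3.10 interpreter shown in this traceback, on Linux, the exit raises FileNotFoundError. Newer Python releases may ignore a missing file at this point, so the function returns the exception (or None) rather than assuming it is raised.

```python
import os
import tempfile


def close_after_external_removal():
    """Open a NamedTemporaryFile, delete its path inside the with block,
    and return the exception raised on exit (or None if there was none)."""
    try:
        with tempfile.NamedTemporaryFile(mode="wb", suffix=".pl") as f:
            f.write(b"#!/usr/bin/perl\n")
            f.flush()
            # Stand-in for whatever removes the file while it is still open
            # (another process, a /tmp cleaner, or code that moves it away).
            os.unlink(f.name)
        # Reached only if the interpreter tolerates the missing file on exit.
    except FileNotFoundError as exc:
        # Python 3.10: __exit__ -> close() -> unlink(self.name) fails here.
        return exc
    return None


if __name__ == "__main__":
    print(repr(close_after_external_removal()))
```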

Platforms: linux

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 6 hours, it has been determined flaky in 4 workflow(s) with 4 failures and 4 successes.

Debugging instructions (after clicking on the recent samples link): DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers, so CI will be green, but the logs will be harder to parse. To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_memory_snapshot
  4. There should be several runs of the test (flaky tests are rerun in CI) whose logs you can study.
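If you copy the expanded Test step log into a local file, steps 3 and 4 can also be done with a short script. This is a sketch: find_test_runs is a hypothetical helper, and it only collects every line that mentions the test name or the FileNotFoundError, with line numbers, so the reruns can be compared side by side. The actual log format is not assumed beyond plain text.

```python
import re


def find_test_runs(log_text, test_name="test_memory_snapshot"):
    """Return (line_number, line) pairs for log lines that mention
    test_name or a FileNotFoundError (1-based line numbers)."""
    pattern = re.compile(re.escape(test_name) + r"|FileNotFoundError")
    hits = []
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        if pattern.search(line):
            hits.append((lineno, line.rstrip()))
    return hits


if __name__ == "__main__":
    # Usage: python find_runs.py saved_test_step.log
    import sys

    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for lineno, line in find_test_runs(fh.read()):
            print(f"{lineno}: {line}")
```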

Test file path: test_cuda.py

For all disabled tests (by GitHub issue), see https://hud.pytorch.org/disabled.

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia

extent analysis

TL;DR

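The test does not fail on an assertion. It fails inside torch/cuda/_memory_viz.py: format_flamegraph opens a tempfile.NamedTemporaryFile, and when its with block exits, the default delete=True cleanup tries to unlink a .pl file that no longer exists, raising FileNotFoundError. The fix most likely involves finding what moves or deletes that file before the block exits, or making the cleanup tolerate a missing file.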

Guidance

  • Investigate how format_flamegraph in torch/cuda/_memory_viz.py uses tempfile, in particular whether anything inside the with block moves, renames, or deletes the temporary .pl file before the block exits.
  • Check whether another process or a cleanup job can remove files from /tmp while the test runs (for example, parallel jobs sharing /tmp). The file was created successfully, so /tmp was writable; the open question is what removed the file before it was closed.
  • Check test_memory_snapshot in test_cuda.py to see how it calls torch.cuda.memory._save_segment_usage and whether its own temporary files are created or closed in a way that could interfere.
  • Review the CI workflow logs of the failing runs to confirm that the test environment, test runner, and Python environment are configured as expected.
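To test the /tmp point above, one option is to rerun the test with the default temporary directory redirected to a private, freshly created directory. This is a diagnostic sketch, not a fix, and private_tempdir is a hypothetical helper. It relies on the tempfile.tempdir module global, which NamedTemporaryFile consults when no dir argument is given (as in the call in the traceback). Because that global is process-wide, the helper is not thread-safe. If the failure disappears with the files isolated, interference from outside the process becomes more likely; if it persists, the file is more likely being removed by code inside the with block.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def private_tempdir():
    """Point tempfile's default directory at a fresh private directory for
    the duration of the block, then restore it and delete the directory."""
    saved = tempfile.tempdir
    # Created before the redirect, so it lives in the original temp location.
    with tempfile.TemporaryDirectory() as d:
        tempfile.tempdir = d
        try:
            yield d
        finally:
            # Restore before TemporaryDirectory removes d.
            tempfile.tempdir = saved


if __name__ == "__main__":
    with private_tempdir() as d:
        with tempfile.NamedTemporaryFile(suffix=".pl") as f:
            print("temp file created under", os.path.dirname(f.name))
```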

Example

A definitive fix requires modifying torch/cuda/_memory_viz.py or test_cuda.py. Until then, adding error handling and logging around the tempfile usage shows when the file disappears and keeps the cleanup step from failing the test.
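One way to add that error handling and logging, sketched under the assumption that the file only needs to exist for the duration of the block, is shown below. The file is created with delete=False, so tempfile never unlinks it on its own. The cleanup then removes it in a finally clause and logs, rather than raises, if it is already gone. tolerant_named_tempfile is a hypothetical helper, not part of tempfile or PyTorch.

```python
import contextlib
import logging
import os
import tempfile

log = logging.getLogger(__name__)


@contextlib.contextmanager
def tolerant_named_tempfile(**kwargs):
    """Yield an open NamedTemporaryFile whose cleanup tolerates a missing file.

    kwargs are passed to tempfile.NamedTemporaryFile; do not pass delete.
    """
    f = tempfile.NamedTemporaryFile(delete=False, **kwargs)
    try:
        yield f
    finally:
        f.close()
        try:
            os.unlink(f.name)
        except FileNotFoundError:
            # The file was moved or deleted before cleanup (the symptom in
            # this issue); record it instead of failing.
            log.warning("temporary file %s was already gone at cleanup", f.name)
```

A drop-in use would replace tempfile.NamedTemporaryFile(mode="wb", suffix=".pl") with tolerant_named_tempfile(mode="wb", suffix=".pl"). The trade-off is that a missing file is hidden behind a log message, so the log should be checked when investigating the flakiness.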

Notes

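The failure comes from how torch/cuda/_memory_viz.py uses the tempfile module, not from tempfile itself, so the fix most likely belongs in _memory_viz.py or in how test_cuda.py exercises it. The test failed in only 4 of 8 recent runs, so whatever removes the file does not do so every time. A race with another process and a code path that runs only under some conditions would both fit that pattern.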

Recommendation

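Handle the temporary file defensively so that a file that has already been moved or deleted does not fail the test. Either catch FileNotFoundError during cleanup, or create the file with delete=False and remove it explicitly while ignoring a missing file. If the file is meant to end up at a permanent location, create it with delete=False and move it into place with os.replace instead of relying on NamedTemporaryFile's automatic deletion.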
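For the case where a temporary file is meant to end up at a permanent path, a common pattern is to create it with delete=False in the destination directory and move it into place with os.replace. After the move, no automatic deletion is left to fail. Whether format_flamegraph actually moves its .pl file has to be confirmed in the PyTorch source. write_file_atomically is a hypothetical helper that sketches the pattern; it is not PyTorch's code.

```python
import os
import tempfile


def write_file_atomically(dest_path, data, permissions=0o755):
    """Write bytes to dest_path via a temporary file in the same directory,
    then move it into place with os.replace.

    The temporary file uses delete=False, so nothing tries to unlink it again
    after the rename; if anything fails before the rename, it is removed here.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    tmp = tempfile.NamedTemporaryFile(
        mode="wb", dir=dest_dir, suffix=".tmp", delete=False
    )
    try:
        with tmp:  # with delete=False, exiting only closes the file
            tmp.write(data)
        os.chmod(tmp.name, permissions)
        # Overwrites dest_path if it exists; atomic on POSIX within one filesystem.
        os.replace(tmp.name, dest_path)
    except BaseException:
        try:
            os.unlink(tmp.name)
        except FileNotFoundError:
            pass
        raise
    return dest_path
```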
