pytorch - How to fix SystemError: <built-in method apply of FunctionMeta> returned NULL without setting an exception

GitHub stats
pytorch/pytorch#179536 (fetched 2026-04-08 03:00:35)
Comments: 1 · Participants: 1 · Timeline events: 45 · Reactions: 0
Timeline (top): mentioned ×18, subscribed ×18, labeled ×7, closed ×1

Error Message

SystemError: <built-in method apply of FunctionMeta object at 0x55d67af92a10> returned NULL without setting an exception

Root Cause

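The failure appears to be triggered by the torch.cuda.memory._record_memory_history call in the repro. The report does not identify the underlying mechanism.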

Code Example

import torch
from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import checkpoint_wrapper

class Block(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w = torch.nn.Linear(64, 64)

    def _epilogue(self, x, scores, indices, n):
        buf = torch.empty(n * 2, 64, dtype=x.dtype, device=x.device)
        buf[indices] = x
        return (buf.view(n, 2, 64) * scores.unsqueeze(-1)).sum(1).type_as(x)

    def forward(self, x):
        n = x.reshape(-1, 64).shape[0]
        h = self.w(x.reshape(-1, 64)).repeat(2, 1)
        scores = torch.ones(n, 2, device=x.device, dtype=x.dtype)
        idx = torch.arange(n * 2, device=x.device)
        return x + self._epilogue(h, scores, idx, n).view_as(x)

if __name__ == "__main__":
    torch.cuda.memory._record_memory_history(max_entries=100_000)
    with torch.device("cuda"):
        model = Block()
    model._epilogue = torch.compile(model._epilogue, fullgraph=True)
    model = checkpoint_wrapper(model)
    x = torch.randn((2, 64), device="cuda")
    model(x).sum().backward()

---

Traceback (most recent call last):
  File "/repro/repro_systemerror.py", line 28, in <module>
    model(x).sum().backward()
  File "/repro/.venv/lib/python3.10/site-packages/torch/_tensor.py", line 631, in backward
    torch.autograd.backward(
  File "/repro/.venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 381, in backward
    _engine_run_backward(
  File "/repro/.venv/lib/python3.10/site-packages/torch/autograd/graph.py", line 869, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/repro/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 317, in apply
    return user_fn(self, *args)
  File "/repro/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2681, in backward
    else ctx.saved_tensors
  File "/repro/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 1173, in unpack_hook
    _run_fn_with_dynamo_disabled(frame.recompute_fn, *args)
  File "/repro/.venv/lib/python3.10/site-packages/torch/_compile.py", line 54, in inner
    return disable_fn(*args, **kwargs)
  File "/repro/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1263, in _fn
    return fn(*args, **kwargs)
  File "/repro/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 1139, in _run_fn_with_dynamo_disabled
    return fn(*args, **kwargs)
  File "/repro/.venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 1607, in recompute_fn
    fn(*args, **kwargs)
  File "/repro/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/repro/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/repro/repro_systemerror.py", line 19, in forward
    return x + self._epilogue(h, scores, idx, n).view_as(x)
  File "/repro/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1024, in compile_wrapper
    return fn(*args, **kwargs)
  File "/repro/repro_systemerror.py", line 9, in _epilogue
    def _epilogue(self, x, scores, indices, n):
  File "/repro/.venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1263, in _fn
    return fn(*args, **kwargs)
  File "/repro/.venv/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1200, in forward
    return compiled_fn(full_args)
  File "/repro/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 2298, in __call__
    return self.compiled_fn(*args, **kwargs)
  File "/repro/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 566, in runtime_wrapper
    all_outs = call_func_at_runtime_with_args(
  File "/repro/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 138, in call_func_at_runtime_with_args
    out = normalize_as_list(f(args))
  File "/repro/.venv/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 105, in g
    return f(*args)
  File "/repro/.venv/lib/python3.10/site-packages/torch/autograd/function.py", line 596, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
SystemError: <built-in method apply of FunctionMeta object at 0x55d67af92a10> returned NULL without setting an exception

🐛 Describe the bug

The minified repro listed under Code Example above used to work with PyTorch 2.10 but started failing with PyTorch 2.11, producing the stack trace shown after it.

The issue appears to be triggered by the torch.cuda.memory._record_memory_history call, though the mechanism is unclear.

Versions

PyTorch version: 2.11.0+cu130

cc @soulitzer @chauhang @penguinwu @bdhirsh @bobrenjc93 @aorenste

extent analysis

TL;DR

The most likely workaround is to remove the torch.cuda.memory._record_memory_history call, which the report identifies as the apparent trigger.

Guidance

  • Investigate the purpose of _record_memory_history in the code and consider removing it if it's not necessary.
  • Try commenting out the line torch.cuda.memory._record_memory_history(max_entries=100_000) to see if the issue persists.
  • If the issue is resolved after removing or modifying the _record_memory_history call, consider finding an alternative way to achieve the desired memory tracking functionality.
  • Verify that the issue is specific to PyTorch 2.11 by testing the code with PyTorch 2.10 to confirm the regression.
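The checks in this list can be scripted as an A/B run. The sketch below is a minimal harness and not part of the original issue. It assumes the repro has been edited to call _record_memory_history only when a hypothetical RECORD_MEMORY_HISTORY environment variable is set to 1, and it launches the script in a fresh interpreter for each setting so that a crash in one run cannot affect the other. The names run_variant and compare_variants are illustrative.

```python
import os
import subprocess
import sys


def run_variant(script_path: str, record_history: bool) -> subprocess.CompletedProcess:
    """Run the repro in a fresh interpreter with memory-history recording on or off.

    Assumes (hypothetically) that the script checks RECORD_MEMORY_HISTORY
    before calling torch.cuda.memory._record_memory_history.
    """
    env = dict(os.environ)
    env["RECORD_MEMORY_HISTORY"] = "1" if record_history else "0"
    return subprocess.run(
        [sys.executable, script_path],
        env=env,
        capture_output=True,
        text=True,
    )


def compare_variants(script_path: str) -> dict:
    """Map each recording setting (True/False) to whether the run exited cleanly."""
    results = {}
    for record_history in (True, False):
        proc = run_variant(script_path, record_history)
        results[record_history] = proc.returncode == 0
    return results
```

If compare_variants("repro_systemerror.py") reports a failure only for the True entry, that supports the memory-history call as the trigger. Repeating the same run in a PyTorch 2.10 environment covers the regression check.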

Example

The workaround is a one-line change to the repro: omit the call torch.cuda.memory._record_memory_history(max_entries=100_000) that runs before the model is built.

Notes

The issue may be specific to the combination of PyTorch 2.11 and the _record_memory_history call, and further investigation is needed to determine the root cause.

Recommendation

Apply workaround: remove the _record_memory_history call, which appears to trigger the failure. The repro worked on PyTorch 2.10 and fails on 2.11, so this is likely a regression, and dropping the call may avoid it until the root cause is found.
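One way to apply the workaround without deleting the profiling code is to gate it on the installed version. The sketch below is an illustration under stated assumptions: the helper names and the AFFECTED_RELEASES set are hypothetical, and only 2.11 is listed because it is the only release the report names as failing. Whether later releases are affected is not known from the issue. torch is imported lazily so that the version check itself stays a pure function.

```python
import re

# Hypothetical: (major, minor) releases on which the report says the repro fails.
AFFECTED_RELEASES = {(2, 11)}


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.11.0+cu130'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def should_record_memory_history(version: str) -> bool:
    """Return False on releases listed in AFFECTED_RELEASES, True otherwise."""
    return parse_major_minor(version) not in AFFECTED_RELEASES


def maybe_record_memory_history(max_entries: int = 100_000) -> bool:
    """Enable CUDA memory history unless the installed release is listed as affected."""
    import torch  # imported lazily; only needed when actually enabling recording

    if not should_record_memory_history(torch.__version__):
        return False
    torch.cuda.memory._record_memory_history(max_entries=max_entries)
    return True
```

In the repro, the direct call to torch.cuda.memory._record_memory_history would be replaced by maybe_record_memory_history(). The set of affected releases should be revisited once the upstream issue is resolved.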
