pytorch: How to fix DISABLED test_dynamo_dtensor_from_local_redistribute (__main__.TestDTensorCompileWithCompiledAutograd) [2 comments, 1 participant]

GitHub stats
pytorch/pytorch#180656, fetched 2026-04-18 05:51:35
Comments: 2 · Participants: 1 · Timeline events: 43 · Reactions: 0
Timeline (top): mentioned ×18, subscribed ×18, labeled ×5, commented ×2

Error Message

Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/distributed/tensor/test_dtensor_compile.py", line 1444, in test_dynamo_dtensor_from_local_redistribute
    self.assertExpectedInline(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3391, in assertExpectedInline
    return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 413, in assertExpectedInline
    assert_expected_inline(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 378, in assert_expected_inline
    assert_eq(expect, actual, msg=help_text)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 450, in assertMultiLineEqualMaybeCppStack
    self.assertMultiLineEqual(expect, actual, *args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 1226, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 675, in fail
    raise self.failureException(msg)
AssertionError: 'def [470 chars]    add = to_local + 2;  to_local = None\n    return (add,)' != 'def [470 chars]    add = to_local.add(2);  to_local = None\n    return (add,)'
  def forward(self, L_x_ : torch.Tensor, L_mesh_ : torch.distributed.device_mesh.DeviceMesh):
      l_x_ = L_x_
      l_mesh_ = L_mesh_
      dt = torch.distributed.tensor._api.from_local(l_x_, l_mesh_, [torch.distributed.tensor.placement_types.Shard(dim=0)], run_check = False);  l_x_ = None
      redistribute = dt.redistribute(l_mesh_, [torch.distributed.tensor.placement_types.Replicate()]);  dt = l_mesh_ = None
      to_local = redistribute.to_local();  redistribute = None
-     add = to_local + 2;  to_local = None
?                   ^^^
+     add = to_local.add(2);  to_local = None
?                   ^^^^^ +
      return (add,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/distributed/tensor/test_dtensor_compile.py TestDTensorCompileWithCompiledAutograd.test_dynamo_dtensor_from_local_redistribute

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
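The `-`, `+` and `?` rows in the error message are a line diff of the kind unittest's assertMultiLineEqual prints, where `-` marks text found only in the first argument (here the expected string) and `+` marks text found only in the second (the actual output). A minimal sketch that reproduces this style of diff for the two differing lines, using only the standard library; the helper name `line_diff` is made up for illustration:

```python
import difflib

# The two graph lines that differ, copied from the error message.
expected_line = "    add = to_local + 2;  to_local = None"
actual_line = "    add = to_local.add(2);  to_local = None"


def line_diff(expected: str, actual: str) -> list[str]:
    """Return ndiff rows: '- ' rows come from expected, '+ ' rows from actual."""
    return list(difflib.ndiff([expected], [actual]))


if __name__ == "__main__":
    for row in line_diff(expected_line, actual_line):
        print(row.rstrip("\n"))
```

Read this way, the test as written expects `to_local + 2`, and the failing runs produced `to_local.add(2)`.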

Root Cause

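The issue itself does not name a root cause: the test was disabled because it is failing in CI (see the recent examples and the most recent trunk workflow logs linked from the issue). What the traceback does show is that assertExpectedInline found a one-line difference in the printed graph: the expected string contains add = to_local + 2, while the failing runs produce add = to_local.add(2).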


Platforms: linux, slow

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 6 hours, it has been determined flaky in 56 workflow(s) with 112 failures and 56 successes.

Debugging instructions (after clicking on the recent samples link): DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers so CI will thus be green but it will be harder to parse the logs. To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_dynamo_dtensor_from_local_redistribute
  4. There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
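If the Test step log has been saved to a local file, step 3 can also be done with a few lines of Python. This is only a sketch: the file name `test_step.log` and the helper `find_test_lines` are assumptions for illustration, not part of the PyTorch CI tooling.

```python
from pathlib import Path

TEST_NAME = "test_dynamo_dtensor_from_local_redistribute"


def find_test_lines(log_text: str, needle: str = TEST_NAME) -> list[tuple[int, str]]:
    """Return (1-based line number, line) pairs for log lines that mention the test."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if needle in line
    ]


if __name__ == "__main__":
    # Hypothetical local copy of the expanded Test step log.
    log_path = Path("test_step.log")
    if log_path.exists():
        for number, line in find_test_lines(log_path.read_text(errors="replace")):
            print(f"{number}: {line}")
```

Because flaky tests are rerun in CI, several matches per log are expected; comparing the runs that passed with the ones that failed is the point of the exercise.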
<details><summary>Sample error message</summary>
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/test/distributed/tensor/test_dtensor_compile.py", line 1444, in test_dynamo_dtensor_from_local_redistribute
    self.assertExpectedInline(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 3391, in assertExpectedInline
    return super().assertExpectedInline(actual if isinstance(actual, str) else str(actual), expect, skip + 1)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 413, in assertExpectedInline
    assert_expected_inline(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 378, in assert_expected_inline
    assert_eq(expect, actual, msg=help_text)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/expecttest/__init__.py", line 450, in assertMultiLineEqualMaybeCppStack
    self.assertMultiLineEqual(expect, actual, *args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 1226, in assertMultiLineEqual
    self.fail(self._formatMessage(msg, standardMsg))
  File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 675, in fail
    raise self.failureException(msg)
AssertionError: 'def [470 chars]    add = to_local + 2;  to_local = None\n    return (add,)' != 'def [470 chars]    add = to_local.add(2);  to_local = None\n    return (add,)'
  def forward(self, L_x_ : torch.Tensor, L_mesh_ : torch.distributed.device_mesh.DeviceMesh):
      l_x_ = L_x_
      l_mesh_ = L_mesh_
      dt = torch.distributed.tensor._api.from_local(l_x_, l_mesh_, [torch.distributed.tensor.placement_types.Shard(dim=0)], run_check = False);  l_x_ = None
      redistribute = dt.redistribute(l_mesh_, [torch.distributed.tensor.placement_types.Replicate()]);  dt = l_mesh_ = None
      to_local = redistribute.to_local();  redistribute = None
-     add = to_local + 2;  to_local = None
?                   ^^^
+     add = to_local.add(2);  to_local = None
?                   ^^^^^ +
      return (add,) : To accept the new output, re-run test with envvar EXPECTTEST_ACCEPT=1 (we recommend staging/committing your changes before doing this)

To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 PYTORCH_TEST_WITH_SLOW_GRADCHECK=1 python test/distributed/tensor/test_dtensor_compile.py TestDTensorCompileWithCompiledAutograd.test_dynamo_dtensor_from_local_redistribute

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
</details>

Test file path: inductor/test_compiled_autograd.py

For all disabled tests (by GitHub issue), see https://hud.pytorch.org/disabled.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

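The failure is an expected-output mismatch, not a wrong tensor operation: the inline expected graph in the test contains add = to_local + 2, while the failing runs print add = to_local.add(2). Both forms compute the same result. Because the issue reports the test as flaky (it also passes on some runs), changing the expected string alone may not be a stable fix.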

Guidance

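  • In test_dtensor_compile.py (line 1444 in the traceback), the assertExpectedInline call compares the printed graph against an inline expected string. Only the add line differs: expected add = to_local + 2, actual add = to_local.add(2).
  • Before editing the expected string, check whether the printed graph is the same on every run. The issue reports 112 failures and 56 successes over 6 hours, and every passing run must have printed the + form.
  • Reproduce locally with the command given in the error message, and check the workflow logs for other errors or warnings around the failing runs.
  • If the new output is confirmed to be the intended one, re-run the test with the EXPECTTEST_ACCEPT=1 envvar to rewrite the expected string in place (stage or commit your changes first, as the error message recommends).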
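The repro command from the error message, with or without EXPECTTEST_ACCEPT=1, can be assembled in a small script. A minimal sketch, assuming it is run from the base of a PyTorch checkout; the helper name `build_repro` is made up here, and the script only prints the command rather than running it:

```python
import os
import sys

# Path, test id and environment variables are copied from the error message.
TEST_FILE = "test/distributed/tensor/test_dtensor_compile.py"
TEST_ID = "TestDTensorCompileWithCompiledAutograd.test_dynamo_dtensor_from_local_redistribute"


def build_repro(accept: bool = False) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for the repro; accept=True adds EXPECTTEST_ACCEPT=1."""
    env = dict(os.environ)
    env["PYTORCH_TEST_CUDA_MEM_LEAK_CHECK"] = "1"
    env["PYTORCH_TEST_WITH_SLOW_GRADCHECK"] = "1"
    if accept:
        env["EXPECTTEST_ACCEPT"] = "1"
    else:
        # Make sure a plain repro never rewrites expected strings by accident.
        env.pop("EXPECTTEST_ACCEPT", None)
    return [sys.executable, TEST_FILE, TEST_ID], env


if __name__ == "__main__":
    argv, _env = build_repro(accept=False)
    print(" ".join(argv))
```

To actually run it, pass the pair to `subprocess.run(argv, env=env)`. Running the plain repro several times before using `accept=True` helps confirm which output is stable.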

Example

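# The mismatch is in the expected graph text passed to assertExpectedInline,
# not in tensor arithmetic. Final lines of that graph, taken from the diff:
#
#   expected by the test:     add = to_local + 2;  to_local = None
#   printed in failing runs:  add = to_local.add(2);  to_local = None
#   unchanged:                return (add,)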

Notes

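The stack trace ends in expecttest's assertExpectedInline, which compares the printed graph with a string stored in the test. The only difference is how the addition is printed: to_local + 2 in the expected string versus to_local.add(2) in the failing runs. Neither form is more correct than the other. The issue does not say why the printed form changes between runs, and it lists the test file path as inductor/test_compiled_autograd.py while the traceback points at test/distributed/tensor/test_dtensor_compile.py.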
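As a toy illustration of how the same addition can end up printed in two different ways, the recorder below logs an operator call and a method call as different text. It is not Dynamo or torch.fx, and the class name `Recorded` is invented for this sketch; it does not explain why the CI runs differ.

```python
class Recorded:
    """Toy stand-in for a traced value that remembers the text that built it."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __add__(self, other: object) -> "Recorded":
        # Reached via the + operator.
        return Recorded(f"{self.text} + {other!r}")

    def add(self, other: object) -> "Recorded":
        # Reached via an explicit method call.
        return Recorded(f"{self.text}.add({other!r})")


via_operator = Recorded("to_local") + 2
via_method = Recorded("to_local").add(2)

if __name__ == "__main__":
    print(via_operator.text)
    print(via_method.text)
```

A string comparison such as assertExpectedInline sees these as different even though the arithmetic is the same.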

Recommendation

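First establish which printed form is stable across runs. If to_local.add(2) is confirmed as the intended output, update the expected string (for example with EXPECTTEST_ACCEPT=1); if the output genuinely varies between runs, the variation itself needs to be investigated before the test is re-enabled.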
