pytorch - 💡(How to fix) Fix DISABLED test_replicate_with_kwargs (__main__.ReplicateFullyShardInit) [2 comments, 1 participant]

GitHub stats
pytorch/pytorch#180265 · Fetched 2026-04-15 06:18:56
Comments: 2 · Participants: 1 · Timeline: 57 · Reactions: 0
Timeline (top): mentioned ×26, subscribed ×26, labeled ×3, commented ×2

Root Cause

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Platforms: linux

Over the past 6 hours, it has been determined flaky in 21 workflow(s) with 0 failures and 21 successes.

Debugging instructions (after clicking on the recent samples link): DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers so CI will thus be green but it will be harder to parse the logs. To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_replicate_with_kwargs
  4. There should be several instances run (as flaky tests are rerun in CI) from which you can study the logs.
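If the Test step log has been saved locally, the grep in step 3 can be sketched in Python. This is a minimal illustration, not part of the PyTorch tooling: the function name `find_test_mentions` and the `context` parameter are hypothetical, and the sketch assumes the log is plain text.

```python
def find_test_mentions(log_text, test_name, context=1):
    """Return (line_number, snippet) pairs for each line mentioning test_name.

    line_number is 1-based; snippet holds the matching line plus up to
    `context` lines before and after it, joined with newlines.
    """
    lines = log_text.splitlines()
    hits = []
    for idx, line in enumerate(lines):
        if test_name in line:
            start = max(0, idx - context)
            end = min(len(lines), idx + context + 1)
            hits.append((idx + 1, "\n".join(lines[start:end])))
    return hits
```

To use it, read the saved log into a string (for example with `pathlib.Path(...).read_text(errors="replace")` on whatever file name you chose) and pass `"test_replicate_with_kwargs"` as `test_name`. Because flaky tests are rerun in CI, expect several hits per log; comparing the snippets around each one is the point of step 4.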

Test file path: distributed/_composable/test_replicate.py

For all disabled tests (by GitHub issue), see https://hud.pytorch.org/disabled.

cc @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @aditvenk @xmfan @weifengpy

extent analysis

TL;DR

  • Enable and re-run the disabled test test_replicate_with_kwargs in the distributed/_composable/test_replicate.py file to investigate and potentially fix the flakiness.

Guidance

  • Investigate the workflow logs for the test by following the provided debugging instructions to identify patterns or clues that might explain the flakiness.
  • Study the logs from multiple instances of the test run, as flaky tests are re-run in CI, to see if there are any common factors or errors that could be contributing to the test's inconsistent behavior.
  • Review the test code in test_replicate.py to understand its logic and potential failure points, which might help in pinpointing the cause of the flakiness.

Notes

  • The issue of flakiness might be due to various factors including environmental conditions, test dependencies, or the test itself, so a thorough investigation is necessary.
  • The provided debugging instructions are crucial for finding relevant log snippets that can aid in diagnosing the issue.

Recommendation

  • Apply workaround: Temporarily enable the test and run it multiple times in different environments to gather more information about its behavior and potential failure causes, before attempting a permanent fix.
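Running the test many times and tallying the outcomes could look like the sketch below. It is a generic rerun helper, not PyTorch's own rerun mechanism; the name `rerun` and its parameters are hypothetical. It treats exit code 0 as a pass, any other exit code as a failure, and counts timeouts separately.

```python
import subprocess
from collections import Counter


def rerun(cmd, runs=10, timeout=None):
    """Run `cmd` (a list of arguments) `runs` times and tally the outcomes.

    Returns a Counter with the keys "pass" (exit code 0), "fail"
    (non-zero exit code) and "timeout" (exceeded `timeout` seconds).
    """
    outcomes = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            outcomes["timeout"] += 1
            continue
        outcomes["pass" if result.returncode == 0 else "fail"] += 1
    return outcomes
```

An invocation might pass something like `[sys.executable, "test/distributed/_composable/test_replicate.py", "-k", "test_replicate_with_kwargs"]` as `cmd`. That command is an assumption: it presumes the test file path reported in the issue is relative to the `test/` directory of a PyTorch checkout and that the test runner accepts unittest's `-k` filter. A mixed tally such as some passes and some failures on the same commit is the signature of flakiness; a tally of all passes locally does not rule it out, since the CI environment may differ.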
