pytorch - 💡(How to fix) Fix RFC Test class classification


Following several discussions about device-generic testing and improving visibility into testing in general, I want to propose a more involved reshuffling of our test harness. There are a few main goals here:

  • Improve the observability "in code" of what is running in which job, and reduce the complexity of the workflows and their scheduling by having more of the specification inline with each test.
  • Reduce "double testing" by making it hard to have a given test run in multiple configs.
  • Leverage our new capability to refactor code to improve the long-term sustainability of the test suite.

In particular, this RFC proposes that every test class (such as here) gets a few properties:

  • Hardware requirement:
    • "generic": for generic tests that exercise shared CPU-side logic. These run on a CPU-only runner, and any test that can run there should be placed here. Note that fakePG distributed tests should run here.
    • "device generic": for tests that check on-device behavior, generally aten op numerics, backend-specific integration, etc. These tests run on a single-accelerator runner (note that CPU is included here).
    • "device specific": for tests that should run only on a given device.
    • "multi-device generic": for tests that check multi-device behavior, mostly related to the lower-level distributed implementation.
    • "multi-device specific": for device-specific tests that require multiple accelerators.
  • Frequency:
    • "pull": for tests that should run on every commit pushed to any pull request.
    • "trunk": for tests that should run just before merging and on every commit in trunk.
    • "periodic": for tests that should run in a periodic way, with inline specification of the frequency (from a set of acceptable ones). Note that nightly and RC validation should be covered by this one with daily or above frequencies.
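
One way to picture these properties is as a class decorator. The following is only a minimal sketch, not an existing PyTorch API: the names `Hardware`, `Frequency`, `classify`, and `ALLOWED_PERIODS`, as well as the concrete set of acceptable periods, are hypothetical and chosen for illustration.

```python
import enum
import unittest


class Hardware(enum.Enum):
    GENERIC = "generic"
    DEVICE_GENERIC = "device generic"
    DEVICE_SPECIFIC = "device specific"
    MULTI_DEVICE_GENERIC = "multi-device generic"
    MULTI_DEVICE_SPECIFIC = "multi-device specific"


class Frequency(enum.Enum):
    PULL = "pull"
    TRUNK = "trunk"
    PERIODIC = "periodic"


# Assumption: the RFC only says "a set of acceptable ones"; these values are made up.
ALLOWED_PERIODS = frozenset({"daily", "weekly"})


def classify(hardware, frequency, period=None):
    """Hypothetical decorator that attaches the classification to a test class."""
    if frequency is Frequency.PERIODIC:
        if period not in ALLOWED_PERIODS:
            raise ValueError(f"periodic tests need a period in {sorted(ALLOWED_PERIODS)}")
    elif period is not None:
        raise ValueError("period is only meaningful for periodic tests")

    def decorate(cls):
        # Stored on the class so tooling can read it without running the tests.
        cls.hardware = hardware
        cls.frequency = frequency
        cls.period = period
        return cls

    return decorate


@classify(Hardware.DEVICE_GENERIC, Frequency.PERIODIC, period="daily")
class TestAtenNumerics(unittest.TestCase):
    def test_placeholder(self):
        self.assertEqual(1 + 1, 2)
```

Validating the period at decoration time means an invalid specification fails at import, before any job is scheduled.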

The existing "slow test" mechanism should be moved to the periodic system, and we should generally shard each job appropriately based on its workload. We can also keep the existing per-test memory or compute constraints for now. The ciflow/ based triggers can also be kept as-is for now, as they mostly run tests from outside the test suite, or code that runs the above logic on different hardware.

This will concretely require a significant refactoring of the tests, very similar to the one being worked on by the Accelerator Working Group from the TAC. It is actually a strict superset of the work there, which separates tests into "generic", "device generic", and "device specific" classes so that "instantiate_device_type_tests()" can be used appropriately. This proposal adds class-level information on top of the result of that refactor. In particular, we will need to:

  • Define these specifications as either a decorator or properties for test classes.
  • Move a lot of the hardware requirements from a per-test to a per-class level.
  • Add these new frequency properties, and refactor run_test.py and the GitHub workflows to use this information to decide what to run, rather than bespoke logic.

Note that we should also have a parallel effort improving the UX of periodic, to ensure that periodic tests can be used more reliably.
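
To make the run_test.py side of this more concrete, here is a rough sketch of selecting test classes from class-level metadata instead of bespoke logic. It is not how run_test.py works today. The `hardware` and `frequency` class attributes, the `RUNNER_FOR_HARDWARE` mapping, the runner names, and `select_classes` are all hypothetical, and matching the frequency exactly (so a class runs in only one kind of job) is an assumption drawn from the "reduce double testing" goal.

```python
import unittest

# Assumption: each hardware requirement is owned by exactly one kind of runner.
# A real setup would have one single-accelerator runner per device type.
RUNNER_FOR_HARDWARE = {
    "generic": "cpu-only",
    "device generic": "single-accelerator",
    "device specific": "single-accelerator",
    "multi-device generic": "multi-accelerator",
    "multi-device specific": "multi-accelerator",
}


def select_classes(classes, runner, trigger):
    """Return the test classes that a job on `runner` fired by `trigger` should run."""
    selected = []
    for cls in classes:
        hardware = getattr(cls, "hardware", None)
        frequency = getattr(cls, "frequency", None)
        if hardware not in RUNNER_FOR_HARDWARE or frequency is None:
            # Refuse unclassified classes so nothing silently stops running.
            raise ValueError(f"{cls.__name__} has no valid classification")
        if RUNNER_FOR_HARDWARE[hardware] == runner and frequency == trigger:
            selected.append(cls)
    return selected


def build_suite(classes, runner, trigger):
    """Collect the selected classes into a single unittest suite."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for cls in select_classes(classes, runner, trigger):
        suite.addTests(loader.loadTestsFromTestCase(cls))
    return suite


class TestShapeLogic(unittest.TestCase):
    hardware = "generic"
    frequency = "pull"

    def test_placeholder(self):
        self.assertTrue(True)


class TestCollectives(unittest.TestCase):
    hardware = "multi-device generic"
    frequency = "trunk"

    def test_placeholder(self):
        self.assertTrue(True)
```

With this shape, a workflow job only needs to pass its runner kind and trigger, for example `build_suite(all_classes, "cpu-only", "pull")`, and the mapping from test to job stays readable in the test file itself.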

Curious to hear opinions from @jbschlosser @fffrog @jathu @malfet and the dev infra team in general here.

cc @malfet @pytorch/pytorch-dev-infra @mruberry
