pytorch - Rolling out OSDC (ARC) runners on pull workflow for PyTorch trunk commits


Fix Action

Mitigation

Opt the PyTorch bot out so it goes back to EC2 runners, then retry failed jobs to backfill signals.


Current Status

Ongoing

What's happening

We're running a 1-week trial of the pull workflow for trunk commits on OSDC (ARC) runners, starting today (Apr 8th). Here's what you need to know:

Trunk commits will transparently run pull jobs on OSDC (ARC) runners instead of EC2. Pull jobs running on pull requests remain unchanged. If something breaks, escalate on this issue and cc @seemethere @malfet @pytorch/pytorch-dev-infra @huydhn.

Incident timeline (all times pacific)

1 week until Thursday 16th

User impact

No impact

Mitigation

Opt the PyTorch bot out so it goes back to EC2 runners, then retry failed jobs to backfill signals.

extent analysis

TL;DR

If the trial causes problems, they can be mitigated by opting the PyTorch bot out of OSDC (ARC) runners so that pull jobs for trunk commits run on EC2 runners again.

Guidance

  • If issues arise during the trial, escalate to this issue and notify the listed individuals (@seemethere, @malfet, @pytorch/pytorch-dev-infra, @huydhn) for assistance.
  • To mitigate problems, consider opting the PyTorch bot out so it goes back to EC2 runners, which can also help identify whether the issue is specific to OSDC (ARC) runners.
  • Retry failed jobs to backfill signals and assess if the issue persists.
  • Monitor the trial's progress until its conclusion on Thursday, 16th, to determine if the new setup is stable.
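
The "retry failed jobs" step can be scripted. The sketch below is one possible approach, not something the issue prescribes: it assumes retries go through the GitHub REST API endpoint for re-running the failed jobs of a workflow run. The function name, the run ID, and the token are hypothetical placeholders, and the code only builds the request without sending it.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def build_rerun_failed_jobs_request(owner, repo, run_id, token):
    """Build (but do not send) a POST request that re-runs only the
    failed jobs of one GitHub Actions workflow run."""
    url = (
        f"{GITHUB_API}/repos/{owner}/{repo}"
        f"/actions/runs/{run_id}/rerun-failed-jobs"
    )
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, method="POST", headers=headers)


# Hypothetical values: 123456789 is a placeholder run ID, not a real run.
request = build_rerun_failed_jobs_request(
    "pytorch", "pytorch", 123456789, "PLACEHOLDER_TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually trigger the re-run, which needs a token with write access to Actions on the repository. Re-running from the GitHub web UI achieves the same result without a script.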

Notes

The provided information does not specify the exact nature of the potential issues that might arise, so it's essential to closely monitor the trial and be prepared to escalate any problems that occur.

Recommendation

Apply workaround: if problems appear during the trial, opt the PyTorch bot out of OSDC (ARC) runners so trunk commits run on EC2 again. This is the safer path during the trial period and allows a more controlled assessment of the new setup.
