pytorch - ✅(Solved) Fix [Inductor][CI] Inductor Periodic CI has been broken for a long time [2 pull requests, 1 comment, 2 participants]

GitHub stats
pytorch/pytorch#177084 · Fetched 2026-04-08 00:22:17
Comments: 1
Participants: 2
Timeline: 67
Reactions: 0
Timeline (top): subscribed ×30, mentioned ×20, labeled ×7, referenced ×3

Fix Action

Fixed

PR fix notes

PR #177695: [CI] Fix periodic inductor CI silently skipping all tests

Description (problem / solution / changelog)

Should fix https://github.com/pytorch/pytorch/issues/177084

The daily periodic schedule for inductor-unittest has been running without mem_leak_check mode for CUDA jobs. PR #161536 renamed the build job from cuda12.8-py3.10-gcc9-sm86 to inductor-build, which removed "cuda" from the job name. is_cuda_or_rocm_job() in filter_test_configs.py only checked the job name string, so it started returning False for the CUDA inductor job.
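
The effect of the rename can be reproduced with a name-only check. The sketch below is a minimal reconstruction of the behaviour described above, not the code in filter_test_configs.py: the function name `name_only_is_cuda_or_rocm_job` is hypothetical, and the "rocm" branch is an assumption inferred from the name of `is_cuda_or_rocm_job()`.

```python
from typing import Optional


def name_only_is_cuda_or_rocm_job(job_name: Optional[str]) -> bool:
    """Hypothetical name-only check: True only if the name mentions the platform."""
    if not job_name:
        return False
    return "cuda" in job_name or "rocm" in job_name


# Build job name before PR #161536: contains "cuda", so it is detected.
old_name_detected = name_only_is_cuda_or_rocm_job("cuda12.8-py3.10-gcc9-sm86")

# Build job name after the rename: no "cuda" substring, so detection fails.
new_name_detected = name_only_is_cuda_or_rocm_job("inductor-build")

print(old_name_detected, new_name_detected)
```

Any check of this shape depends on a naming convention that nothing else enforces, which is why a rename could change CI behaviour without any test failing.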

Without mem_leak_check, the only periodic mode was rerun_disabled_tests, which by design skips all non-disabled tests. Since few inductor tests are disabled, nearly all tests were skipped on every periodic run for ~6.5 months.

Changed files

  • .github/scripts/filter_test_configs.py (modified, +20/-8)
  • .github/scripts/test_filter_test_configs.py (modified, +19/-1)

🐛 Describe the bug

[Image: https://github.com/user-attachments/assets/4143c0b6-9053-4f16-935b-6f51c9f38b47]

We should fix this ASAP to get healthy CI signals back.

CI link: https://hud.pytorch.org/hud/pytorch/pytorch/main/1?per_page=50

Versions

Nightly

cc @seemethere @malfet @pytorch/pytorch-dev-infra @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

Fix Plan

Fix Name: Restore Periodic Inductor CI Signal

Step 1: Identify the root cause of the issue

  • Review the CI link provided and confirm that the periodic inductor-unittest runs are skipping nearly all tests rather than failing them.
  • Trace the skips to test-config filtering: after PR #161536 renamed the build job to inductor-build, is_cuda_or_rocm_job() in .github/scripts/filter_test_configs.py returns False for the CUDA inductor job, so mem_leak_check is not scheduled and only rerun_disabled_tests runs.

Step 2: Update the job detection

  • Update is_cuda_or_rocm_job() so that it no longer depends solely on the job name containing "cuda".
  • For example, if the job name no longer names the platform, consider also checking another signal that does, such as the runner labels in the test matrix.

Step 3: Update the unit tests

  • Extend .github/scripts/test_filter_test_configs.py with a case for the renamed inductor-build job.
  • Keep a case for a job whose name still contains "cuda" so that the existing behaviour stays covered.

Step 4: Test the updated filter

  • Run the unit tests for filter_test_configs.py.
  • Verify that the periodic modes selected for the CUDA inductor job include mem_leak_check again.

Example Code

from typing import Iterable, Optional

# Illustrative sketch only, not the code from PR #177695.
# Assumptions: the `runners` parameter and the "gpu" marker are hypothetical.

def name_mentions_gpu_platform(job_name: Optional[str]) -> bool:
    # Name-only check: returns False for a job named "inductor-build"
    return bool(job_name) and ("cuda" in job_name or "rocm" in job_name)

# Fall back to a second signal when the job name does not name the platform
def is_cuda_or_rocm_job(job_name: Optional[str], runners: Iterable[str] = ()) -> bool:
    if name_mentions_gpu_platform(job_name):
        return True
    return any("gpu" in runner for runner in runners)

Verification

  • Verify that the periodic inductor-unittest run for the CUDA job executes tests in mem_leak_check mode instead of skipping them.
  • Check the CI link to confirm that the periodic inductor signal has recovered.

Extra Tips

  • Consider checking that each periodic job actually runs a non-trivial number of tests, so that a run which silently skips everything is noticed.
