pytorch - DISABLED deit_tiny_patch16_224.fb_in1k fail_accuracy on ROCm dynamic_inductor_timm_training [1 comment, 2 participants]

GitHub stats

  • Issue: pytorch/pytorch#181362 (fetched 2026-04-25 06:02:57)
  • Comments: 1
  • Participants: 2
  • Timeline: 125
  • Reactions: 0
  • Timeline (top): mentioned ×58, subscribed ×58, labeled ×7, added_to_project_v2 ×1

deit_tiny_patch16_224.fb_in1k is failing accuracy checks in the ROCm dynamic_inductor_timm_training benchmark CI job. The model was previously passing but now reports fail_accuracy.

CI config

  • Config: rocm/dynamic_inductor_timm_training
  • Model: deit_tiny_patch16_224.fb_in1k
  • Expected: pass
  • Actual: fail_accuracy
  • Platform: ROCm

Sample failure

From CI job 72857140777:

deit_tiny_patch16_224.fb_in1k    FAIL:    accuracy=fail_accuracy, expected=pass
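When triaging many CI logs, it can help to turn result lines like the one above into structured records. The following is a minimal sketch; the regular expression is inferred from this single sample line only, so the real log format may differ, and the names `AccuracyResult` and `parse_result_line` are hypothetical helpers, not part of the PyTorch benchmark harness.

```python
import re
from typing import NamedTuple, Optional


class AccuracyResult(NamedTuple):
    model: str     # e.g. "deit_tiny_patch16_224.fb_in1k"
    status: str    # e.g. "FAIL"
    actual: str    # e.g. "fail_accuracy"
    expected: str  # e.g. "pass"


# Pattern inferred from the one sample line in this issue (an assumption).
_RESULT_LINE = re.compile(
    r"^(?P<model>\S+)\s+(?P<status>\w+):\s+"
    r"accuracy=(?P<actual>\w+),\s*expected=(?P<expected>\w+)$"
)


def parse_result_line(line: str) -> Optional[AccuracyResult]:
    """Parse one result line; return None if it does not match the pattern."""
    match = _RESULT_LINE.match(line.strip())
    if match is None:
        return None
    return AccuracyResult(
        match["model"], match["status"], match["actual"], match["expected"]
    )


sample = "deit_tiny_patch16_224.fb_in1k    FAIL:    accuracy=fail_accuracy, expected=pass"
result = parse_result_line(sample)
is_regression = result is not None and result.actual != result.expected
```

Here `is_regression` simply flags any line whose actual status differs from the expected one, which is how the mismatch in this issue would surface.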

Notes

  • The model passes accuracy on CUDA and CPU configs.
  • This appears to be a ROCm-specific numerical accuracy regression.
  • The model was added to the benchmark suite in PR #165569 with pass expected on ROCm.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben

extent analysis

TL;DR

Investigate and address the numerical accuracy regression specific to the ROCm platform for the deit_tiny_patch16_224.fb_in1k model.

Guidance

  • Review the changes made to the model or the ROCm configuration since the last successful run to identify potential causes of the numerical accuracy regression.
  • Compare the accuracy results on ROCm with those on CUDA and CPU to understand the discrepancy and potentially isolate the issue.
  • Consider re-running the benchmark with increased precision or different numerical settings on the ROCm platform to see if it mitigates the accuracy failure.
  • Investigate if there are any known issues or updates related to the ROCm version being used that could be contributing to the regression.
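Reviewing the changes since the last successful run, as the first item suggests, is usually done by bisection. The sketch below shows the idea in isolation, assuming an ordered list of commits where the first is known good, the last is known bad, and the failure persists once introduced. The commit names and the `is_bad` predicate are placeholders; in practice `is_bad` would rerun the failing benchmark at that commit.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad(commit) is True.

    Assumes commits[0] is good, commits[-1] is bad, and that every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi]


# Hypothetical history: c0..c9, with the regression introduced at c6.
history = [f"c{i}" for i in range(10)]
culprit = first_bad(history, lambda commit: int(commit[1:]) >= 6)
```

Each call to `is_bad` stands for one full benchmark run, so the logarithmic number of probes matters when a single run is slow.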

Notes

The issue seems to be specific to the ROCm platform, and the model passes accuracy checks on CUDA and CPU, indicating a potential platform-specific problem. Without more detailed information about the changes made or the specific ROCm version, it's challenging to provide a precise fix.
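To quantify how far the ROCm results drift from a reference such as the CPU run, one option is an elementwise tolerance check plus a maximum-difference report. This is only an illustrative sketch on plain Python lists: the tolerance values and the `atol + rtol * |ref|` criterion are assumptions chosen for the example, not the thresholds or the check that the benchmark harness actually applies.

```python
from typing import Sequence


def max_abs_diff(ref: Sequence[float], out: Sequence[float]) -> float:
    """Largest elementwise absolute difference between two equal-length vectors."""
    if len(ref) != len(out):
        raise ValueError("length mismatch")
    return max(abs(o - r) for r, o in zip(ref, out))


def outputs_match(
    ref: Sequence[float],
    out: Sequence[float],
    atol: float = 1e-3,  # assumed absolute tolerance
    rtol: float = 1e-3,  # assumed relative tolerance
) -> bool:
    """True if every element satisfies |out - ref| <= atol + rtol * |ref|."""
    if len(ref) != len(out):
        raise ValueError("length mismatch")
    return all(abs(o - r) <= atol + rtol * abs(r) for r, o in zip(ref, out))


# Made-up numbers for illustration only, not data from the CI job.
reference = [1.0, 2.0, 3.0]
close_run = [1.0005, 2.0, 3.0]
drifted_run = [1.1, 2.0, 3.0]

close_ok = outputs_match(reference, close_run)
drifted_ok = outputs_match(reference, drifted_run)
worst = max_abs_diff(reference, drifted_run)
```

Reporting the worst-case difference alongside the pass/fail result makes it easier to tell a borderline tolerance miss from a genuinely wrong computation.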

Recommendation

Apply workaround: The model previously passed on ROCm and still passes on CUDA and CPU, so a temporary or configuration-related issue on the ROCm side is a reasonable suspicion. Adjusting numerical precision settings, or updating the ROCm version if possible, might resolve the failure.
