transformers - ✅(Solved) Fix A little bug in testing_utils.py [1 pull requests]

Error Message

I was trying to run integration tests on CPU for a new model addition PR and hit an error in testing_utils.py: the check at line 3220 passed, and the call at line 3223 then raised an error.

Root Cause

The reason is that CUDA is installed on my system (Lightning AI Studio) but no GPU is attached, so

IS_CUDA_SYSTEM = torch.version.cuda is not None

evaluated to True, and the error then occurred at line 3223

major, minor = torch.cuda.get_device_capability()

because there was no GPU to query.
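
The root cause comes down to two checks that answer different questions: torch.version.cuda reports how PyTorch was built, while torch.cuda.is_available() reports whether a usable GPU exists at runtime. The sketch below illustrates the mismatch without requiring PyTorch. make_fake_torch is a hypothetical stand-in written for this illustration, and the RuntimeError it raises is an assumption, since the report only says that an error occurred.

```python
from types import SimpleNamespace


def make_fake_torch(cuda_version, gpu_count):
    """Hypothetical stand-in for the parts of `torch` discussed in this issue."""

    def get_device_capability():
        # Assumption: the exception type is illustrative only.
        if gpu_count == 0:
            raise RuntimeError("no GPU device present")
        return (8, 0)

    cuda = SimpleNamespace(
        is_available=lambda: gpu_count > 0,
        get_device_capability=get_device_capability,
    )
    return SimpleNamespace(version=SimpleNamespace(cuda=cuda_version), cuda=cuda)


# CUDA build installed, but no GPU attached (the situation in this issue).
fake_torch = make_fake_torch(cuda_version="12.1", gpu_count=0)

is_cuda_system = fake_torch.version.cuda is not None  # build-time fact
has_gpu = fake_torch.cuda.is_available()               # runtime fact

print(is_cuda_system, has_gpu)  # True False
```

With these two values, any code that checks only is_cuda_system goes on to call get_device_capability() and fails.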

Fix Action

Fixed

PR fix notes

PR #45351: fix(testing_utils): guard get_device_capability with torch.cuda.is_available()

Description (problem / solution / changelog)

What does this PR do?

Fixes a crash in get_device_properties() in testing_utils.py when CUDA is installed on the system but no GPU device is present (e.g., a CPU-only cloud studio with CUDA libraries installed).

The function called torch.cuda.get_device_capability() immediately after checking IS_CUDA_SYSTEM (which is True whenever torch.version.cuda is not None), without first verifying that an actual GPU is available. On CUDA-installed but GPU-less systems, get_device_capability() raises an error.

Fixes #45341

Changes

  • src/transformers/testing_utils.py: Add if not torch.cuda.is_available(): return (torch_device, None, None) guard inside the IS_CUDA_SYSTEM or IS_ROCM_SYSTEM branch of get_device_properties(), before the get_device_capability() call.
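
The guard described above can be sketched roughly as follows. This is a simplified illustration, not the library source: get_device_properties_sketch, its parameters, and the fake_torch stand-in are hypothetical names, and the shape of the GPU-path return value is an assumption.

```python
from types import SimpleNamespace


def get_device_properties_sketch(torch_mod, torch_device, is_cuda_system, is_rocm_system):
    """Simplified, hypothetical version of the guarded branch (not the library source)."""
    if is_cuda_system or is_rocm_system:
        # Guard described in the PR: return early before querying a GPU that is not there.
        if not torch_mod.cuda.is_available():
            return (torch_device, None, None)
        major, minor = torch_mod.cuda.get_device_capability()
        # Assumption: the shape of this return value is illustrative only.
        return ("rocm" if is_rocm_system else "cuda", major, minor)
    return (torch_device, None, None)


def _no_gpu_capability():
    raise RuntimeError("no GPU device present")  # assumed error type


# Hypothetical stand-in for torch on a CUDA-installed, GPU-less machine.
fake_torch = SimpleNamespace(
    cuda=SimpleNamespace(is_available=lambda: False, get_device_capability=_no_gpu_capability)
)

props = get_device_properties_sketch(fake_torch, "cpu", is_cuda_system=True, is_rocm_system=False)
print(props)  # ('cpu', None, None)
```

The early return means the CPU fallback is reported instead of an exception being raised.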

Tests

This is a fix to the test infrastructure itself (testing_utils.py). The change prevents a crash that occurs in environments where IS_CUDA_SYSTEM=True but no physical GPU is present (e.g., running pytest on a CPU-only Lightning AI studio).

No new tests were added because the existing test suite runs in environments where torch.cuda.is_available() is True — the crash scenario only reproduces on CUDA-installed, no-GPU systems.
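
Although the PR adds no tests, the no-GPU path could in principle be exercised on any machine by mocking the CUDA calls. The following is only a sketch under that assumption: capability_or_none and check_no_gpu_path are hypothetical helpers, not functions from transformers, and the mocked RuntimeError is illustrative.

```python
from unittest import mock


def capability_or_none(torch_mod):
    """Hypothetical helper: query the capability only when a GPU is available."""
    if not torch_mod.cuda.is_available():
        return None
    return torch_mod.cuda.get_device_capability()


def check_no_gpu_path():
    # Mock object standing in for torch on a CUDA-installed, GPU-less machine.
    fake_torch = mock.MagicMock()
    fake_torch.cuda.is_available.return_value = False
    fake_torch.cuda.get_device_capability.side_effect = RuntimeError("no GPU device present")

    result = capability_or_none(fake_torch)

    # The guarded code must never reach the failing call.
    fake_torch.cuda.get_device_capability.assert_not_called()
    return result


print(check_no_gpu_path())  # None
```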

Note: This PR was developed with AI assistance (Claude Code). I have reviewed every line and understand the change. This is not a duplicate of any existing open PR (checked open PRs searching for issue 45341 in body and keyword searches for get_device_capability + is_available).

Changed files

  • src/transformers/testing_utils.py (modified, +2/-0)

Code Example

IS_CUDA_SYSTEM = torch.version.cuda is not None

---

major, minor = torch.cuda.get_device_capability()

System Info

.

Who can help?

I was trying to run integration tests on CPU for a new model addition PR and hit an error caused by the check at line 3220:

https://github.com/huggingface/transformers/blob/2fae57f5da9b0108d0c4da2692f5a702e6fb8c02/src/transformers/testing_utils.py#L3215-L3235

The reason is that CUDA is installed on my system (Lightning AI Studio) but no GPU is attached, so

IS_CUDA_SYSTEM = torch.version.cuda is not None

evaluated to True, and the error then occurred at line 3223

major, minor = torch.cuda.get_device_capability()

because there was no GPU to query.

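If this is not the intended behavior, the fix is simply to change line 3220 to if (IS_CUDA_SYSTEM or IS_ROCM_SYSTEM) and torch.cuda.is_available(): . In that case the import torch inside the branch also has to be removed, because a function-local import makes torch a local name and the condition would then reference it before it is assigned. The inner import seems redundant to me anyway.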

@remi-or I see you worked on this area most recently (10 months ago), otherwise could you please ping the right person

extent analysis

TL;DR

The most likely fix is to modify the condition in line 3220 to check for CUDA availability using torch.cuda.is_available().

Guidance

  • The error occurs because torch.cuda.get_device_capability() is called even when no GPU is available, despite CUDA being installed.
  • To fix this, change the condition if (IS_CUDA_SYSTEM or IS_ROCM_SYSTEM) to if (IS_CUDA_SYSTEM or IS_ROCM_SYSTEM) and torch.cuda.is_available(): so that CUDA-specific code runs only when a GPU is available.
  • The import torch statement inside the conditional block should then be removed, since the new condition uses torch before that inner import would run.

Example

if (IS_CUDA_SYSTEM or IS_ROCM_SYSTEM) and torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()

Notes

This fix assumes that the intention is to only execute CUDA-specific code when a GPU is available, even if CUDA is installed.

Recommendation

Apply workaround: Modify the condition in line 3220 to include a check for torch.cuda.is_available() to prevent errors when no GPU is available.
