transformers - ✅(Solved) Fix mps: test_eager_matches_sdpa_inference tests fail with PyTorch MPS backend [1 pull requests, 1 participants]

GitHub stats
huggingface/transformers#45644 (fetched 2026-04-26 05:05:49)
Comments: 0
Participants: 1
Timeline events: 12
Reactions: 0
Timeline (top): mentioned ×4, subscribed ×4, unsubscribed ×3, cross-referenced ×1

Error Message

LlamaModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels ValueError: mean relative difference for hidden_states: 2.962e-05, torch atol = 1e-07, torch rtol = 0.0001

Root Cause

In tests/test_modeling_common.py, lines 462-476:

if torch_device in ["cpu", "cuda"]:
    atol = atols[torch_device, enable_kernels, dtype]
    rtol = rtols[torch_device, enable_kernels, dtype]
elif torch_device in ["hpu", "npu"]:
    atol = atols["cuda", enable_kernels, dtype]
    rtol = rtols["cuda", enable_kernels, dtype]
elif torch_device == "xpu":
    atol = atols["cuda", False, dtype]
    rtol = rtols["cuda", False, dtype]
else:
    atol = 1e-7
    rtol = 1e-4

Because "mps" matches none of the named branches, it falls through to the else default of atol=1e-7, rtol=1e-4. Those are fp32-level tolerances that fp16 outputs cannot meet. MPS only supports SDPBackend.MATH (like XPU, it has no flash / mem_efficient kernels), so the correct dispatch is the same one used for XPU.

The same dispatch block is duplicated in tests/models/video_llama_3/test_modeling_video_llama_3.py:233-247 and has the same defect.
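To make the fall-through concrete, here is a minimal, self-contained sketch of the dispatch. It is not the code in tests/test_modeling_common.py: the function name pick_tolerances, the string dtype keys, and every number in the ATOLS / RTOLS tables are placeholders assumed for illustration. Only the branch structure and the 1e-7 / 1e-4 fallback come from the snippet above.

```python
# Illustrative tolerance tables keyed by (device, enable_kernels, dtype).
# The numbers are placeholders, NOT the values used in transformers.
ATOLS = {
    ("cpu", False, "fp16"): 5e-3,
    ("cpu", True, "fp16"): 5e-3,
    ("cuda", False, "fp16"): 5e-3,
    ("cuda", True, "fp16"): 5e-3,
}
RTOLS = {
    ("cpu", False, "fp16"): 5e-3,
    ("cpu", True, "fp16"): 5e-3,
    ("cuda", False, "fp16"): 5e-3,
    ("cuda", True, "fp16"): 5e-3,
}


def pick_tolerances(torch_device, enable_kernels, dtype, math_only_devices=("xpu",)):
    """Hypothetical helper mirroring the branch structure of the dispatch."""
    if torch_device in ("cpu", "cuda"):
        key = (torch_device, enable_kernels, dtype)
    elif torch_device in ("hpu", "npu"):
        key = ("cuda", enable_kernels, dtype)
    elif torch_device in math_only_devices:
        # MATH-only backends: use the CUDA entry with kernels disabled.
        key = ("cuda", False, dtype)
    else:
        # Any device not listed above gets the fp32-tight default.
        return 1e-7, 1e-4
    return ATOLS[key], RTOLS[key]


# Before the fix: "mps" is not listed, so it lands in the fallback.
before = pick_tolerances("mps", True, "fp16")
# After the fix: "mps" shares the "xpu" branch.
after = pick_tolerances("mps", True, "fp16", math_only_devices=("xpu", "mps"))
print("before:", before)
print("after: ", after)
```

The math_only_devices parameter exists only so the sketch can show both behaviours side by side; the actual change simply adds "mps" to the existing "xpu" condition.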

Fix Action

Fix / Workaround

Add "mps" to the existing "xpu" branch so that MPS uses the CUDA tolerances with kernels disabled (atols["cuda", False, dtype] and rtols["cuda", False, dtype]) instead of the fp32-tight fallback. The change is a one-line edit in each of tests/test_modeling_common.py and tests/models/video_llama_3/test_modeling_video_llama_3.py, and it mirrors PR #34889, which did the same for XPU.

PR fix notes

PR #45648: Fix SDPA inference tolerances for MPS backend

Description (problem / solution / changelog)

Fixes #45644

This PR adjusts test_eager_matches_sdpa_inference on the MPS backend by routing "mps" through the same tolerance branch as "xpu" in:

  • tests/test_modeling_common.py
  • tests/models/video_llama_3/test_modeling_video_llama_3.py

The previous default tolerances were too strict for fp16 on MPS and caused spurious test failures.

Changed files

  • tests/models/video_llama_3/test_modeling_video_llama_3.py (modified, +1/-1)
  • tests/test_modeling_common.py (modified, +1/-1)

RAW_BUFFER

System Info

  • transformers version: 5.7.0.dev0 (main, c472755e79)
  • Platform: macOS-26.1-arm64, Apple M5, 32 GB
  • Python version: 3.12.13
  • PyTorch version: 2.11.0 (MPS available)

Who can help?

@Cyrilvallez (this is the MPS counterpart to the XPU branch you added).

Reproduction

Running test_eager_matches_sdpa_inference on the MPS backend fails for every fp16 parametrization across all SDPA-supporting models. Concretely:

$ TRANSFORMERS_TEST_DEVICE=mps python -m pytest \
    tests/models/llama/test_modeling_llama.py \
    tests/models/gemma/test_modeling_gemma.py \
    tests/models/qwen2/test_modeling_qwen2.py \
    tests/models/mistral/test_modeling_mistral.py \
    -k "test_eager_matches_sdpa_inference and fp16"
...
======== 24 failed, 779 deselected, 25 warnings in 3.40s ========

Failure pattern (representative):

LlamaModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels
ValueError: mean relative difference for hidden_states: 2.962e-05,
            torch atol = 1e-07, torch rtol = 0.0001

The underlying numerics are correct: max_abs_diff ≈ 1.95e-3, mean_rel ≈ 9e-4, both well within the fp16 tolerance the test uses on CPU/CUDA. The same tests pass on CPU with the same model/seed/inputs.
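One way to see why the fallback tolerances are out of reach for fp16: half precision stores a 10-bit mantissa, so rounding a single value near 1.0 can already introduce a relative error of a few 1e-4. The sketch below uses only the standard library (the struct format "e" is IEEE 754 half precision) and applies the closeness rule documented for torch.allclose, |a - b| <= atol + rtol * |b|, on the assumption that the test uses a comparison of that form. The helper names and the looser 5e-3 tolerances are illustrative assumptions, not the values in the test suite.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def is_close(a: float, b: float, atol: float, rtol: float) -> bool:
    """Same rule as documented for torch.allclose: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


reference = 1.0004             # a value that fp16 cannot represent exactly
rounded = to_fp16(reference)   # nearest fp16 neighbours are 1.0 and 1.0009765625

# fp32-tight fallback that MPS received before the fix
tight = is_close(rounded, reference, atol=1e-7, rtol=1e-4)
# a looser, fp16-style tolerance (placeholder values)
loose = is_close(rounded, reference, atol=5e-3, rtol=5e-3)

print(f"rounded={rounded}, passes tight={tight}, passes loose={loose}")
```

A single rounding step is enough to violate the tight bound here, which is consistent with the report that correct fp16 results still fail under atol=1e-7, rtol=1e-4.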

Root cause

In tests/test_modeling_common.py, lines 462-476:

if torch_device in ["cpu", "cuda"]:
    atol = atols[torch_device, enable_kernels, dtype]
    rtol = rtols[torch_device, enable_kernels, dtype]
elif torch_device in ["hpu", "npu"]:
    atol = atols["cuda", enable_kernels, dtype]
    rtol = rtols["cuda", enable_kernels, dtype]
elif torch_device == "xpu":
    atol = atols["cuda", False, dtype]
    rtol = rtols["cuda", False, dtype]
else:
    atol = 1e-7
    rtol = 1e-4

MPS falls through to atol=1e-7, rtol=1e-4, which is fp32-tight and impossible to satisfy for fp16. MPS only supports SDPBackend.MATH (same as XPU, no flash / mem_efficient kernels), so the correct dispatch is the same one used for XPU.

The same dispatch block is duplicated in tests/models/video_llama_3/test_modeling_video_llama_3.py:233-247 and has the same defect.

Precedent

This is the MPS counterpart to #34888 / PR #34889, which fixed the identical issue for the XPU backend. The suggested fix is to add mps to the same branch as xpu:

elif torch_device in ("xpu", "mps"):
    atol = atols["cuda", False, dtype]
    rtol = rtols["cuda", False, dtype]

Verification

With that two-file change applied locally on M5 + torch 2.11.0:

  • Before fix, -k "test_eager_matches_sdpa_inference and fp16" on Llama+Gemma+Qwen2+Mistral: 24 failed.
  • After fix, -k "test_eager_matches_sdpa_inference" (all dtypes) on the same 4 models: 100 passed.
  • After fix, same test on Llama+Gemma2+Gemma3+Qwen3: 124 passed, 1 skipped.
  • After fix, test_modeling_video_llama_3.py on MPS: 24 passed, 8 skipped (fp32 output_attentions, unrelated).
  • CPU regression check (same tests, default device): unchanged, 16 passed.
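When re-running these checks, it can help to confirm that MPS is actually available before selecting it, since a run on another device would not exercise the changed branch. The sketch below is a hypothetical helper: mps_available and build_pytest_command are names invented here, while TRANSFORMERS_TEST_DEVICE, the test path, and the -k expression are taken from the reproduction above. It uses torch.backends.mps.is_available() when PyTorch is installed and reports False otherwise.

```python
import importlib.util
import os
import sys


def mps_available() -> bool:
    """Return True only if PyTorch is importable and reports an MPS device."""
    if importlib.util.find_spec("torch") is None:
        return False
    try:
        import torch
    except ImportError:
        return False
    mps = getattr(torch.backends, "mps", None)  # absent on old PyTorch builds
    return bool(mps is not None and mps.is_available())


def build_pytest_command(test_files, keyword):
    """Build the argv list for the pytest invocation used in the reproduction."""
    return [sys.executable, "-m", "pytest", *test_files, "-k", keyword]


command = build_pytest_command(
    ["tests/models/llama/test_modeling_llama.py"],
    "test_eager_matches_sdpa_inference and fp16",
)
# Copy the current environment and select the MPS test device.
env = dict(os.environ, TRANSFORMERS_TEST_DEVICE="mps")

if mps_available():
    print("MPS available; run with subprocess.run(command, env=env)")
else:
    print("MPS not available; skipping")
```

The command is only built and printed here, not executed, so the sketch is safe to run outside a transformers checkout.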

Expected behavior

test_eager_matches_sdpa_inference should pass on MPS for the same model/dtype combinations it passes on CPU.

Offering to PR

Happy to open the PR. Patch is around 8 lines across the two files above, mirrors PR #34889 exactly, and I have an M5 to run the suite against. Please confirm you'd like the same dispatch (XPU-style MATH-only tolerances) for MPS rather than a separate MPS-specific entry.

extent analysis

TL;DR

Route "mps" through the same tolerance branch as "xpu", so that fp16 runs on MPS are checked against the CUDA non-kernel tolerances rather than the fp32-tight default.

Guidance

  • Identify the lines of code responsible for setting the tolerance values for different devices in tests/test_modeling_common.py and tests/models/video_llama_3/test_modeling_video_llama_3.py.
  • Update the elif block to include mps in the same branch as xpu, using the same tolerance values as the XPU device.
  • Verify the fix by running the affected tests on the MPS device and checking for passing results.
  • The fix has been submitted as PR #45648, a +1/-1 change in each of the two affected files.

Example

The proposed fix involves updating the following code block:

elif torch_device in ("xpu", "mps"):
    atol = atols["cuda", False, dtype]
    rtol = rtols["cuda", False, dtype]

This change should allow the fp16 tests to pass on the MPS device.

Notes

The fix follows the precedent of PR #34889, which resolved the same issue for the XPU backend. The change only touches the tolerance dispatch in the two test files, and the reporter's CPU regression check was unchanged (16 passed).

Recommendation

Route "mps" through the "xpu" tolerance branch in both test files. The reporter verified this change locally on an Apple M5 with PyTorch 2.11.0.
