transformers - ✅(Solved) Fix mps: test_eager_matches_sdpa_inference tests fail with PyTorch MPS backend [1 pull requests, 1 participants]

Q: Expected behavior

`test_eager_matches_sdpa_inference` should pass on MPS for the same model/dtype combinations it passes on CPU.

transformers · 2026-04-25 02:09:42

GitHub stats

huggingface/transformers#45644 • Fetched 2026-04-26 05:05:49

Author

qflen

Participants

qflen

Timeline (top)

mentioned ×4, subscribed ×4, unsubscribed ×3, cross-referenced ×1

Error Message

LlamaModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels ValueError: mean relative difference for hidden_states: 2.962e-05, torch atol = 1e-07, torch rtol = 0.0001

Root Cause

In tests/test_modeling_common.py, lines 462-476:

if torch_device in ["cpu", "cuda"]:
    atol = atols[torch_device, enable_kernels, dtype]
    rtol = rtols[torch_device, enable_kernels, dtype]
elif torch_device in ["hpu", "npu"]:
    atol = atols["cuda", enable_kernels, dtype]
    rtol = rtols["cuda", enable_kernels, dtype]
elif torch_device == "xpu":
    atol = atols["cuda", False, dtype]
    rtol = rtols["cuda", False, dtype]
else:
    atol = 1e-7
    rtol = 1e-4

MPS falls through to atol=1e-7, rtol=1e-4, which is fp32-tight and impossible to satisfy for fp16. MPS only supports SDPBackend.MATH (same as XPU, no flash / mem_efficient kernels), so the correct dispatch is the same one used for XPU.

The same dispatch block is duplicated in tests/models/video_llama_3/test_modeling_video_llama_3.py:233-247 and has the same defect.

Fix Action

Fix / Workaround

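Add mps to the same branch as xpu in both tests/test_modeling_common.py and tests/models/video_llama_3/test_modeling_video_llama_3.py, so that MPS uses the MATH-only tolerances (atols["cuda", False, dtype] and rtols["cuda", False, dtype]) instead of the fp32-tight fallback.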

Happy to open the PR. Patch is around 8 lines across the two files above, mirrors PR #34889 exactly, and I have an M5 to run the suite against. Please confirm you'd like the same dispatch (XPU-style MATH-only tolerances) for MPS rather than a separate MPS-specific entry.

PR fix notes

PR #45648: Fix SDPA inference tolerances for MPS backend

Repository: huggingface/transformers
Author: voodoovampire
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45648

Description (problem / solution / changelog)

Fixes #45644

This PR adjusts test_eager_matches_sdpa_inference on the MPS backend by routing "mps" through the same tolerance branch as "xpu" in:

tests/test_modeling_common.py
tests/models/video_llama_3/test_modeling_video_llama_3.py

The previous default tolerances were too strict for fp16 on MPS and caused spurious test failures.

Changed files

tests/models/video_llama_3/test_modeling_video_llama_3.py (modified, +1/-1)
tests/test_modeling_common.py (modified, +1/-1)

System Info

transformers version: 5.7.0.dev0 (main, c472755e79)
Platform: macOS-26.1-arm64, Apple M5, 32 GB
Python version: 3.12.13
PyTorch version: 2.11.0 (MPS available)

Who can help?

@Cyrilvallez (this is the MPS counterpart to the XPU branch you added).

Reproduction

Running test_eager_matches_sdpa_inference on the MPS backend fails for every fp16 parametrization across all SDPA-supporting models. Concretely:

$ TRANSFORMERS_TEST_DEVICE=mps python -m pytest \
    tests/models/llama/test_modeling_llama.py \
    tests/models/gemma/test_modeling_gemma.py \
    tests/models/qwen2/test_modeling_qwen2.py \
    tests/models/mistral/test_modeling_mistral.py \
    -k "test_eager_matches_sdpa_inference and fp16"
...
======== 24 failed, 779 deselected, 25 warnings in 3.40s ========

Failure pattern (representative):

LlamaModelTest::test_eager_matches_sdpa_inference_00_fp16_pad_left_sdpa_kernels
ValueError: mean relative difference for hidden_states: 2.962e-05,
            torch atol = 1e-07, torch rtol = 0.0001

The underlying numerics are correct: max_abs_diff ≈ 1.95e-3, mean_rel ≈ 9e-4, both well within the fp16 tolerance the test uses on CPU/CUDA. The same tests pass on CPU with the same model/seed/inputs.
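
To see why the fallback rtol = 1e-4 is out of reach for fp16, it helps to look at how coarse half precision is. The sketch below uses only the Python standard library (the struct "e" format is IEEE 754 half precision); the helper names to_fp16 and relative_rounding_error are made up for illustration and are not part of the test suite. It does not reproduce the metric the test computes. It only shows that storing a single value in fp16 can already cost more relative error than the fallback allows.

```python
import struct

FALLBACK_RTOL = 1e-4     # the else-branch rtol quoted in this issue
FP16_EPS = 2.0 ** -10    # gap between 1.0 and the next fp16 value, about 9.8e-4


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def relative_rounding_error(x: float) -> float:
    """Relative error introduced by storing x in fp16 (x must be nonzero)."""
    return abs(to_fp16(x) - x) / abs(x)


if __name__ == "__main__":
    err = relative_rounding_error(0.1)
    print(f"fp16 eps: {FP16_EPS:.3e}")
    print(f"relative error of storing 0.1 in fp16: {err:.3e}")
    print(f"exceeds fallback rtol {FALLBACK_RTOL:g}: {err > FALLBACK_RTOL}")
```

Storing 0.1 in fp16 alone costs about 2.4e-4 in relative error. The mean_rel ≈ 9e-4 reported above is on the order of FP16_EPS, which is consistent with ordinary fp16 rounding rather than a numerical bug.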

Root cause

In tests/test_modeling_common.py, lines 462-476:

if torch_device in ["cpu", "cuda"]:
    atol = atols[torch_device, enable_kernels, dtype]
    rtol = rtols[torch_device, enable_kernels, dtype]
elif torch_device in ["hpu", "npu"]:
    atol = atols["cuda", enable_kernels, dtype]
    rtol = rtols["cuda", enable_kernels, dtype]
elif torch_device == "xpu":
    atol = atols["cuda", False, dtype]
    rtol = rtols["cuda", False, dtype]
else:
    atol = 1e-7
    rtol = 1e-4

The same dispatch block is duplicated in tests/models/video_llama_3/test_modeling_video_llama_3.py:233-247 and has the same defect.

Precedent

This is the MPS counterpart to #34888 / PR #34889, which fixed the identical issue for the XPU backend. The suggested fix is to add mps to the same branch as xpu:

elif torch_device in ("xpu", "mps"):
    atol = atols["cuda", False, dtype]
    rtol = rtols["cuda", False, dtype]
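
The effect of this one-line change can be sketched outside the test suite. The snippet below is a stand-in for the dispatch block, not the code in tests/test_modeling_common.py: the function name pick_tolerances, the patched flag, and every number in ATOLS and RTOLS are assumptions chosen for illustration (the real tables hold other values), and dtypes are plain strings so the sketch runs without PyTorch.

```python
from itertools import product

# Fallback used by the else branch, as quoted in the issue.
FALLBACK = (1e-7, 1e-4)

# Placeholder tables keyed like the real ones: (device, enable_kernels, dtype).
# The numbers are hypothetical and only need to be looser for fp16 than for fp32.
ATOLS = {
    (dev, kernels, dt): (5e-3 if dt == "fp16" else 1e-6)
    for dev, kernels, dt in product(("cpu", "cuda"), (False, True), ("fp16", "fp32"))
}
RTOLS = {
    (dev, kernels, dt): (5e-3 if dt == "fp16" else 1e-4)
    for dev, kernels, dt in product(("cpu", "cuda"), (False, True), ("fp16", "fp32"))
}


def pick_tolerances(torch_device, enable_kernels, dtype, patched):
    """Return (atol, rtol) following the dispatch described in the issue.

    patched=False models the current behavior (only xpu is MATH-only);
    patched=True models the proposed change (xpu and mps share a branch).
    """
    math_only_devices = ("xpu", "mps") if patched else ("xpu",)
    if torch_device in ("cpu", "cuda"):
        key = (torch_device, enable_kernels, dtype)
    elif torch_device in ("hpu", "npu"):
        key = ("cuda", enable_kernels, dtype)
    elif torch_device in math_only_devices:
        # MATH-only backends ignore enable_kernels and use the CUDA no-kernel row.
        key = ("cuda", False, dtype)
    else:
        return FALLBACK
    return ATOLS[key], RTOLS[key]


if __name__ == "__main__":
    print("before:", pick_tolerances("mps", True, "fp16", patched=False))
    print("after: ", pick_tolerances("mps", True, "fp16", patched=True))
```

With patched=False, mps lands on the fp32-tight fallback regardless of dtype. With patched=True, it resolves to the same table entry as xpu.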

Verification

With that two-file change applied locally on M5 + torch 2.11.0:

Before fix, -k "test_eager_matches_sdpa_inference and fp16" on Llama+Gemma+Qwen2+Mistral: 24 failed.
After fix, -k "test_eager_matches_sdpa_inference" (all dtypes) on the same 4 models: 100 passed.
After fix, same test on Llama+Gemma2+Gemma3+Qwen3: 124 passed, 1 skipped.
After fix, test_modeling_video_llama_3.py on MPS: 24 passed, 8 skipped (fp32 output_attentions, unrelated).
CPU regression check (same tests, default device): unchanged, 16 passed.
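
The runs listed above differ only in the model list, the -k filter, and the device, so they are easy to script. The helper below is a hypothetical convenience wrapper, not something shipped with transformers. It assumes each model's test file lives at tests/models/<name>/test_modeling_<name>.py, which matches the paths quoted in this issue but may not hold for every model, and it only builds the command and environment without running anything.

```python
import os
import sys


def build_sdpa_test_run(models, keyword="test_eager_matches_sdpa_inference", device="mps"):
    """Build (argv, env) for a pytest run like the ones listed in the issue.

    models:  model directory names, e.g. ["llama", "gemma"]
    keyword: value passed to pytest -k
    device:  value for TRANSFORMERS_TEST_DEVICE; pass None to keep the default device
    """
    paths = [f"tests/models/{name}/test_modeling_{name}.py" for name in models]
    argv = [sys.executable, "-m", "pytest", *paths, "-k", keyword]
    env = dict(os.environ)
    if device is None:
        # Drop the override so the run uses the default device (CPU regression check).
        env.pop("TRANSFORMERS_TEST_DEVICE", None)
    else:
        env["TRANSFORMERS_TEST_DEVICE"] = device
    return argv, env
```

From the repository root, passing the result to subprocess.run(argv, env=env) would launch the run. For example, build_sdpa_test_run(["llama", "gemma", "qwen2", "mistral"], keyword="test_eager_matches_sdpa_inference and fp16") corresponds to the fp16-only reproduction command.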

Expected behavior

test_eager_matches_sdpa_inference should pass on MPS for the same model/dtype combinations it passes on CPU.

Offering to PR

extent analysis

TL;DR

The most likely fix is to update the tolerance dispatch for the MPS device to match the XPU device, allowing fp16 tests to pass.

Guidance

Identify the lines of code responsible for setting the tolerance values for different devices in tests/test_modeling_common.py and tests/models/video_llama_3/test_modeling_video_llama_3.py.
Update the elif block to include mps in the same branch as xpu, using the same tolerance values as the XPU device.
Verify the fix by running the affected tests on the MPS device and checking for passing results.
Consider opening a PR with the proposed fix, which should be a small patch (around 8 lines) across the two affected files.

Example

The proposed fix involves updating the following code block:

elif torch_device in ("xpu", "mps"):
    atol = atols["cuda", False, dtype]
    rtol = rtols["cuda", False, dtype]

This change should allow the fp16 tests to pass on the MPS device.

Notes

The fix follows the precedent of PR #34889, which fixed the same problem for the XPU backend. The change only adds mps to an existing branch, and the reported CPU regression check was unchanged (16 passed), so it is unlikely to affect other devices.

Recommendation

Apply the fix by updating the tolerance dispatch so that the MPS device uses the same branch as the XPU device. The issue author reports verifying it locally on an M5 with torch 2.11.0.
