Inference should succeed on MPS devices. **Root cause:** `build_2d_sinusoidal_position_embedding` already accepts a `dtype` parameter (defaulting to `torch.float32`), and the caller at line 1077 correctly passes `dtype=hidden_states.dtype`. However, all internal tensor allocations hardcode `torch.float64`, ignoring the parameter entirely. Note: the same function body is the canonical source in `modeling_vit_mae.py` and gets inlined into the generated `modeling_rt_detr.py` and `modeling_rt_detr_v2.py`, so all three files (or the source + a regeneration) would need updating. Verified locally on Apple Silicon (MPS, transformers 5.9.0): crash without the fix, `logits shape: torch.Size([1, 300, 80])` with it.

Fix [RT-DETRv2] MPS crash: build_2d_sinusoidal_position_embedding hardcodes torch.float64, breaking Apple Silicon / MPS inference

transformers, 2026-05-22 10:45:10

Error Message

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
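
The error can be isolated from the model entirely. The snippet below is a minimal sketch, assuming only that PyTorch is installed; `supports_float64` is a hypothetical helper name, not part of PyTorch or transformers. It tries to allocate a float64 tensor on a given device and reports whether that works.

```python
import torch


def supports_float64(device) -> bool:
    """Return True if a float64 tensor can be allocated on `device`.

    Hypothetical helper for illustration; not a PyTorch or transformers API.
    """
    try:
        torch.arange(4, dtype=torch.float64, device=device)
        return True
    except (TypeError, RuntimeError):
        # On MPS this allocation is rejected with the TypeError quoted above.
        return False


print("cpu:", supports_float64("cpu"))
# Only probe MPS when the backend is actually present on this machine.
if torch.backends.mps.is_available():
    print("mps:", supports_float64("mps"))
```

On a machine without an MPS backend only the CPU line is printed, so the probe is safe to run anywhere.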

Root Cause

Root cause: build_2d_sinusoidal_position_embedding already accepts a dtype parameter (defaulting to torch.float32), and the caller at line 1077 correctly passes dtype=hidden_states.dtype. However, all internal tensor allocations hardcode torch.float64, ignoring the parameter entirely.
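
The direction of a fix can be sketched as follows. This is not the library's code: `sincos_2d_position_embedding` is a hypothetical stand-in written for illustration, assuming the usual 2D sin-cos layout (sin and cos of the width grid, then sin and cos of the height grid). The only point it demonstrates is that every allocation uses the `dtype` argument instead of a hardcoded `torch.float64`.

```python
import torch


def sincos_2d_position_embedding(
    width,
    height,
    embed_dim=256,
    temperature=10000.0,
    device="cpu",
    dtype=torch.float32,
):
    """Hypothetical stand-in for build_2d_sinusoidal_position_embedding.

    Returns a tensor of shape (1, width * height, embed_dim).
    """
    if embed_dim % 4 != 0:
        raise ValueError("embed_dim must be divisible by 4")

    # Allocate in the caller's dtype; hardcoding float64 here is what fails on MPS.
    grid_w = torch.arange(int(width), dtype=dtype, device=device)
    grid_h = torch.arange(int(height), dtype=dtype, device=device)
    grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing="ij")

    pos_dim = embed_dim // 4
    omega = torch.arange(pos_dim, dtype=dtype, device=device) / pos_dim
    omega = 1.0 / (temperature ** omega)

    # Outer product of flattened grid positions and frequencies.
    out_w = grid_w.flatten()[:, None] * omega[None, :]
    out_h = grid_h.flatten()[:, None] * omega[None, :]

    emb = torch.cat([out_w.sin(), out_w.cos(), out_h.sin(), out_h.cos()], dim=1)
    return emb[None, :, :]


emb = sincos_2d_position_embedding(4, 3, embed_dim=8)
print(emb.shape, emb.dtype)
```

One design choice is left open here: when the caller passes a half-precision dtype, it may be preferable to compute in `torch.float32` and cast the result at the end, which keeps MPS compatibility while limiting precision loss. Which variant the maintainers prefer is not stated in the issue.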


System Info

- `transformers` version: 5.9.0
- Platform: macOS (Apple Silicon, MPS backend)
- Python version: 3.13
- PyTorch version: (MPS-enabled build)
- `docling` version: 2.95.0
- `docling-ibm-models` version: 3.13.2

Reproduction

Run RTDetrV2ForObjectDetection inference on any Apple Silicon Mac (MPS device). Triggered in practice via docling → docling-ibm-models → transformers, but reproducible with:

import torch
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor
from PIL import Image
import requests

device = torch.device("mps")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd").to(device)

inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)  # crashes here

Error:

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Traceback points to modeling_rt_detr_v2.py line 988, inside build_2d_sinusoidal_position_embedding:

omega = torch.arange(pos_dim, dtype=torch.float64, device=device) / pos_dim

Expected behavior

Inference should succeed on MPS devices.

Note: the same function body is the canonical source in modeling_vit_mae.py and gets inlined into the generated modeling_rt_detr.py and modeling_rt_detr_v2.py, so all three files (or the source + a regeneration) would need updating.
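
Because the function body is duplicated across files, it helps to check that no copy still hardcodes the dtype after the change. The following is a rough sketch using only the standard library; `find_hardcoded_float64` is a hypothetical helper, and the regular expression is an assumption about how the allocation is spelled (it will miss positional or aliased dtypes).

```python
import re

# Matches dtype=torch.float64 or dtype=torch.double, with optional spaces around "=".
HARDCODED_F64 = re.compile(r"dtype\s*=\s*torch\.(float64|double)\b")


def find_hardcoded_float64(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each hardcoded float64 dtype.

    Hypothetical helper for illustration only.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if HARDCODED_F64.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Two sample lines: the offending allocation and a dtype-respecting one.
sample = """\
omega = torch.arange(pos_dim, dtype=torch.float64, device=device) / pos_dim
grid_w = torch.arange(int(width), dtype=dtype, device=device)
"""

for lineno, line in find_hardcoded_float64(sample):
    print(f"{lineno}: {line}")
```

In practice the text of each of the three files named above would be read and passed to the helper; their locations inside the repository are not given in the issue, so they are left out here.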

Verified locally on Apple Silicon (MPS, transformers 5.9.0): crash without the fix, logits shape: torch.Size([1, 300, 80]) with it.
