transformers: [RT-DETRv2] MPS crash: build_2d_sinusoidal_position_embedding hardcodes torch.float64, breaking Apple Silicon / MPS inference

Error Message

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Root Cause

Root cause: build_2d_sinusoidal_position_embedding already accepts a dtype parameter (defaulting to torch.float32), and the caller at line 1077 correctly passes dtype=hidden_states.dtype. However, all internal tensor allocations hardcode torch.float64, ignoring the parameter entirely.

System Info

- `transformers` version: 5.9.0
- Platform: macOS (Apple Silicon, MPS backend)
- Python version: 3.13
- PyTorch version: (MPS-enabled build)
- `docling` version: 2.95.0
- `docling-ibm-models` version: 3.13.2

Reproduction

Run RTDetrV2ForObjectDetection inference on any Apple Silicon Mac (MPS device). Triggered in practice via docling → docling-ibm-models → transformers, but reproducible with:

import torch
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor
from PIL import Image
import requests

device = torch.device("mps")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = RTDetrImageProcessor.from_pretrained("PekingU/rtdetr_v2_r18vd")
model = RTDetrV2ForObjectDetection.from_pretrained("PekingU/rtdetr_v2_r18vd").to(device)

inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)  # crashes here

Error:

TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.

Traceback points to modeling_rt_detr_v2.py line 988, inside build_2d_sinusoidal_position_embedding:

omega = torch.arange(pos_dim, dtype=torch.float64, device=device) / pos_dim
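The failing allocation can be isolated from the model. The following is a minimal sketch, not part of the original report: `supports_float64` is a hypothetical helper that tries the same kind of `torch.arange` call on a given device and reports whether it succeeds. Catching `RuntimeError` in addition to the `TypeError` shown above is a defensive assumption.

```python
import torch


def supports_float64(device: str) -> bool:
    """Return True if a float64 tensor can be allocated on `device`.

    Hypothetical helper for illustration only.
    """
    try:
        torch.arange(4, dtype=torch.float64, device=device)
    except (TypeError, RuntimeError):
        # The report shows a TypeError on MPS; RuntimeError is caught
        # defensively in case other backends fail differently.
        return False
    return True


if __name__ == "__main__":
    print("cpu:", supports_float64("cpu"))
    # Only probe MPS when the backend is actually present.
    if torch.backends.mps.is_available():
        print("mps:", supports_float64("mps"))
```

On a machine without MPS only the CPU line is printed, so the script is safe to run anywhere.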

Expected behavior

Inference should succeed on MPS devices.

Root cause: build_2d_sinusoidal_position_embedding already accepts a dtype parameter (defaulting to torch.float32), and the caller at line 1077 correctly passes dtype=hidden_states.dtype. However, all internal tensor allocations hardcode torch.float64, ignoring the parameter entirely.
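One way the function could honor its `dtype` argument is sketched below. This is a standalone illustration written from the description above, not the code in `transformers` and not necessarily the patch that was verified; the function name `sincos_2d_position_embedding` and the exact layout of the body are assumptions. The only point it demonstrates is that every allocation takes `dtype` and `device` from the arguments instead of hardcoding `torch.float64`.

```python
import torch


def sincos_2d_position_embedding(
    width: int,
    height: int,
    embed_dim: int = 256,
    temperature: float = 10000.0,
    device: str = "cpu",
    dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
    """Sketch of a 2D sin-cos position embedding that respects `dtype`.

    Hypothetical standalone version for illustration only.
    """
    if embed_dim % 4 != 0:
        raise ValueError("embed_dim must be divisible by 4")

    # Every allocation uses the caller's dtype, never torch.float64.
    grid_w = torch.arange(width, dtype=dtype, device=device)
    grid_h = torch.arange(height, dtype=dtype, device=device)
    grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing="ij")

    pos_dim = embed_dim // 4
    omega = torch.arange(pos_dim, dtype=dtype, device=device) / pos_dim
    omega = 1.0 / (temperature**omega)

    # Outer product of flattened grid positions and frequencies.
    out_w = grid_w.flatten()[:, None] * omega[None, :]
    out_h = grid_h.flatten()[:, None] * omega[None, :]

    emb = torch.cat([out_w.sin(), out_w.cos(), out_h.sin(), out_h.cos()], dim=1)
    return emb[None, :, :]  # shape: (1, width * height, embed_dim)
```

A caller would pass `dtype=hidden_states.dtype` and `device=hidden_states.device`, as the existing call site is described as doing. If the incoming dtype is a half-precision type, computing in `torch.float32` and casting at the end may be preferable for accuracy; that trade-off is outside what the report covers.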

Note: the same function body is the canonical source in modeling_vit_mae.py and gets inlined into the generated modeling_rt_detr.py and modeling_rt_detr_v2.py, so all three files (or the source + a regeneration) would need updating.
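To check which of the three files still contain the hardcoded dtype in a local checkout, a small scan like the following may help. It is a sketch using only the standard library; `find_hardcoded_float64` is a hypothetical helper, and the checkout root is whatever path you pass in, since the report does not give the directory layout.

```python
from pathlib import Path

# File names taken from the note above.
TARGET_FILES = {
    "modeling_vit_mae.py",
    "modeling_rt_detr.py",
    "modeling_rt_detr_v2.py",
}


def find_hardcoded_float64(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for each `torch.float64` occurrence.

    Hypothetical helper: only files named in TARGET_FILES are scanned.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        if path.name not in TARGET_FILES:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if "torch.float64" in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan_float64.py <path-to-checkout>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, lineno, line in find_hardcoded_float64(root):
        print(f"{path}:{lineno}: {line}")
```

A hit is not automatically a bug, since `torch.float64` may be used legitimately elsewhere in these files; the scan only narrows down where to look.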

Verified locally on Apple Silicon (MPS, transformers 5.9.0): crash without the fix, logits shape: torch.Size([1, 300, 80]) with it.
