[MPS] 2.10.0 regression: Qwen2.5-VL / ColQwen2.5 produces NaN query embeddings on 7/8 batched queries (works on 2.9.0)

pytorch 2026-05-06 21:58:36


pytorch/pytorch#182726•Fetched 2026-05-07 03:30:28


Author

PaulLockett


On torch 2.10.0 with the MPS backend, batched query encoding through ColQwen2.5-v0.2 (a Qwen2.5-VL-derived model) produces NaN tensors for all but the first query in a batch. Pinning torch back to 2.9.0 with everything else identical (same model weights, same hardware, same Python env, same colpali-engine 0.3.15, same transformers 5.3.0, same inputs) restores correct, finite embeddings and full retrieval accuracy.

We are filing this so the community is aware of the regression while we don't yet have a reduced repro at the SDPA level — the model-level reproducer below pins the exact configuration that triggers it.

Honest scope

This is one user reproducing on one machine (Apple M1 Max, macOS 26.4) on the day torch 2.10.0 hit PyPI. We don't claim every Apple Silicon user will see it. We're filing because the version bisect is clean (2.9.0 works, 2.10.0 broken with everything else unchanged) and the failure mode is silent NaN rather than a loud error.




Reproduction

import torch
from colpali_engine.models import ColQwen2_5, ColQwen2_5_Processor

device = "mps"
model = ColQwen2_5.from_pretrained(
    "vidore/colqwen2.5-v0.2",
    torch_dtype=torch.bfloat16,
    device_map=device,
).eval()
processor = ColQwen2_5_Processor.from_pretrained("vidore/colqwen2.5-v0.2")

queries = [
    "front cover with line engraving of a young Black girl with a hoop",
    "list of NAACP General Committee founding members and officers including Moorfield Storey and W E B Du Bois",
    "What to Read book reviews bibliography divided into Africa and America sections",
    "OPINION editorial column with quoted excerpts from newspapers like The Call and The World",
    "Christmas editorial titled the Burden",
    "Toussaint Conservatory of Music advertisement full page",
    "color line education statistics Houston County Georgia",
    "judicial decisions table on segregation cases",
]
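# Tokenize and pad all eight queries as a single batch.
proc = processor.process_queries(queries).to(device)
with torch.no_grad():
    # Multi-vector output: one embedding per token, per query.
    q_emb = model(**proc).to(torch.float32).cpu()

# Mean-pool over the token dimension, then take one L2 norm per query.
print(q_emb.mean(dim=1).norm(dim=1))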

Observed (torch 2.10.0, MPS, bfloat16)

tensor([0.5803,    nan,    nan,    nan,    nan,    nan,    nan,    nan])

Per-token inspection: q[0] has token norms ~1.0 throughout; q[1..7] have token norms == 0.000 (exact zero) for the first ~10 tokens, then NaN further into the sequence. Pairwise cosine similarity across queries on torch 2.10.0 is a NaN matrix.
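The per-token inspection described above can be sketched as follows. This is a minimal illustration, not the script used for the report: it assumes `q_emb` has shape (batch, tokens, dim), which is consistent with the `mean(dim=1).norm(dim=1)` call in the reproduction, and the helper name `summarize_token_norms` is hypothetical. A synthetic tensor stands in for the real embeddings so the block runs without the model.

```python
import torch


def summarize_token_norms(emb: torch.Tensor) -> list[dict]:
    """Summarize per-token L2 norms for each query in a (batch, tokens, dim) tensor."""
    norms = emb.float().norm(dim=-1)  # (batch, tokens); a NaN token yields a NaN norm
    report = []
    for i, row in enumerate(norms):
        nan_mask = torch.isnan(row)
        zero_mask = row == 0  # NaN == 0 is False, so the two masks never overlap
        report.append({
            "query": i,
            "nan_tokens": int(nan_mask.sum()),
            "zero_tokens": int(zero_mask.sum()),
            "finite": bool(torch.isfinite(row).all()),
        })
    return report


# Synthetic stand-in for q_emb: query 0 is healthy (unit-norm tokens), query 1
# mimics the reported pattern (exact-zero tokens first, NaN later in the sequence).
demo = torch.nn.functional.normalize(torch.randn(2, 6, 4), dim=-1)
demo[1, :3] = 0.0
demo[1, 3:] = float("nan")

for entry in summarize_token_norms(demo):
    print(entry)
```

On real output, passing `q_emb` from the reproduction instead of `demo` should show which queries carry zero or NaN tokens and roughly where in the sequence they start.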

End-to-end retrieval impact on a small benchmark (8 hand-curated retrieval queries against 97 magazine pages): R@5 drops from 1.000 → 0.125; R@1 drops from 0.875 → 0.000–0.125.
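For readers unfamiliar with the metric, here is a small sketch of how recall@k figures like these are typically computed. It assumes each query has exactly one relevant page, which the report does not state; the page IDs and rankings are made up for illustration and are not the benchmark data. Under that assumption, 0.125 with 8 queries corresponds to a single query still retrieving its page.

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """Return 1.0 if the relevant page appears in the top-k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def mean_recall_at_k(rankings, relevant, k):
    """Average recall@k over all queries (one relevant page per query assumed)."""
    hits = [recall_at_k(r, rel, k) for r, rel in zip(rankings, relevant)]
    return sum(hits) / len(hits)


# Hypothetical data: query i's relevant page has ID i.
relevant = list(range(8))

# Only query 0 (the one with finite embeddings) ranks its page; the other
# seven rankings are arbitrary because their scores are NaN.
rankings_broken = [[0, 10, 11, 12, 13]] + [[20, 21, 22, 23, 24] for _ in range(7)]

print(mean_recall_at_k(rankings_broken, relevant, k=5))  # 0.125
print(mean_recall_at_k(rankings_broken, relevant, k=1))  # 0.125
```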

Expected (torch 2.9.0, MPS, bfloat16, otherwise identical env)

All 8 query embeddings finite, mean-pooled norms in roughly the 0.4–0.6 range, pairwise cosines in a healthy 0.07–0.55 spread, R@5 = 1.000.

Diagnostics — three controls run on torch 2.10.0

1. CPU instead of MPS (device="cpu", otherwise identical): embeddings are finite, retrieval R@5 = 1.000. So the bug is MPS-specific in 2.10.0, not in the model code.
2. Disable mps-flash-attn / use stock SDPA: still NaN. So this is not the third-party flash-attn extension.
3. Force torch.nn.attention.sdpa_kernel(SDPBackend.MATH): still NaN. So both fast and math SDPA paths on MPS produce NaN in this configuration.

We have not yet bisected to a single op or a minimal SDPA-level reproducer. The bug only manifests in the batched-query forward path through the Qwen2.5-VL text encoder; per-query forwards (batch=1) appear unaffected for the first item, which suggests a batch-dimension or cross-row interaction.

Workaround

Pin torch==2.9.0. With colpali-engine 0.3.15 and transformers 5.3.0 this works correctly on MPS.
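Where pinning is not immediately possible, a startup check can at least flag the affected version. The sketch below is an illustration only: the `KNOWN_BAD` set reflects this single report (2.10.0), says nothing about later releases, and the helper names are hypothetical.

```python
import re

# Per this report only: 2.10.0 is broken on MPS, 2.9.0 works.
KNOWN_BAD = {(2, 10, 0)}


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract (major, minor, patch) from a version string such as '2.10.0+cpu'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_known_bad_torch(version: str) -> bool:
    """True if the release matches a version reported as affected."""
    return release_tuple(version) in KNOWN_BAD


print(is_known_bad_torch("2.10.0"))      # True
print(is_known_bad_torch("2.9.0"))       # False
print(is_known_bad_torch("2.10.0+cpu"))  # True (local suffix is ignored)
```

In an application this could be called as `is_known_bad_torch(torch.__version__)` when the device is "mps", either to refuse to start or to fall back to CPU, which the report's first control found to produce finite embeddings.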

Environment

torch: 2.10.0 (broken) / 2.9.0 (working) — both reproduced on the same machine, same day
transformers: 5.3.0
peft: 0.18.1
colpali-engine: 0.3.15
mps-flash-attn: installed (also reproduced with it disabled)
Python: 3.13.3
OS: macOS 26.4 (arm64)
Hardware: MacBook Pro, Apple M1 Max, 64 GB RAM
dtype: bfloat16
device: mps

Severity

We'd argue high. SDPA on MPS silently returning NaN with no warning means downstream RAG / retrieval / embedding pipelines on Apple Silicon get garbage results without any signal that anything is wrong — the model still runs, scores are just meaningless.
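Because the failure is silent, a pipeline can add its own signal. The following is a minimal guard, assuming embeddings arrive as a tensor whose first dimension is the batch; `check_finite` and `NonFiniteEmbeddingError` are hypothetical names, not part of torch or colpali-engine.

```python
import torch


class NonFiniteEmbeddingError(RuntimeError):
    """Raised when an embedding batch contains NaN or Inf values."""


def check_finite(emb: torch.Tensor, name: str = "embeddings") -> torch.Tensor:
    """Return emb unchanged, or raise if any batch row has non-finite values."""
    bad = (~torch.isfinite(emb)).flatten(start_dim=1).any(dim=1)  # (batch,)
    if bad.any():
        rows = torch.nonzero(bad).flatten().tolist()
        raise NonFiniteEmbeddingError(
            f"{name}: non-finite values in batch rows {rows}"
        )
    return emb


# Demo with a synthetic batch in which row 1 contains a NaN.
good = torch.ones(2, 3, 4)
broken = good.clone()
broken[1, 0, 0] = float("nan")

check_finite(good)  # passes silently
try:
    check_finite(broken)
except NonFiniteEmbeddingError as exc:
    print(exc)
```

Wrapping the model output as `check_finite(q_emb, "query embeddings")` would turn the silent NaN into a loud error before scores are computed. Note that this guard does not catch the exact-zero token norms mentioned in the report, since zeros are finite.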

Suggested labels

module: mps, module: sdpa, regression


cc @kulinseth @malfet @DenisVieriu97 @jhavukainen @aditvenk @drisspg @liangel-02 @howardzhang-cv
