pytorch - [MPS] 2.10.0 regression: Qwen2.5-VL / ColQwen2.5 produces NaN query embeddings on 7/8 batched queries (works on 2.9.0) [1 participant]

pytorch/pytorch#182726 (fetched 2026-05-07 03:30:28)


Description

On torch 2.10.0 with the MPS backend, batched query encoding through ColQwen2.5-v0.2 (a Qwen2.5-VL-derived model) produces NaN tensors for all but the first query in a batch. Pinning torch back to 2.9.0 with everything else identical (same model weights, same hardware, same Python env, same colpali-engine 0.3.15, same transformers 5.3.0, same inputs) restores correct, finite embeddings and full retrieval accuracy.

We are filing this so the community is aware of the regression while we don't yet have a reduced repro at the SDPA level — the model-level reproducer below pins the exact configuration that triggers it.

Reproduction

import torch
from colpali_engine.models import ColQwen2_5, ColQwen2_5_Processor

device = "mps"  # on torch 2.10.0, device="cpu" gives finite embeddings (control 1 below)
model = ColQwen2_5.from_pretrained(
    "vidore/colqwen2.5-v0.2",
    torch_dtype=torch.bfloat16,
    device_map=device,
).eval()
processor = ColQwen2_5_Processor.from_pretrained("vidore/colqwen2.5-v0.2")

# Eight text queries of different lengths, encoded together as one padded batch.
queries = [
    "front cover with line engraving of a young Black girl with a hoop",
    "list of NAACP General Committee founding members and officers including Moorfield Storey and W E B Du Bois",
    "What to Read book reviews bibliography divided into Africa and America sections",
    "OPINION editorial column with quoted excerpts from newspapers like The Call and The World",
    "Christmas editorial titled the Burden",
    "Toussaint Conservatory of Music advertisement full page",
    "color line education statistics Houston County Georgia",
    "judicial decisions table on segregation cases",
]
proc = processor.process_queries(queries).to(device)
with torch.no_grad():
    # Multi-vector output: one embedding per token, shape (batch, seq_len, dim).
    q_emb = model(**proc).to(torch.float32).cpu()

# Mean-pooled norm per query; a NaN entry marks a broken row.
print(q_emb.mean(dim=1).norm(dim=1))

Observed (torch 2.10.0, MPS, bfloat16)

tensor([0.5803,    nan,    nan,    nan,    nan,    nan,    nan,    nan])

Per-token inspection: q[0] has token norms ~1.0 throughout; q[1..7] have token norms == 0.000 (exact zero) for the first ~10 tokens, then NaN further into the sequence. Pairwise cosine similarity across queries on torch 2.10.0 is a NaN matrix.
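A small helper can reproduce the per-token inspection above. This is a sketch and not code from the report. The function name `summarize_query_embeddings` is hypothetical, and the helper assumes `q_emb` is the `(batch, seq_len, dim)` float tensor produced by the reproducer.

```python
import torch


def summarize_query_embeddings(q_emb: torch.Tensor) -> dict:
    """Per-query token diagnostics for a (batch, seq_len, dim) multi-vector tensor."""
    q = q_emb.float()
    token_norms = q.norm(dim=-1)                          # (batch, seq_len); NaN if a token has NaN
    zero_tokens = (token_norms == 0).sum(dim=1)           # exact-zero token vectors per query
    nan_tokens = torch.isnan(q).any(dim=-1).sum(dim=1)    # token vectors containing any NaN
    pooled = q.mean(dim=1)                                # same mean pooling as the reproducer
    unit = torch.nn.functional.normalize(pooled, dim=-1)  # NaN rows stay NaN
    cosine = unit @ unit.T                                # (batch, batch) pairwise cosines
    return {
        "zero_tokens": zero_tokens.tolist(),
        "nan_tokens": nan_tokens.tolist(),
        "pooled_norm": pooled.norm(dim=1).tolist(),
        "cosine": cosine,
    }
```

On a healthy run, every query should report `nan_tokens == 0` and the cosine matrix should be finite.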

End-to-end retrieval impact on a small benchmark (8 hand-curated retrieval queries against 97 magazine pages): R@5 drops from 1.000 to 0.125, and R@1 drops from 0.875 to 0.000–0.125.
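Here R@k is read as the fraction of queries whose gold page appears among the top k results. The sketch below assumes ColBERT-style MaxSim late-interaction scoring, which is the usual scoring for ColPali-family models. It uses hypothetical helper names and does not reproduce the report's benchmark. It also shows why NaN embeddings wreck retrieval: a single NaN query token makes every score for that query NaN, so its ranking carries no information.

```python
import torch


def maxsim_scores(queries: torch.Tensor, docs: torch.Tensor) -> torch.Tensor:
    """MaxSim: for each query token take its best-matching doc token, then sum.

    queries: (n_q, q_len, dim), docs: (n_d, d_len, dim) -> scores (n_q, n_d).
    """
    sim = torch.einsum("qid,pjd->qpij", queries, docs)  # (n_q, n_d, q_len, d_len)
    return sim.max(dim=-1).values.sum(dim=-1)           # best doc token, summed over query tokens


def recall_at_k(scores: torch.Tensor, gold: list[int], k: int) -> float:
    """Fraction of queries whose gold doc index is among the k highest-scoring docs."""
    top = scores.topk(k, dim=1).indices                 # (n_q, k)
    hits = [gold[i] in top[i].tolist() for i in range(len(gold))]
    return sum(hits) / len(hits)
```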

Expected (torch 2.9.0, MPS, bfloat16, otherwise identical env)

All 8 query embeddings finite, mean-pooled norms in roughly the 0.4–0.6 range, pairwise cosines in a healthy 0.07–0.55 spread, R@5 = 1.000.

Diagnostics — three controls run on torch 2.10.0

  1. CPU instead of MPS (device="cpu", otherwise identical): embeddings are finite and retrieval R@5 = 1.000. So the bug is specific to MPS in 2.10.0, not to the model code.
  2. Disable mps-flash-attn / use stock SDPA: still NaN. So the third-party flash-attn extension is not the cause.
  3. Force torch.nn.attention.sdpa_kernel(SDPBackend.MATH): still NaN. So on MPS the NaNs appear regardless of which SDPA path (fused or math) is selected in this configuration.

We have not yet bisected to a single op or a minimal SDPA-level reproducer. The bug only manifests in the batched-query forward path through the Qwen2.5-VL text encoder; per-query forwards (batch=1) appear unaffected for the first item, which suggests a batch-dimension or cross-row interaction.

Workaround

Pin torch==2.9.0. With colpali-engine 0.3.15 and transformers 5.3.0 this works correctly on MPS.
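Control 1 suggests a second stopgap besides pinning: run on CPU when an affected torch is installed. Below is a minimal sketch of that device choice. It assumes every 2.10.x release is affected, although only 2.10.0 was tested in this report. The helper names are hypothetical.

```python
import torch


def is_affected_torch(version: str) -> bool:
    """True for the 2.10.x line; assumption: all 2.10.x behave like the 2.10.0 tested here."""
    major, minor = (int(part) for part in version.split("+")[0].split(".")[:2])
    return (major, minor) == (2, 10)


def pick_device(version: str = torch.__version__) -> str:
    """Prefer MPS, but fall back to CPU (finite in control 1) on an affected torch."""
    if torch.backends.mps.is_available() and not is_affected_torch(version):
        return "mps"
    return "cpu"
```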

Environment

  • torch: 2.10.0 (broken) / 2.9.0 (working) — both reproduced on the same machine, same day
  • transformers: 5.3.0
  • peft: 0.18.1
  • colpali-engine: 0.3.15
  • mps-flash-attn: installed (also reproduced with it disabled)
  • Python: 3.13.3
  • OS: macOS 26.4 (arm64)
  • Hardware: MacBook Pro, Apple M1 Max, 64 GB RAM
  • dtype: bfloat16
  • device: mps

Severity

We'd argue high. SDPA on MPS silently returning NaN with no warning means downstream RAG / retrieval / embedding pipelines on Apple Silicon get garbage results without any signal that anything is wrong — the model still runs, scores are just meaningless.
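Because the failure is silent, a cheap check after encoding can turn it into a loud error. The helper below is hypothetical and not part of colpali-engine or PyTorch. It assumes the embeddings are batched along dimension 0 and have at least two dimensions.

```python
import torch


def assert_finite_embeddings(emb: torch.Tensor) -> torch.Tensor:
    """Raise instead of passing NaN/Inf embeddings downstream; returns emb unchanged if finite."""
    bad = ~torch.isfinite(emb.flatten(1)).all(dim=1)  # one flag per batch row
    if bad.any():
        rows = bad.nonzero().flatten().tolist()
        raise ValueError(f"non-finite embeddings for batch rows {rows}")
    return emb
```

With the reproducer above, `assert_finite_embeddings(q_emb)` would raise on torch 2.10.0 and list rows 1 through 7.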

Suggested labels

module: mps, module: sdpa, regression

Honest scope

This is one user reproducing on one machine (Apple M1 Max, macOS 26.4) on the day torch 2.10.0 hit PyPI. We don't claim every Apple Silicon user will see it. We're filing because the version bisect is clean (2.9.0 works, 2.10.0 broken with everything else unchanged) and the failure mode is silent NaN rather than a loud error.

cc @kulinseth @malfet @DenisVieriu97 @jhavukainen @aditvenk @drisspg @liangel-02 @howardzhang-cv
