pytorch - MPS: `add_dense_scalar_cast_float` shader fails with "Read-only bytes bound to shader argument with write access enabled" assertion during pyannote.audio inference (M4 Pro, torch 2.11) [1 comment, 2 participants]

GitHub stats: pytorch/pytorch#181650, fetched 2026-04-28 06:24:01. Comments: 1. Participants: 2. Timeline events: 81. Reactions: 0.

Running pyannote-audio 4.0.4 with the pyannote/speaker-diarization-3.1 model on Apple Silicon (M4 Pro, macOS 26.1) crashes mid-inference when the pipeline is moved to MPS. The Metal validation layer aborts with:

Compute Function(add_dense_scalar_cast_float):
  Read-only bytes are being bound at index N to a shader argument with write access enabled

The same pipeline produces correct output on CPU on the same machine.

Repro

I do not yet have a torch-only minimal repro — the failure surfaces inside pyannote's embedding-extraction step and the offending tensor shape/dtype isn't exposed in the assertion. Happy to dig deeper with guidance.

Pipeline-level repro:

import torch
from pyannote.audio import Pipeline

p = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token="…")
p.to(torch.device("mps"))
out = p("audio_16k_mono.wav", num_speakers=4)  # → shader assertion here

Environment

  • Hardware: Mac16,7 · Apple M4 Pro · arm64
  • OS: macOS 26.1 (build 25B78)
  • Python: 3.13.13
  • PyTorch: 2.11.0
  • pyannote-audio: 4.0.4 (model: speaker-diarization-3.1)
  • torch.backends.mps.is_available(): True
  • torch.backends.mps.is_built(): True
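
For anyone comparing setups, the fields above can be gathered programmatically. This is a minimal sketch: `mps_env_report` is a hypothetical helper name, and it only uses standard-library calls plus `torch.backends.mps`. PyTorch's own `python -m torch.utils.collect_env` gives a fuller report.

```python
import platform
import sys

import torch


def mps_env_report() -> dict:
    """Collect the environment details listed above (hypothetical helper)."""
    return {
        "machine": platform.machine(),              # e.g. "arm64" on Apple Silicon
        "macos": platform.mac_ver()[0] or "n/a",    # empty string on non-macOS hosts
        "python": sys.version.split()[0],
        "torch": torch.__version__,
        "mps_available": torch.backends.mps.is_available(),
        "mps_built": torch.backends.mps.is_built(),
    }
```

Printing `mps_env_report()` alongside a crash log makes it easier to tell whether two reports share the same torch build and OS version.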

Workarounds tried

  • PYTORCH_ENABLE_MPS_FALLBACK=1 — does not help; pipeline still crashes on the same op.
  • CPU works but is ~5-10× slower than expected MPS throughput on this hardware.
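
The CPU workaround can be kept behind a switch so that MPS is easy to re-test once a fix lands. A minimal sketch: `pick_device` and the `PYANNOTE_ALLOW_MPS` environment variable are hypothetical names chosen for illustration, not part of torch or pyannote.

```python
import os

import torch


def pick_device() -> torch.device:
    """Default to CPU; opt in to MPS only when explicitly requested.

    PYANNOTE_ALLOW_MPS is a made-up variable name used for this sketch.
    """
    wants_mps = os.environ.get("PYANNOTE_ALLOW_MPS") == "1"
    if wants_mps and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```

With the pipeline object `p` from the repro, `p.to(pick_device())` would then run on CPU unless the variable is set.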

Related reports of the same shader bug class

A user on the Hugging Face forum reports the identical assertion class on a sibling shader name, sub_dense_scalar_long_long, which suggests a shared bug in the *_dense_scalar_* MPS kernel family rather than something specific to pyannote.

There are also pyannote-side acknowledgements that MPS is currently unreliable, all pointing back to the PyTorch MPS backend.

Asks

  1. Is add_dense_scalar_cast_float tracked anywhere I missed?
  2. Is the *_dense_scalar_* MPS kernel family a known broken family, or are individual ops tracked separately?
  3. Any guidance on reducing this to a torch-only minimal repro from just the assertion text?
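
One possible starting point for ask 3, offered as a guess rather than a known repro: the shader name hints at a tensor-plus-scalar add whose result is cast to float, so simple scalar `add`/`sub` calls across dtypes are worth probing. Because a Metal validation assertion aborts the whole process, each probe below runs in its own subprocess. The helper name `probe` and the tensor size are assumptions, and nothing here is confirmed to trigger the bug.

```python
import subprocess
import sys

# Child-process script. The op/dtype/scalar choices are guesses derived from
# the shader name, not a confirmed trigger.
SNIPPET = """
import torch
x = torch.arange(8, dtype=getattr(torch, {dtype!r}), device={device!r})
y = getattr(torch, {op!r})(x, {scalar!r})
# Copy back to the host so the kernel has to run before the process exits.
print(y.dtype, y.cpu().tolist()[:2])
"""


def probe(op: str, dtype: str, scalar, device: str = "mps") -> subprocess.CompletedProcess:
    """Run one tensor-scalar op in a fresh interpreter and return its result."""
    code = SNIPPET.format(op=op, dtype=dtype, scalar=scalar, device=device)
    return subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)
```

Calling, say, `probe("add", "int64", 0.5)` and `probe("sub", "int64", 1)` for a handful of dtypes and checking `returncode` separates clean runs (0) from Python exceptions (positive) and hard aborts (negative on POSIX, meaning the child was killed by a signal).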

Happy to test patches.

cc @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01 @robert-hardwick @nWEIdia @kulinseth @malfet @DenisVieriu97 @jhavukainen @aditvenk

extent analysis

TL;DR

Until the PyTorch MPS backend is fixed, the practical workaround is to run the pipeline on CPU. The issue suspects, but has not confirmed, a shared bug in the *_dense_scalar_* kernel family.

Guidance

  • Running on CPU already confirms that the fault lies in the MPS path: the same pipeline produces correct output there on the same machine.
  • Follow pytorch/pytorch#181650 for a backend fix. Whether the *_dense_scalar_* kernel family is broken as a whole is suspected in the issue but not confirmed.
  • To reduce the issue to a torch-only minimal repro, isolate the tensor operation (op name, shapes, dtypes) that triggers the assertion and reproduce it with a short PyTorch script.
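
One way to isolate the operation is to log every ATen op as it is dispatched, so the last line printed before the abort names a candidate. A minimal sketch, assuming `TorchDispatchMode` from the private `torch.utils._python_dispatch` module behaves the same inside pyannote's inference path; `OpLogger` is a hypothetical name.

```python
import sys

import torch
from torch.utils._python_dispatch import TorchDispatchMode


class OpLogger(TorchDispatchMode):
    """Print each ATen op with its tensor shapes, dtypes and devices before it runs."""

    def __init__(self, stream=sys.stderr):
        super().__init__()
        self.stream = stream

    @staticmethod
    def _describe(arg):
        if isinstance(arg, torch.Tensor):
            return f"Tensor{tuple(arg.shape)} {arg.dtype} {arg.device}"
        return repr(arg)

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        described = ", ".join(self._describe(a) for a in args)
        # flush=True so the line reaches the terminal even if the process aborts next.
        print(f"{func}({described})", file=self.stream, flush=True)
        return func(*args, **kwargs)
```

Wrapping the failing call, for example `with OpLogger(): out = p("audio_16k_mono.wav", num_speakers=4)`, should leave the suspect op near the end of stderr. MPS work can be queued asynchronously, so treat the last logged op as a strong hint rather than proof.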

Notes

The failure appears to originate in the PyTorch MPS backend rather than in pyannote: the issue cites pyannote-side acknowledgements that MPS is currently unreliable and a report of the same assertion on a sibling shader. Reducing the failure to a torch-only minimal repro would help identify the root cause and could lead to a fix.

Recommendation

Apply workaround: run the pipeline on CPU, which produces correct output on the same machine, at a reported cost of roughly 5-10× lower throughput than expected from MPS. PYTORCH_ENABLE_MPS_FALLBACK=1 was tried and does not avoid the crash.
