pytorch - MPS: `add_dense_scalar_cast_float` shader fails with "Read-only bytes bound to shader argument with write access enabled" assertion during pyannote.audio inference (M4 Pro, torch 2.11) [1 comment, 2 participants]

pytorch • 2026-04-27 20:47:37


GitHub stats

pytorch/pytorch#181650•Fetched 2026-04-28 06:24:01


Author

denkhub-io

Participants

denkhub-io

malfet

Timeline (top)

mentioned ×36 · subscribed ×36 · labeled ×6 · unlabeled ×2

Running pyannote-audio 4.0.4 with the pyannote/speaker-diarization-3.1 model on Apple Silicon (M4 Pro, macOS 26.1) crashes mid-inference when the pipeline is moved to MPS. The Metal validation layer aborts with:

Compute Function(add_dense_scalar_cast_float):
  Read-only bytes are being bound at index N to a shader argument with write access enabled

The same pipeline produces correct output on CPU on the same machine.

Repro

I do not yet have a torch-only minimal repro — the failure surfaces inside pyannote's embedding-extraction step and the offending tensor shape/dtype isn't exposed in the assertion. Happy to dig deeper with guidance.

Pipeline-level repro:

import torch
from pyannote.audio import Pipeline

p = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token="…")
p.to(torch.device("mps"))
out = p("audio_16k_mono.wav", num_speakers=4)  # → shader assertion here

Environment

Hardware: Mac16,7 · Apple M4 Pro · arm64
OS: macOS 26.1 (build 25B78)
Python: 3.13.13
PyTorch: 2.11.0
pyannote-audio: 4.0.4 (model: speaker-diarization-3.1)
torch.backends.mps.is_available() → True
torch.backends.mps.is_built() → True
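For anyone comparing their setup against the list above, the same fields can be collected with a short script. This is a minimal sketch: the function name `collect_environment` is hypothetical, and torch is treated as optional so the script still prints the platform details when it is not installed.

```python
import importlib.util
import platform
import sys


def collect_environment() -> dict:
    """Gather the fields from the Environment list into a dict."""
    info = {
        "machine": platform.machine(),  # "arm64" on Apple Silicon
        "system": platform.system(),
        # mac_ver() returns an empty version string on non-macOS systems.
        "os_version": platform.mac_ver()[0] or platform.release(),
        "python": sys.version.split()[0],
    }
    # Only import torch if it is installed, so the report never fails outright.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["mps_available"] = torch.backends.mps.is_available()
        info["mps_built"] = torch.backends.mps.is_built()
    else:
        info["torch"] = None
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting this output into a follow-up comment makes it easier to tell whether two reports come from the same torch build and OS version.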

Workarounds tried

PYTORCH_ENABLE_MPS_FALLBACK=1 — does not help; pipeline still crashes on the same op.
CPU works but is ~5-10× slower than expected MPS throughput on this hardware.
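Because the report describes the process aborting rather than raising a Python exception, a `try`/`except` around the pipeline call is unlikely to catch the failure. One way to get automatic CPU fallback is to run each attempt in a child process and check its exit status. The sketch below assumes a hypothetical script `diarize.py` that accepts a `--device` flag; neither exists in the issue.

```python
import subprocess
import sys


def run_with_device_fallback(build_cmd, devices=("mps", "cpu")):
    """Run build_cmd(device) in a child process, trying devices in order.

    Returns (device, CompletedProcess) for the first run that exits with 0.
    """
    last = None
    for device in devices:
        last = subprocess.run(build_cmd(device), capture_output=True, text=True)
        if last.returncode == 0:
            return device, last
        # On POSIX a negative return code means the child died from a signal,
        # which is what an aborted process looks like from the parent.
        print(f"{device} run failed with exit code {last.returncode}", file=sys.stderr)
    raise RuntimeError(f"all devices failed; last stderr:\n{last.stderr}")


def diarize_cmd(device):
    # Hypothetical command line; adapt to the real entry point.
    return [sys.executable, "diarize.py", "--device", device, "audio_16k_mono.wav"]
```

Calling `run_with_device_fallback(diarize_cmd)` tries MPS first and reruns on CPU only if the MPS child exits abnormally, at the cost of loading the model twice in the failing case.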

Related reports of the same shader bug class

A user on the Hugging Face forum reports the identical assertion class on a sibling shader name sub_dense_scalar_long_long, suggesting a shared bug in the *_dense_scalar_* MPS kernel family rather than something specific to pyannote:

https://discuss.huggingface.co/t/csm-1b-model-and-metal-shader-issue/173053
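To make the "same family" observation concrete, the shader name can be pulled out of the assertion text and split around the shared `_dense_scalar_` infix. This is only a sketch: the naming pattern is inferred from the two shader names quoted in this issue, not from the PyTorch source, and the helper names are hypothetical.

```python
import re

# Matches the header line of the Metal assertion quoted in this issue.
COMPUTE_FN = re.compile(r"Compute Function\((?P<name>\w+)\)")

# Assumed pattern: <op>_dense_scalar_<suffix>, based on two observed names only.
DENSE_SCALAR = re.compile(r"^(?P<op>[a-z]+)_dense_scalar_(?P<suffix>\w+)$")


def extract_shader_name(assertion_text: str):
    """Return the shader name from the assertion text, or None if absent."""
    match = COMPUTE_FN.search(assertion_text)
    return match.group("name") if match else None


def split_shader_name(name: str):
    """Return (op, suffix) if the name fits the assumed pattern, else None."""
    match = DENSE_SCALAR.match(name)
    return (match.group("op"), match.group("suffix")) if match else None


text = (
    "Compute Function(add_dense_scalar_cast_float):\n"
    "  Read-only bytes are being bound at index N to a shader argument "
    "with write access enabled"
)
print(split_shader_name(extract_shader_name(text)))    # ('add', 'cast_float')
print(split_shader_name("sub_dense_scalar_long_long"))  # ('sub', 'long_long')
```

Both names fit the pattern with different operations and type suffixes, which is consistent with the shared-family hypothesis but does not prove it.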

pyannote-side acknowledgements that MPS is currently unreliable, all pointing back to the PyTorch MPS backend:

pyannote/pyannote-audio#1886 — kernel crash on M4 with MPS, closed wontfix
pyannote/pyannote-audio#1337 — wrong timestamps with MPS on M1; maintainer: "this is an issue in pytorch (not so ready for prime time) support for mps"
pyannote/pyannote-audio#1091 — pyannote on M1 mps; maintainer recommends CPU

Asks

Is add_dense_scalar_cast_float tracked anywhere I missed?
Is the *_dense_scalar_* MPS kernel family a known broken family, or are individual ops tracked separately?
Any guidance on reducing this to a torch-only minimal repro from just the assertion text?
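On the third question, one starting point is a blind sweep. The sketch below guesses, from the shader name alone, that the kernel handles tensor-plus-scalar addition with a dtype cast, and tries that operation across a few dtypes and layouts. The mapping from any of these cases to `add_dense_scalar_cast_float` is an assumption, and the function name `sweep_scalar_add` is hypothetical.

```python
import itertools

import torch


def sweep_scalar_add(device: torch.device):
    """Run tensor + scalar across dtypes and layouts, logging each case first."""
    dtypes = [torch.float32, torch.float16, torch.int32, torch.int64]
    scalars = [1, 1.5]  # int and float scalars trigger different type promotion
    completed = []
    for dtype, scalar, transpose in itertools.product(dtypes, scalars, [False, True]):
        x = torch.ones(4, 8, dtype=dtype, device=device)
        if transpose:
            x = x.t()  # non-contiguous view of the same data
        case = f"dtype={dtype} scalar={scalar!r} contiguous={x.is_contiguous()}"
        # Print before running so the last line identifies the case in flight.
        print("trying", case, flush=True)
        y = x + scalar
        if device.type == "mps":
            torch.mps.synchronize()  # wait for the GPU work before moving on
        completed.append((case, y.dtype))
    return completed


if __name__ == "__main__":
    use_mps = torch.backends.mps.is_available()
    sweep_scalar_add(torch.device("mps" if use_mps else "cpu"))
```

If the process aborts, the last "trying" line names a candidate for the minimal repro. If all cases pass, the failing call inside pyannote probably differs in shape, dtype, or operand kind, and the sweep needs widening.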

Happy to test patches.

cc @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01 @robert-hardwick @nWEIdia @kulinseth @malfet @DenisVieriu97 @jhavukainen @aditvenk

extent analysis

TL;DR

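The practical workaround today is to avoid the MPS (Metal Performance Shaders) backend for this pipeline and run on CPU. The report suspects a bug shared across the *_dense_scalar_* MPS kernel family, but that has not been confirmed in the issue.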

Guidance

The reporter has already confirmed that the pipeline produces correct output on CPU on the same machine, so the failure is specific to the MPS path.
This issue is itself the PyTorch bug report; until a maintainer confirms or fixes it, treat the *_dense_scalar_* kernel family as suspect rather than known-broken.
To reduce the issue to a torch-only minimal repro, identify which tensor operation inside pyannote's embedding-extraction step is running when the assertion fires, record its input shapes and dtypes, and replay that single operation on MPS in a standalone PyTorch script.

Example

The issue does not include a torch-only minimal example. The only reproduction is the pipeline-level script in the Repro section, which requires Apple Silicon with the MPS backend.

Notes

pyannote-audio maintainers have described MPS support as unreliable and attribute the problems to the PyTorch MPS backend (see pyannote/pyannote-audio#1886, #1337, and #1091). Whether this specific assertion is already tracked in PyTorch is an open question in the issue. A torch-only minimal repro would help identify the root cause and could lead to a fix.

Recommendation

Apply the workaround: run the pipeline on CPU, which produces correct output on the reporter's machine. The reporter estimates CPU is roughly 5-10× slower than the MPS throughput expected on this hardware. Setting PYTORCH_ENABLE_MPS_FALLBACK=1 does not avoid the crash.
