transformers - ✅(Solved) Fix [moe] mps interface has error "histogram_mps" not implemented for 'Int' [1 pull requests, 1 comments, 1 participants]

Q: Expected behavior

After the modification:

~~~shell
/Users/chenzhe/Desktop/workdir/alo/.venv/bin/python /Users/chenzhe/Desktop/workdir/alo/src/model_inference.py
Loading weights: 100%|██████████| 1013/1013 [00:12<00:00, 82.22it/s]
{'role': 'assistant', 'content': 'Why did the computer go to therapy?\n\nBecause it had too many open tabs and felt like it was losing its memory.'}
~~~

transformers · 2026-04-28 13:08:21

GitHub stats

huggingface/transformers#45685•Fetched 2026-04-29 06:11:21


Author

chenzhe1204

Participants

chenzhe1204

Timeline (top)

commented ×1, cross-referenced ×1, labeled ×1, mentioned ×1

Error Message

/Users/chenzhe/Desktop/workdir/alo/.venv/bin/python /Users/chenzhe/Desktop/workdir/alo/src/model_inference.py
Loading weights: 100%|██████████| 1013/1013 [00:12<00:00, 82.15it/s]
Traceback (most recent call last):
  File "/Users/chenzhe/Desktop/workdir/alo/src/model_inference.py", line 36, in <module>
    outputs = model.generate(**inputs, max_new_tokens=1024)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2543, in generate
    result = decoding_method(
        self,
    ...<5 lines>...
        **model_kwargs,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2736, in _sample
    outputs = self._prefill(
        input_ids,
    ...<2 lines>...
        is_first_iteration=not generation_config.is_assistant,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 3768, in _prefill
    return self(**model_inputs, return_dict=True)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 887, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 2516, in forward
    outputs = self.model(
        input_ids=input_ids,
    ...<14 lines>...
        **kwargs,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 963, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 887, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 2374, in forward
    outputs = self.language_model(
        per_layer_inputs=per_layer_inputs,
    ...<6 lines>...
        **kwargs,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 963, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
    outputs = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 1675, in forward
    hidden_states = decoder_layer(
        hidden_states,
    ...<6 lines>...
        **kwargs,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/modeling_layers.py", line 93, in __call__
    return super().__call__(*args, **kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 1402, in forward
    hidden_states_2 = self.experts(hidden_states_2, top_k_index, top_k_weights)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/integrations/moe.py", line 536, in forward
    return experts_forward(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/integrations/moe.py", line 403, in grouped_mm_experts_forward
    tokens_per_expert = torch.histc(histc_input, bins=self.num_experts, min=0, max=self.num_experts - 1)
NotImplementedError: "histogram_mps" not implemented for 'Int'

Root Cause

torch.histc has no integer implementation on the MPS backend, which is what the message "histogram_mps" not implemented for 'Int' reports. In grouped_mm_experts_forward (src/transformers/integrations/moe.py), histc_input is cast to float only when device.type == "cpu" and to int on every other backend, so on Apple Silicon an integer tensor reaches torch.histc and raises NotImplementedError.

Fix Action

Fix proposed

Proposed in PR #45687: fix: Made histc_input robust for broader hardware (https://github.com/huggingface/transformers/pull/45687). The PR was open and not yet merged when this page was fetched.

PR fix notes

PR #45687: fix: Made histc_input robust for broader hardware

Repository: huggingface/transformers
Author: rigen1048
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45687

Description (problem / solution / changelog)

What does this PR do?

Fixes a NotImplementedError: "histogram_mps" not implemented for 'Int' when running Mixture-of-Experts (MoE) models on Apple Silicon (MPS backend). The error occurred in src/transformers/integrations/moe.py because torch.histc does not support integer dtypes on MPS. The original condition failed to account for the MPS backend. This PR improves the logic by checking specifically for CUDA instead, as float operations are more reliably supported across a wider range of hardware, including legacy devices.

Before:

Use float on CPU
Use int on all other backends (CUDA, MPS, XPU, TPU, etc.)

After:

Use int on CUDA (best performance)
Use float32 on all other backends (CPU, MPS, XPU, etc.)
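
The before/after rule can be sketched without PyTorch as two small functions that map a device type string to the dtype used for histc_input. This is only an illustration of the selection logic described in the PR text; the function names are hypothetical and are not taken from moe.py.

```python
# Hypothetical helpers that mirror the dtype choice described in the PR text.
# In PyTorch, Tensor.float() yields float32 and Tensor.int() yields int32.

def histc_dtype_before(device_type: str) -> str:
    """Original rule: float only on CPU, int everywhere else."""
    return "float32" if device_type == "cpu" else "int32"


def histc_dtype_after(device_type: str) -> str:
    """Proposed rule: int only on CUDA, float32 everywhere else."""
    return "int32" if device_type == "cuda" else "float32"


if __name__ == "__main__":
    for dev in ("cpu", "cuda", "mps", "xpu"):
        print(dev, histc_dtype_before(dev), "->", histc_dtype_after(dev))
```

Under the original rule, "mps" falls into the int branch, which is the case that triggers the error reported in the issue. Note that the proposed rule also changes other non-CUDA backends such as XPU from int to float32.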

Fixes #45685

Code Agent Policy

The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by code agents. We are currently bottlenecked by our ability to review and respond to them. As a result, we ask that new users do not submit pure code agent PRs at this time. You may use code agents in drafting or to help you diagnose issues. We'd also ask autonomous "OpenClaw"-like agents not to open any PRs or issues for the moment. PRs that appear to be fully agent-written will probably be closed without review, and we may block users who do this repeatedly or maliciously. This is a rapidly-evolving situation that's causing significant shockwaves in the open-source community. As a result, this policy is likely to be updated regularly in the near future. For more information, please read CONTRIBUTING.md.

I confirm that this is not a pure code agent PR.
- AI was used to understand broader application and write a manual smoke test script for regression checking & bring clarity to the PR

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum?
Yes, discussed in #45685
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@IlyasMoutawwakil (author of the current grouped_mm_experts_forward logic)
@ArthurZucker
@cyrilvallez

Changed files

src/transformers/integrations/moe.py (modified, +6/-1)



System Info

Transformers env info

Python version: 3.13.9
OS: macOS-26.4.1-arm64-arm-64bit-Mach-O
PyTorch version: 2.11.0
Transformers version: 5.6.2
CUDA: False
MPS: True

Who can help?

@cyrilvallez In moe.py, line 402: on macOS the torch backend is MPS, and torch.histc is not implemented for 'Int' on MPS, so the line should be

histc_input = expert_ids_g.float() if device.type in ("cpu", "mps") else expert_ids_g.int()

instead of

histc_input = expert_ids_g.float() if device.type == "cpu" else expert_ids_g.int()

I might be wrong, but it works for me after the modification.
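
For readers unfamiliar with the failing line, here is a rough pure-Python picture of what the torch.histc call is counting: how many routed tokens land on each expert. With bins=num_experts, min=0 and max=num_experts - 1, each integer expert id k should fall into bin k, so the result is a per-expert count. The function name below is hypothetical and the sketch does not use PyTorch.

```python
from collections import Counter


def tokens_per_expert(expert_ids, num_experts):
    """Count how many entries of expert_ids equal each expert index.

    Ids outside [0, num_experts - 1] are ignored, similar to how a
    histogram ignores values outside its [min, max] range.
    """
    counts = Counter(expert_ids)
    return [counts.get(expert, 0) for expert in range(num_experts)]


if __name__ == "__main__":
    # Six routed tokens across four experts.
    print(tokens_per_expert([0, 2, 2, 3, 0, 2], num_experts=4))  # [2, 0, 3, 1]
```

Because the inputs are whole-number ids, the counts do not depend on whether the ids are stored as integers or as floats.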

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

my script

from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "google/gemma-4-26B-A4B-it"

# Load model
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype="auto",
    device_map="auto"
)

# Prompt
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short joke about saving RAM."},
]

# Process input
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)
inputs = processor(text=text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate output
outputs = model.generate(**inputs, max_new_tokens=1024)
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

# Parse output
print(processor.parse_response(response))

error detail

/Users/chenzhe/Desktop/workdir/alo/.venv/bin/python /Users/chenzhe/Desktop/workdir/alo/src/model_inference.py 
Loading weights: 100%|██████████| 1013/1013 [00:12<00:00, 82.15it/s] 
Traceback (most recent call last):
  File "/Users/chenzhe/Desktop/workdir/alo/src/model_inference.py", line 36, in <module>
    outputs = model.generate(**inputs, max_new_tokens=1024)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2543, in generate
    result = decoding_method(
        self,
    ...<5 lines>...
        **model_kwargs,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 2736, in _sample
    outputs = self._prefill(
        input_ids,
    ...<2 lines>...
        is_first_iteration=not generation_config.is_assistant,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/generation/utils.py", line 3768, in _prefill
    return self(**model_inputs, return_dict=True)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 887, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 2516, in forward
    outputs = self.model(
        input_ids=input_ids,
    ...<14 lines>...
        **kwargs,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 963, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 887, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 2374, in forward
    outputs = self.language_model(
        per_layer_inputs=per_layer_inputs,
    ...<6 lines>...
        **kwargs,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/generic.py", line 963, in wrapper
    output = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
    outputs = func(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 1675, in forward
    hidden_states = decoder_layer(
        hidden_states,
    ...<6 lines>...
        **kwargs,
    )
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/modeling_layers.py", line 93, in __call__
    return super().__call__(*args, **kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/models/gemma4/modeling_gemma4.py", line 1402, in forward
    hidden_states_2 = self.experts(hidden_states_2, top_k_index, top_k_weights)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/integrations/moe.py", line 536, in forward
    return experts_forward(self, *args, **kwargs)
  File "/Users/chenzhe/Desktop/workdir/alo/.venv/lib/python3.13/site-packages/transformers/integrations/moe.py", line 403, in grouped_mm_experts_forward
    tokens_per_expert = torch.histc(histc_input, bins=self.num_experts, min=0, max=self.num_experts - 1)
NotImplementedError: "histogram_mps" not implemented for 'Int'

Expected behavior

after modify

/Users/chenzhe/Desktop/workdir/alo/.venv/bin/python /Users/chenzhe/Desktop/workdir/alo/src/model_inference.py 
Loading weights: 100%|██████████| 1013/1013 [00:12<00:00, 82.22it/s]
{'role': 'assistant', 'content': 'Why did the computer go to therapy?\n\nBecause it had too many open tabs and felt like it was losing its memory.'}

extent analysis

TL;DR

The issue can be resolved by modifying the moe.py file to handle the histc_input data type correctly when the device type is "mps".

Guidance

The error occurs because the torch.histc function is not implemented for 'Int' data type on the MPS device.
The user has already identified a potential fix by modifying the moe.py file to use float() instead of int() when the device type is "mps".
To verify the fix, run the modified script and check if the output is generated correctly.
If the issue persists, try to update the PyTorch and Transformers versions to the latest ones, as this might be a version-specific bug.
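
One way to check a given machine, sketched under the assumption that PyTorch is installed: call torch.histc on a tiny tensor of expert ids, once cast to int and once cast to float, and record which call succeeds. The helper name probe_histc is hypothetical; the torch calls used (torch.tensor, torch.histc, torch.backends.mps.is_available, torch.cuda.is_available) are standard PyTorch APIs.

```python
def probe_histc(device_type: str = "mps", num_experts: int = 4) -> dict:
    """Report whether torch.histc accepts int and float inputs on a device."""
    try:
        import torch
    except ImportError:
        return {"available": False, "reason": "torch is not installed"}

    # Bail out early if the requested backend is not usable on this machine.
    if device_type == "mps":
        mps = getattr(torch.backends, "mps", None)
        if mps is None or not mps.is_available():
            return {"available": False, "reason": "MPS backend is not available"}
    elif device_type == "cuda" and not torch.cuda.is_available():
        return {"available": False, "reason": "CUDA is not available"}

    expert_ids = torch.tensor([0, 1, 1, 3], device=device_type)
    report = {"available": True}
    for name in ("int", "float"):
        histc_input = expert_ids.int() if name == "int" else expert_ids.float()
        try:
            torch.histc(histc_input, bins=num_experts, min=0, max=num_experts - 1)
            report[name] = "ok"
        except (NotImplementedError, RuntimeError) as exc:
            report[name] = f"failed: {exc}"
    return report


if __name__ == "__main__":
    print(probe_histc("mps"))
```

On a machine that reproduces this issue, the "int" entry would be expected to carry the histogram_mps message while the "float" entry reports "ok"; results may differ on other PyTorch versions.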

Example

The modified code in moe.py should look like this:

histc_input = expert_ids_g.float() if device.type in ("cpu", "mps") else expert_ids_g.int()

Notes

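The float() cast applies only to the expert indices passed to torch.histc, not to the model's weights or activations. Expert indices are small non-negative integers, which float32 represents exactly, so the per-expert token counts should be unchanged by the cast.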
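
The exactness point can be checked with the standard library alone. float32 has a 24-bit significand, so every integer up to 2**24 survives a round trip through float32, which is far beyond any realistic number of experts. The helper name below is hypothetical.

```python
import struct


def roundtrip_float32(n: int) -> int:
    """Store n as an IEEE 754 float32 and read it back as an int."""
    packed = struct.pack("f", float(n))
    return int(struct.unpack("f", packed)[0])


if __name__ == "__main__":
    # Every small index survives the cast unchanged.
    assert all(roundtrip_float32(n) == n for n in range(4096))
    # 2**24 is still exact; 2**24 + 1 is the first integer that is not.
    print(roundtrip_float32(2**24) == 2**24)          # True
    print(roundtrip_float32(2**24 + 1) == 2**24 + 1)  # False
```

This only demonstrates that the ids themselves are not altered by the cast; it says nothing about performance differences between the int and float paths.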

Recommendation

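Until a fix is merged, apply the workaround the user reported as working for them: edit moe.py so that "mps" takes the float branch. PR #45687 proposes a broader change (int only on CUDA, float32 elsewhere) and was still open when this page was fetched.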
