transformers - ✅(Solved) Fix GraniteMoEHybrid Model Calls Invalid Method [1 pull requests, 1 participants]

transformers • 2026-04-18 17:07:36

GitHub stats

huggingface/transformers#45507 • Fetched 2026-04-19 15:03:58

Author

rnowling

Participants

rnowling

Timeline (top)

cross-referenced ×1, labeled ×1, referenced ×1

Error Message

Traceback (most recent call last):
  File "/home/rnowling/Projects/robust-llm-data-generators/sequence_token_probabilities/generate_text.py", line 48, in <module>
    generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=MAX_LENGTH)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2543, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2736, in _sample
    outputs = self._prefill(
              ^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3768, in _prefill
    return self(**model_inputs, return_dict=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 876, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py", line 1365, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 952, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py", line 1183, in forward
    mamba_mask = self._update_mamba_mask(attention_mask, past_key_values)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py", line 1217, in _update_mamba_mask
    if (past_key_values is not None and past_key_values.has_previous_state()) or (
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 1057, in has_previous_state
    raise ValueError(
ValueError: `has_previous_state` can only be called on LinearAttention layers, and the current Cache seem to only contain Attention layers.

PR fix notes

PR #45514: Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models

Repository: huggingface/transformers
Author: tianhaocui
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45514

Description (problem / solution / changelog)

Fixes #45507

Summary

GraniteMoeHybridModel._update_mamba_mask calls past_key_values.has_previous_state() without checking whether the model actually has mamba layers. When all layers are attention-only (no mamba layers in config.layers_block_type), has_previous_state() fails to find a LinearAttentionCacheLayerMixin layer and raises ValueError.

Fix

Check config.layers_block_type for mamba layers before calling has_previous_state(). If no mamba layers exist, return the attention mask as-is since the mamba mask optimization is irrelevant.

Applied to both modeling_granitemoehybrid.py and modular_granitemoehybrid.py.
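The guard described above can be illustrated with a small, self-contained sketch. This is not the patch from PR #45514: SketchConfig, AttentionOnlyCache, and update_mamba_mask are hypothetical stand-ins, plain Python lists replace tensors, and the "mamba" string as the marker value in layers_block_type is an assumption. Only the names layers_block_type and has_previous_state come from the PR text and the traceback.

```python
from dataclasses import dataclass, field


class AttentionOnlyCache:
    """Hypothetical stand-in for a cache that holds no mamba (linear-attention) layers."""

    def has_previous_state(self):
        # Imitates the failure reported in the traceback; not the library's code.
        raise ValueError("has_previous_state can only be called on LinearAttention layers")


@dataclass
class SketchConfig:
    """Hypothetical config exposing only the field the PR text mentions."""

    layers_block_type: list = field(default_factory=list)


def update_mamba_mask(config, attention_mask, past_key_values):
    # Guard described in the PR summary: with no mamba layers, return the
    # attention mask as-is and never query the cache.
    if "mamba" not in config.layers_block_type:  # "mamba" marker is an assumption
        return attention_mask
    # Condition shape taken from the traceback: drop the mask when there is a
    # previous state or when every position is attended to.
    all_ones = attention_mask is not None and all(
        value == 1 for row in attention_mask for value in row
    )
    if (past_key_values is not None and past_key_values.has_previous_state()) or all_ones:
        return None
    return attention_mask


mask = [[0, 1, 1], [1, 1, 1]]
attention_only = SketchConfig(layers_block_type=["attention", "attention"])
result = update_mamba_mask(attention_only, mask, AttentionOnlyCache())
print(result is mask)
```

With the early return in place, the attention-only configuration never reaches has_previous_state(), which is the call that raised ValueError in the report.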

Changed files

src/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py (modified, +3/-0)
src/transformers/models/granitemoehybrid/modular_granitemoehybrid.py (modified, +3/-0)

System Info

Linux: Ubuntu 24.04.4 LTS / 6.8.0-107-generic-64k / aarch64
Python: 3.12.12
Transformers: 5.5.4
CUDA: 12.9

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Run the script below (generate_text.py) with the following command:

$ CUDA_VISIBLE_DEVICES=0 python generate_text.py --model-name ibm-granite/granite-4.0-350m-base --n-samples 256 --output-fl granite-350m-base.feather

import argparse

from datasets import Dataset
import numpy as np
import pandas as pd
from transformers import AutoTokenizer, DataCollatorForLanguageModeling
from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

BATCH_SIZE = 32
MAX_LENGTH = 1024
PREFIX = "Your horoscope is: "

def parse_args():
    parser = argparse.ArgumentParser()

    parser.add_argument("--output-fl", type=str, required=True)
    parser.add_argument("--model-name", type=str, required=True)
    parser.add_argument("--n-samples", type=int, required=True)

    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()

    tokenizer = AutoTokenizer.from_pretrained(args.model_name, padding_side="left")
    model = AutoModelForCausalLM.from_pretrained(args.model_name)

    print(model.generation_config)

    # see tips here: https://huggingface.co/docs/transformers/en/model_doc/llama3
    if tokenizer.pad_token is None:
        print("Needed to add padding token")
        tokenizer.add_special_tokens({"pad_token":"<pad>"})
        model.resize_token_embeddings(len(tokenizer))
        model.config.pad_token_id = tokenizer.pad_token_id

    model.eval()

    # why is this necessary?
    model.to("cuda")

    batch = [PREFIX] * BATCH_SIZE
    model_inputs = tokenizer(batch, return_tensors="pt").to(model.device)

    generated_samples = []
    n_batches = args.n_samples // BATCH_SIZE + 1
    for _ in range(n_batches):
        generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=MAX_LENGTH)
        batch_samples = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)

        generated_samples.extend(batch_samples)

    if len(generated_samples) > args.n_samples:
        generated_samples = generated_samples[:args.n_samples]

    df = pd.DataFrame({"generated_sample" : generated_samples,
                       "model" : [args.model_name] * args.n_samples })

    df.head()

    df.to_feather(args.output_fl, compression="zstd")

It produces the following stack trace:

Traceback (most recent call last):
  File "/home/rnowling/Projects/robust-llm-data-generators/sequence_token_probabilities/generate_text.py", line 48, in <module>
    generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=MAX_LENGTH)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2543, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2736, in _sample
    outputs = self._prefill(
              ^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3768, in _prefill
    return self(**model_inputs, return_dict=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 876, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py", line 1365, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 952, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py", line 1183, in forward
    mamba_mask = self._update_mamba_mask(attention_mask, past_key_values)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/models/granitemoehybrid/modeling_granitemoehybrid.py", line 1217, in _update_mamba_mask
    if (past_key_values is not None and past_key_values.has_previous_state()) or (
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rnowling/pytorch-venv/lib/python3.12/site-packages/transformers/cache_utils.py", line 1057, in has_previous_state
    raise ValueError(
ValueError: `has_previous_state` can only be called on LinearAttention layers, and the current Cache seem to only contain Attention layers.

Expected behavior

I expect the script to be able to generate text from the model. :)

extent analysis

TL;DR

The crash originates in the library, not in the script. According to PR #45514, GraniteMoeHybridModel._update_mamba_mask calls past_key_values.has_previous_state() even when config.layers_block_type contains no mamba layers, and the cache then raises ValueError. The proposed fix is to skip that call for attention-only models.

Guidance

Confirm the model is attention-only: Inspect config.layers_block_type for ibm-granite/granite-4.0-350m-base. The error message and the PR summary both indicate that the cache holds only attention layers.
Use a build that includes the fix: PR #45514 adds the missing check to modeling_granitemoehybrid.py and modular_granitemoehybrid.py. It was still open and unmerged when this page was fetched, so the reported version (Transformers 5.5.4) does not include it.
Leave the generate call as it is: The traceback shows the failure inside the model's forward pass during prefill, so changing decoding arguments is unlikely to address the root cause.

Example

# Check whether the loaded checkpoint has any mamba layers.
# Per the PR summary, a model without them is the case that triggers the crash.
print(model.config.layers_block_type)

Notes

The final frame of the stack trace is has_previous_state in cache_utils.py, reached from _update_mamba_mask in modeling_granitemoehybrid.py (line 1217). The error text states that the cache contains only attention layers, which matches the diagnosis in the PR.

Recommendation

Track PR #45514 and upgrade once a release includes it. Until then, the three-line guard described under "Fix" can be applied locally to modeling_granitemoehybrid.py.
