transformers: GPT-OSS-20B does not work on AMD GPUs (1 comment, 1 participant)

transformers · 2026-04-04 07:20:50

GitHub stats

huggingface/transformers#45237 • Fetched 2026-04-08 02:43:46

Author

tanreinama

Participants

tanreinama

Timeline (top)

commented ×1, labeled ×1, mentioned ×1, subscribed ×1

Error Message

Calling the text-generation pipeline for openai/gpt-oss-20b fails inside the MXFP4 expert kernels. The last frame and the exception are:

  File "/home/nama/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 223, in compute_grid
    num_pid = target_info.num_sms() * (warps_per_sm // num_warps)
              ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

The full session and traceback are reproduced under Code Example.

Code Example

$ pip install -U transformers kernels accelerate
$ python
>>> from transformers import pipeline
>>> import torch
>>> model_id = "openai/gpt-oss-20b"
>>> pipe = pipeline(
...     "text-generation",
...     model=model_id,
...     torch_dtype="auto",
...     device_map="auto",
... )
`torch_dtype` is deprecated! Use `dtype` instead!
Fetching 42 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [00:02<00:00, 16.72it/s]
Download complete: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 101/101 [00:02<00:00, 39.0B/s]
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [01:16<00:00,  5.40it/s]
>>> messages = [
...     {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
... ]
>>> outputs = pipe(
...     messages,
...     max_new_tokens=256,
... )
Passing `generation_config` together with generation-related arguments=({'max_new_tokens'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Traceback (most recent call last):
  File "<python-input-5>", line 1, in <module>
    outputs = pipe(
        messages,
        max_new_tokens=256,
    )
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/text_generation.py", line 299, in __call__
    return super().__call__(text_inputs, **kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1264, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1271, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1163, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/text_generation.py", line 403, in _forward
    output = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/generation/utils.py", line 2543, in generate
    result = decoding_method(
        self,
    ...<5 lines>...
        **model_kwargs,
    )
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/generation/utils.py", line 2736, in _sample
    outputs = self._prefill(
        input_ids,
    ...<2 lines>...
        is_first_iteration=not generation_config.is_assistant,
    )
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/generation/utils.py", line 3768, in _prefill
    return self(**model_inputs, return_dict=True)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/utils/generic.py", line 876, in wrapper
    output = func(self, *args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 649, in forward
    outputs: MoeModelOutputWithPast = self.model(
                                      ~~~~~~~~~~^
        input_ids=input_ids,
        ^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/utils/generic.py", line 952, in wrapper
    output = func(self, *args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
    outputs = func(self, *args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 490, in forward
    hidden_states = decoder_layer(
        hidden_states,
    ...<5 lines>...
        **kwargs,
    )
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/modeling_layers.py", line 93, in __call__
    return super().__call__(*args, **kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 384, in forward
    hidden_states, _ = self.mlp(hidden_states)  # diff with llama: router scores
                       ~~~~~~~~^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/integrations/mxfp4.py", line 508, in mlp_forward
    routed_out = self.experts(hidden_states, routing_data, gather_idx, scatter_idx=scatter_idx)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/integrations/mxfp4.py", line 411, in forward
    intermediate_cache3 = matmul_ogs(
        intermediate_cache1,
    ...<5 lines>...
        gammas=routing_data.gate_scal,
    )
  File "/home/nama/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 583, in matmul_ogs
    out = apply_postprocessing_features(scatter_indx, finalize_scatter_idxs, opt_flags, expt_token_offs_raw,
                                        num_indx, precision_config, routing_data,
                                        postprocessing_features, memory, fused_postprocess_activation, epilogue)
  File "/home/nama/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 252, in apply_postprocessing_features
    grid, (BLOCK_N, num_warps) = sorted([(compute_grid(*c), c) for c in candidates], key=lambda x: x[0][1])[0]
                                          ~~~~~~~~~~~~^^^^
  File "/home/nama/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 223, in compute_grid
    num_pid = target_info.num_sms() * (warps_per_sm // num_warps)
              ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
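
The final frame multiplies the result of `target_info.num_sms()` by an integer, so the message implies that `num_sms()` returned `None` on this device. The sketch below reproduces that failure in isolation and shows one possible guard. It is not the kernel package's code: `num_sms_unknown`, `programs_unguarded`, `programs_guarded`, the `fallback_sms` parameter, and the values 32 and 4 are hypothetical and chosen only for illustration.

```python
from typing import Optional


def num_sms_unknown() -> Optional[int]:
    """Hypothetical stand-in for a device query that has no answer for this GPU."""
    return None


def programs_unguarded(num_sms: Optional[int], warps_per_sm: int, num_warps: int) -> int:
    # Same shape as the failing expression in the traceback.
    return num_sms * (warps_per_sm // num_warps)


def programs_guarded(num_sms: Optional[int], warps_per_sm: int, num_warps: int,
                     fallback_sms: int = 1) -> int:
    # Assumption: falling back to a caller-supplied count is acceptable here.
    if num_sms is None:
        num_sms = fallback_sms
    return num_sms * (warps_per_sm // num_warps)


try:
    programs_unguarded(num_sms_unknown(), 32, 4)
except TypeError as exc:
    message = str(exc)  # the same text as the last line of the traceback

print(message)
print(programs_guarded(num_sms_unknown(), 32, 4, fallback_sms=60))
```

Whether a fallback count is the right fix for the real kernels is an open question; the sketch only shows where the `None` enters the arithmetic.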

---

$ pip list
Package           Version
----------------- --------------
accelerate        1.13.0
annotated-doc     0.0.4
anyio             4.13.0
certifi           2026.2.25
click             8.3.2
filelock          3.25.2
fsspec            2026.2.0
h11               0.16.0
hf-xet            1.4.3
httpcore          1.0.9
httpx             0.28.1
huggingface_hub   1.9.0
idna              3.11
Jinja2            3.1.6
kernels           0.12.3
markdown-it-py    4.0.0
MarkupSafe        3.0.3
mdurl             0.1.2
mpmath            1.3.0
networkx          3.6.1
numpy             2.4.3
packaging         26.0
pillow            12.1.1
pip               25.3
psutil            7.2.2
Pygments          2.20.0
PyYAML            6.0.3
regex             2026.4.4
rich              14.3.3
safetensors       0.7.0
setuptools        70.2.0
shellingham       1.5.4
sympy             1.14.0
tokenizers        0.22.2
torch             2.11.0+rocm7.2
torchvision       0.26.0+rocm7.2
tqdm              4.67.3
transformers      5.5.0
triton-rocm       3.6.0
typer             0.24.1
typing_extensions 4.15.0

---

$ pip list|grep triton
triton             3.5.1+rocm7.2.1.gita272dfa8
$ pip install -U https://download-r2.pytorch.org/whl/nightly/triton_rocm-3.6.0%2Bgit6213a0e8-cp312-cp312-linux_x86_64.whl
$ pip install -U https://download-r2.pytorch.org/whl/nightly/triton_rocm-3.7.0%2Bgit9c288bc5-cp312-cp312-linux_x86_64.whl

---

$ docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/pytorch:rocm7.2.1_ubuntu24.04_py3.12_pytorch_release_2.9.1
# pip install -U transformers kernels accelerate
# python
>>> from transformers import pipeline
>>> import torch
>>> model_id = "openai/gpt-oss-20b"
>>> pipe = pipeline(
...     "text-generation",
...     model=model_id,
...     torch_dtype="auto",
...     device_map="auto",
... )
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
config.json: 1.81kB [00:00, 1.49MB/s]
`torch_dtype` is deprecated! Use `dtype` instead!
model.safetensors.index.json: 36.4kB [00:00, 62.0MB/s]
Fetching 3 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [03:00<00:00, 60.12s/it]
Download complete: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 13.8G/13.8G [03:00<00:00, 76.3MB/s]
Fetching 42 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [00:01<00:00, 33.86it/s]
Download complete: : 249kB [00:01, 193kB/s]              ████████████████████████████████████████████████████████████████████▌  | 41/42 [00:01<00:00, 40.35it/s]
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:14<00:00, 29.26it/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 177/177 [00:00<00:00, 1.55MB/s]
tokenizer_config.json: 4.20kB [00:00, 9.92MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 27.9M/27.9M [00:01<00:00, 19.6MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 98.0/98.0 [00:00<00:00, 317kB/s]
chat_template.jinja: 16.7kB [00:00, 28.5MB/s]
>>> messages = [
...     {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
... ]
>>> outputs = pipe(
...     messages,
...     max_new_tokens=256,
... )
Passing `generation_config` together with generation-related arguments=({'max_new_tokens'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py", line 299, in __call__
    return super().__call__(text_inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1264, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1271, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1163, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py", line 403, in _forward
    output = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2543, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2736, in _sample
    outputs = self._prefill(
              ^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3768, in _prefill
    return self(**model_inputs, return_dict=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 876, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 649, in forward
    outputs: MoeModelOutputWithPast = self.model(
                                      ^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 952, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 490, in forward
    hidden_states = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_layers.py", line 93, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 384, in forward
    hidden_states, _ = self.mlp(hidden_states)  # diff with llama: router scores
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/integrations/mxfp4.py", line 508, in mlp_forward
    routed_out = self.experts(hidden_states, routing_data, gather_idx, scatter_idx=scatter_idx)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/integrations/mxfp4.py", line 411, in forward
    intermediate_cache3 = matmul_ogs(
                          ^^^^^^^^^^^
  File "/root/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 583, in matmul_ogs
    out = apply_postprocessing_features(scatter_indx, finalize_scatter_idxs, opt_flags, expt_token_offs_raw,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 252, in apply_postprocessing_features
    grid, (BLOCK_N, num_warps) = sorted([(compute_grid(*c), c) for c in candidates], key=lambda x: x[0][1])[0]
                                          ^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 223, in compute_grid
    num_pid = target_info.num_sms() * (warps_per_sm // num_warps)
              ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

System Info

GPT-OSS-20B does not work on Radeon GPUs. I tested it both in a native environment and in the Docker container rocm/pytorch:rocm7.2.1_ubuntu24.04_py3.12_pytorch_release_2.9.1. Updating Triton did not help: I tried triton-rocm 3.6.0, 3.5.1+rocm (included in rocm/pytorch), the 3.6.0 nightly, and the 3.7.0 nightly, and all of them resulted in errors.
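
Because several Triton builds were tried across two environments, it may help to record exactly which packages and which device each run saw. The following is a minimal sketch, not part of the original report: `installed_versions`, `gpu_summary`, and the `PACKAGES` tuple are hypothetical names, and the package list is an assumption based on the `pip list` output. It only uses `importlib.metadata` and, when PyTorch is importable, `torch.version.hip` and `torch.cuda.get_device_properties`.

```python
from importlib import metadata

# Assumed list of relevant distributions; adjust to the environment.
PACKAGES = ("torch", "triton", "triton-rocm", "pytorch-triton-rocm",
            "transformers", "kernels", "accelerate")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def gpu_summary():
    """Report what PyTorch sees for device 0, if PyTorch can be imported."""
    try:
        import torch
    except ImportError:
        return {"torch_importable": False}
    info = {
        "torch_importable": True,
        "hip": getattr(torch.version, "hip", None),  # None on non-ROCm builds
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        info["multi_processor_count"] = props.multi_processor_count
    return info


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
    print(gpu_summary())
```

Running it once natively and once inside the container gives two comparable reports to attach alongside the tracebacks.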

@ivarflakstad

command log:

$ pip install -U transformers kernels accelerate
$ python
>>> from transformers import pipeline
>>> import torch
>>> model_id = "openai/gpt-oss-20b"
>>> pipe = pipeline(
...     "text-generation",
...     model=model_id,
...     torch_dtype="auto",
...     device_map="auto",
... )
`torch_dtype` is deprecated! Use `dtype` instead!
Fetching 42 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [00:02<00:00, 16.72it/s]
Download complete: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 101/101 [00:02<00:00, 39.0B/s]
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [01:16<00:00,  5.40it/s]
>>> messages = [
...     {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
... ]
>>> outputs = pipe(
...     messages,
...     max_new_tokens=256,
... )
Passing `generation_config` together with generation-related arguments=({'max_new_tokens'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Traceback (most recent call last):
  File "<python-input-5>", line 1, in <module>
    outputs = pipe(
        messages,
        max_new_tokens=256,
    )
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/text_generation.py", line 299, in __call__
    return super().__call__(text_inputs, **kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1264, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1271, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/base.py", line 1163, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/pipelines/text_generation.py", line 403, in _forward
    output = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/generation/utils.py", line 2543, in generate
    result = decoding_method(
        self,
    ...<5 lines>...
        **model_kwargs,
    )
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/generation/utils.py", line 2736, in _sample
    outputs = self._prefill(
        input_ids,
    ...<2 lines>...
        is_first_iteration=not generation_config.is_assistant,
    )
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/generation/utils.py", line 3768, in _prefill
    return self(**model_inputs, return_dict=True)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/utils/generic.py", line 876, in wrapper
    output = func(self, *args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 649, in forward
    outputs: MoeModelOutputWithPast = self.model(
                                      ~~~~~~~~~~^
        input_ids=input_ids,
        ^^^^^^^^^^^^^^^^^^^^
    ...<6 lines>...
        **kwargs,
        ^^^^^^^^^
    )
    ^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/utils/generic.py", line 952, in wrapper
    output = func(self, *args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
    outputs = func(self, *args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 490, in forward
    hidden_states = decoder_layer(
        hidden_states,
    ...<5 lines>...
        **kwargs,
    )
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/modeling_layers.py", line 93, in __call__
    return super().__call__(*args, **kwargs)
           ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 384, in forward
    hidden_states, _ = self.mlp(hidden_states)  # diff with llama: router scores
                       ~~~~~~~~^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/integrations/mxfp4.py", line 508, in mlp_forward
    routed_out = self.experts(hidden_states, routing_data, gather_idx, scatter_idx=scatter_idx)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nama/.pyenv/versions/rocm/lib/python3.13/site-packages/transformers/integrations/mxfp4.py", line 411, in forward
    intermediate_cache3 = matmul_ogs(
        intermediate_cache1,
    ...<5 lines>...
        gammas=routing_data.gate_scal,
    )
  File "/home/nama/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 583, in matmul_ogs
    out = apply_postprocessing_features(scatter_indx, finalize_scatter_idxs, opt_flags, expt_token_offs_raw,
                                        num_indx, precision_config, routing_data,
                                        postprocessing_features, memory, fused_postprocess_activation, epilogue)
  File "/home/nama/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 252, in apply_postprocessing_features
    grid, (BLOCK_N, num_warps) = sorted([(compute_grid(*c), c) for c in candidates], key=lambda x: x[0][1])[0]
                                          ~~~~~~~~~~~~^^^^
  File "/home/nama/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 223, in compute_grid
    num_pid = target_info.num_sms() * (warps_per_sm // num_warps)
              ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

environment:

$ pip list
Package           Version
----------------- --------------
accelerate        1.13.0
annotated-doc     0.0.4
anyio             4.13.0
certifi           2026.2.25
click             8.3.2
filelock          3.25.2
fsspec            2026.2.0
h11               0.16.0
hf-xet            1.4.3
httpcore          1.0.9
httpx             0.28.1
huggingface_hub   1.9.0
idna              3.11
Jinja2            3.1.6
kernels           0.12.3
markdown-it-py    4.0.0
MarkupSafe        3.0.3
mdurl             0.1.2
mpmath            1.3.0
networkx          3.6.1
numpy             2.4.3
packaging         26.0
pillow            12.1.1
pip               25.3
psutil            7.2.2
Pygments          2.20.0
PyYAML            6.0.3
regex             2026.4.4
rich              14.3.3
safetensors       0.7.0
setuptools        70.2.0
shellingham       1.5.4
sympy             1.14.0
tokenizers        0.22.2
torch             2.11.0+rocm7.2
torchvision       0.26.0+rocm7.2
tqdm              4.67.3
transformers      5.5.0
triton-rocm       3.6.0
typer             0.24.1
typing_extensions 4.15.0

Tested Triton versions:

$ pip list|grep triton
triton             3.5.1+rocm7.2.1.gita272dfa8
$ pip install -U https://download-r2.pytorch.org/whl/nightly/triton_rocm-3.6.0%2Bgit6213a0e8-cp312-cp312-linux_x86_64.whl
$ pip install -U https://download-r2.pytorch.org/whl/nightly/triton_rocm-3.7.0%2Bgit9c288bc5-cp312-cp312-linux_x86_64.whl

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

$ docker run -it --rm --network=host --device=/dev/kfd --device=/dev/dri --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/pytorch:rocm7.2.1_ubuntu24.04_py3.12_pytorch_release_2.9.1
# pip install -U transformers kernels accelerate
# python
>>> from transformers import pipeline
>>> import torch
>>> model_id = "openai/gpt-oss-20b"
>>> pipe = pipeline(
...     "text-generation",
...     model=model_id,
...     torch_dtype="auto",
...     device_map="auto",
... )
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
config.json: 1.81kB [00:00, 1.49MB/s]
`torch_dtype` is deprecated! Use `dtype` instead!
model.safetensors.index.json: 36.4kB [00:00, 62.0MB/s]
Fetching 3 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [03:00<00:00, 60.12s/it]
Download complete: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████| 13.8G/13.8G [03:00<00:00, 76.3MB/s]
Fetching 42 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 42/42 [00:01<00:00, 33.86it/s]
Download complete: : 249kB [00:01, 193kB/s]              ████████████████████████████████████████████████████████████████████▌  | 41/42 [00:01<00:00, 40.35it/s]
Loading weights: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:14<00:00, 29.26it/s]
generation_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 177/177 [00:00<00:00, 1.55MB/s]
tokenizer_config.json: 4.20kB [00:00, 9.92MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 27.9M/27.9M [00:01<00:00, 19.6MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 98.0/98.0 [00:00<00:00, 317kB/s]
chat_template.jinja: 16.7kB [00:00, 28.5MB/s]
>>> messages = [
...     {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
... ]
>>> outputs = pipe(
...     messages,
...     max_new_tokens=256,
... )
Passing `generation_config` together with generation-related arguments=({'max_new_tokens'}) is deprecated and will be removed in future versions. Please pass either a `generation_config` object OR all generation parameters explicitly, but not both.
Both `max_new_tokens` (=256) and `max_length`(=20) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py", line 299, in __call__
    return super().__call__(text_inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1264, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1271, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/base.py", line 1163, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py", line 403, in _forward
    output = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2543, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2736, in _sample
    outputs = self._prefill(
              ^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 3768, in _prefill
    return self(**model_inputs, return_dict=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 876, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 649, in forward
    outputs: MoeModelOutputWithPast = self.model(
                                      ^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 952, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 490, in forward
    hidden_states = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_layers.py", line 93, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/gpt_oss/modeling_gpt_oss.py", line 384, in forward
    hidden_states, _ = self.mlp(hidden_states)  # diff with llama: router scores
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/integrations/mxfp4.py", line 508, in mlp_forward
    routed_out = self.experts(hidden_states, routing_data, gather_idx, scatter_idx=scatter_idx)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/integrations/mxfp4.py", line 411, in forward
    intermediate_cache3 = matmul_ogs(
                          ^^^^^^^^^^^
  File "/root/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 583, in matmul_ogs
    out = apply_postprocessing_features(scatter_indx, finalize_scatter_idxs, opt_flags, expt_token_offs_raw,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 252, in apply_postprocessing_features
    grid, (BLOCK_N, num_warps) = sorted([(compute_grid(*c), c) for c in candidates], key=lambda x: x[0][1])[0]
                                          ^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/hub/models--kernels-community--gpt-oss-triton-kernels/snapshots/76c23fb9a6607cd5c62c1e6b8e7f436ec5385517/build/torch-rocm/matmul_ogs.py", line 223, in compute_grid
    num_pid = target_info.num_sms() * (warps_per_sm // num_warps)
              ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'

Expected behavior

Execute without errors

extent analysis

TL;DR

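The issue is likely a compatibility problem between the Triton MXFP4 kernels that GPT-OSS-20B uses and the Radeon GPU: in the ROCm build of matmul_ogs.py, target_info.num_sms() returns None, and the multiplication that follows fails. Potential workarounds are to use a different GPU or to try another Triton version, although the reporter already saw the same error with three Triton versions.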

Guidance

The error TypeError: unsupported operand type(s) for *: 'NoneType' and 'int' is raised in compute_grid in matmul_ogs.py, part of the ROCm build of kernels-community/gpt-oss-triton-kernels. There, target_info.num_sms() returns None instead of an integer, which suggests the kernel could not determine the GPU's multiprocessor count on this device.
The issue occurs with three Triton versions (3.5.1+rocm, 3.6.0 nightly, and 3.7.0 nightly), so it does not appear to be specific to one Triton release.
To confirm that the failure is hardware-specific, run the same script on a different GPU, such as an NVIDIA GPU, and check whether the error persists.
If the issue is specific to the Radeon GPU, updating the GPU drivers or firmware may help. Since the None value originates inside the kernel package, a fix may also need to come from that package.
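Before changing hardware or drivers, it may help to check what PyTorch itself reports for the device. The sketch below is a diagnostic aid only, not part of the original report. It assumes a ROCm build of PyTorch exposes the compute-unit count as multi_processor_count, and it reads gcnArchName defensively because that attribute may be absent on other builds.

```python
import importlib.util


def describe_accelerator():
    """Return a small dict describing the first visible GPU, if any."""
    # Avoid a hard dependency so the script also runs where torch is missing.
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.hip is None on builds that do not target ROCm.
        "hip_version": getattr(torch.version, "hip", None),
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        info["multi_processor_count"] = props.multi_processor_count
        # Architecture string on ROCm builds; read defensively.
        info["gcn_arch"] = getattr(props, "gcnArchName", None)
    return info


for key, value in describe_accelerator().items():
    print(f"{key}: {value}")
```

If PyTorch reports a valid multi_processor_count while the kernel still sees None, that would point at the kernel package's own device lookup rather than at the driver.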

Example

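The failing expression is the last frame of the traceback: num_pid = target_info.num_sms() * (warps_per_sm // num_warps) in compute_grid. It lives in the downloaded kernel package rather than in the user's script, so the reproduction steps above are the only user-side code involved.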
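For illustration only, the failing arithmetic can be reproduced in isolation. The sketch below is not the kernel package's code: the function names, the stand-in device query, and the values 32, 4, and 60 are all hypothetical. It shows how a None count produces exactly the reported TypeError and how an explicit check or a fallback would surface the problem more clearly.

```python
def compute_num_pid(num_sms, warps_per_sm, num_warps):
    """Same arithmetic as the failing line, with an explicit check for a missing count."""
    if num_sms is None:
        raise RuntimeError("SM / compute-unit count is unknown for this device")
    return num_sms * (warps_per_sm // num_warps)


def resolve_num_sms(primary, fallback):
    """Return primary() unless it yields None, in which case use fallback()."""
    value = primary()
    return fallback() if value is None else value


def unknown_device_query():
    """Hypothetical stand-in for a device query that does not recognise the GPU."""
    return None


# Reproduce the reported error: None multiplied by an int.
try:
    unknown_device_query() * (32 // 4)
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for *: 'NoneType' and 'int'

# With a fallback count (60 is an arbitrary illustrative value) the arithmetic works.
num_sms = resolve_num_sms(unknown_device_query, lambda: 60)
print(compute_num_pid(num_sms, warps_per_sm=32, num_warps=4))  # 480
```

In a real patch the fallback might read the count from PyTorch's device properties, but whether that is the right fix for the kernel package is an assumption that would need checking against its source.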

Notes

The issue may be specific to the combination of the GPT-OSS-20B model and the Radeon GPU, and may not be reproducible on other hardware configurations.

Recommendation

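As a workaround, run the model on a different GPU, such as an NVIDIA GPU. If that is not possible, try updating the GPU drivers or firmware. Because the error is raised inside the kernels-community/gpt-oss-triton-kernels package, it may also be worth reporting the traceback to that package's maintainers.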
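Another possible software-side workaround, untested on the hardware in this report, is to load the model with the MXFP4 weights dequantized so that the Triton kernel path in the traceback is not used. The sketch below uses Mxfp4Config(dequantize=True) from transformers. That this avoids the failing matmul_ogs call on ROCm is an assumption, and the dequantized weights need considerably more memory than the 13.8 GB checkpoint shown in the download log.

```python
MODEL_ID = "openai/gpt-oss-20b"


def load_dequantized_pipeline(model_id: str = MODEL_ID):
    """Build a text-generation pipeline with MXFP4 weights dequantized at load time."""
    # Imported lazily so that defining this function does not require transformers.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        Mxfp4Config,
        pipeline,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        dtype="auto",  # the logs above note that torch_dtype is deprecated
        device_map="auto",
        # Assumption: dequantizing bypasses the Triton MXFP4 kernels.
        quantization_config=Mxfp4Config(dequantize=True),
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return pipeline("text-generation", model=model, tokenizer=tokenizer)
```

Calling pipe = load_dequantized_pipeline() and then pipe(messages, max_new_tokens=256) would mirror the reproduction steps. If the same TypeError still appears, the kernel path is still being taken and this workaround does not apply.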
