transformers - ✅(Solved) Fix Mllama compile failed after new attn mask [2 pull requests, 3 comments, 2 participants]

transformers · 2026-03-05 07:33:29

GitHub stats

huggingface/transformers#44458•Fetched 2026-04-08 00:28:21

Author

jiqing-feng

Participants

jiqing-feng

michaelanderson01826-glitch

Timeline (top)

mentioned ×4, subscribed ×4, commented ×3, cross-referenced ×3

Error Message

torch._inductor.exc.InductorError: CppCompileError: C++ compile error
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:62: error: ‘tmp2’ was not declared in this scope; did you mean ‘tm’?

Fix Action

Fixed

Fixed by PR: Fix Mllama torch.compile failure caused by new attention mask logic (https://github.com/huggingface/transformers/pull/44845)
Fixed by PR: [Mllama] Fix workaround compile (https://github.com/huggingface/transformers/pull/44850)

PR fix notes

PR #44845: Fix Mllama torch.compile failure caused by new attention mask logic

Repository: huggingface/transformers
Author: jiqing-feng
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/44845

Description (problem / solution / changelog)

What does this PR do?

Fixes torch.compile failure for Mllama after #42848 introduced a new unified attention mask creation path.

The root cause is a torch inductor C++ codegen bug: when padding_mask_function uses advanced tensor indexing (padding_mask[batch_idx, kv_idx]), the generated C++ boundary-check code references an undeclared variable (tmp2), causing g++ compilation to fail with CppCompileError.

This PR applies two changes:

masking_utils.py: In the non-vmap sdpa_mask path, apply the padding mask separately using slice-based indexing (padding_mask[:, kv_offset : kv_offset + kv_length]) instead of merging it into the mask_function with advanced tensor indexing. This avoids the inductor codegen bug while producing identical results.
modeling_mllama.py: Replace torch.arange-based fancy indexing with simple slice indexing when extracting cross_attention_mask and full_text_row_masked_out_mask for the current sequence position. This is semantically equivalent but more torch.compile-friendly.
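
The equivalence claimed by the two changes above can be checked on toy tensors. This is a minimal sketch, not the code from the PR: the sizes, the mask contents, and every variable name other than `padding_mask`, `batch_idx`, `kv_idx`, `kv_offset`, and `kv_length` are assumptions made for illustration.

```python
import torch

# Hypothetical toy sizes, chosen only for illustration.
batch_size, total_kv_len = 2, 8
kv_offset, kv_length, q_length = 2, 4, 3

# 2D padding mask: True = real token, False = padding.
padding_mask = torch.tensor(
    [[0, 0, 1, 1, 1, 1, 1, 1],
     [1, 1, 1, 1, 1, 0, 0, 0]],
    dtype=torch.bool,
)

# Advanced (tensor) indexing: the two index tensors broadcast to (batch, 1, 1, kv).
batch_idx = torch.arange(batch_size).view(-1, 1, 1, 1)
kv_idx = (torch.arange(kv_length) + kv_offset).view(1, 1, 1, -1)
mask_advanced = padding_mask[batch_idx, kv_idx].expand(batch_size, 1, q_length, kv_length)

# Slice-based indexing: take the same key/value window, then add broadcast dims.
mask_sliced = padding_mask[:, kv_offset : kv_offset + kv_length]
mask_sliced = mask_sliced[:, None, None, :].expand(batch_size, 1, q_length, kv_length)

same_padding = torch.equal(mask_advanced, mask_sliced)

# Same idea for a contiguous range of sequence positions:
# torch.arange-based fancy indexing versus a plain slice.
cross_attention_mask = torch.randn(batch_size, 1, total_kv_len, 5)
start, end = 3, 6
fancy = cross_attention_mask[:, :, torch.arange(start, end)]
sliced = cross_attention_mask[:, :, start:end]
same_rows = torch.equal(fancy, sliced)

print(same_padding, same_rows)
```

The slice form only matches when the selected positions are contiguous and increasing, which is the case the PR description relies on.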

Fixes #44458

Changed files

src/transformers/masking_utils.py (modified, +11/-1)
src/transformers/models/mllama/modeling_mllama.py (modified, +4/-4)

PR #44850: [`Mllama`] Fix workaround compile

Repository: huggingface/transformers
Author: vasqu
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/44850

Description (problem / solution / changelog)

See #44458

This is a deep issue tbh - the cross attentions are reshaped into a different shape than the text input leading to a mismatch between batch sizes. This only gets noticed during compile as it is more strict about the concrete shapes and indices. Tested locally that it works.

Changed files

src/transformers/models/mllama/modeling_mllama.py (modified, +5/-6)

System Info

torch 2.10.0+cpu

regression PR: #42848

Who can help?

@vasqu

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cpu",
)
processor = AutoProcessor.from_pretrained(model_id)

# apply torch compile
model.forward = torch.compile(model.forward)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))

Expected behavior

output before regression:

Loading weights: 100%|█| 906/906 [00:00<00:00, 1756.16it/s, Materializing param=model.vision_model.transformer.layers.31.self_attn.v_
The image processor of type `MllamaImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was sav
ed with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor,
 instantiate this class with `use_fast=False`.
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] torch._dynamo hit config.recompile_limit (8)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]    function: '__call__' (/home/jiqing/transformers/src/trans
formers/modeling_layers.py:59)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]    last reason: 8/3: ___check_type_id(self, 787516432), type
=<class 'transformers.models.mllama.modeling_mllama.MllamaCrossAttentionDecoderLayer'>  # if self.gradient_checkpointing and self.tra
ining:  # ome/jiqing/transformers/src/transformers/modeling_layers.py:60 in __call__ (HINT: type MllamaCrossAttentionDecoderLayer)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] User stack trace:
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]   File "/home/jiqing/transformers/src/transformers/modeling_
layers.py", line 60, in __call__
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]     if self.gradient_checkpointing and self.training:
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] To log all recompilation reasons, use TORCH_LOGS="recompiles
".
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] To diagnose recompilation issues, see https://docs.pytorch.o
rg/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html                                                  <|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>If I had to write a haiku for this one, it would be: <|eot_id|><|start_header_id|>assistant<|end_header_id|>
                                                                                                                                     Here is a haiku for the image:

A rabbit in a coat
Stands on a dirt path, smiling
Springtime's gentle charm<|eot_id|>

output after regression:

W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] torch._dynamo hit config.recompile_limit (8)
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8]    function: '__call__' (/home/jiqing/transformers/src/trans
formers/modeling_layers.py:59)
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8]    last reason: 8/3: ___check_type_id(self, 1250368528), typ
e=<class 'transformers.models.mllama.modeling_mllama.MllamaCrossAttentionDecoderLayer'>  # if self.gradient_checkpointing and self.tr
aining:  # ome/jiqing/transformers/src/transformers/modeling_layers.py:60 in __call__
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] To log all recompilation reasons, use TORCH_LOGS="recompiles
".
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] To diagnose recompilation issues, see https://pytorch.org/do
cs/main/compile/programming_model.recompilation.html
Traceback (most recent call last):
  File "/home/jiqing/test_llama_vision.py", line 35, in <module>
    output = model.generate(**inputs, max_new_tokens=30)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/generation/utils.py", line 2555, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/generation/utils.py", line 2762, in _sample
    outputs = model_forward(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 967, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1766, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
    compiled_module = graph.compile_to_module()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2416, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2426, in _compile_to_module
    mod = self._compile_to_module_lines(wrapper_code)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2501, in _compile_to_module_lines
.......
......
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2966, in _worker_compile_cpp
    builder.build()
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 2144, in build
    run_compile_cmd(build_cmd, cwd=_build_tmp_dir)
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 636, in run_compile_cmd
    _run_compile_cmd(cmd_line, cwd)
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 631, in _run_compile_cmd
    raise exc.CppCompileError(cmd, output) from e
torch._inductor.exc.InductorError: CppCompileError: C++ compile error

Command:
g++ /tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDAL
ONE_TORCH_HEADER -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_AVX512 -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimiza
tions -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fe
xcess-precision=fast -fno-tree-loop-vectorize -march=native -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas
-pedantic -fopenmp -include /tmp/torchinductor_root/precompiled_headers/cimseuvkhk6u5tg72hhkbkur6zutyynuqzqik7rx7nziiylz223c.h -I/roo
t/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/include/python3.12 -I/opt/.venv/lib/python3.12/site-packages/torch/include
-I/opt/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -mavx512f -mavx512dq -mavx512vl -mavx512bw -mfma -mavx
512vnni -mavx512vl -mamx-tile -mamx-bf16 -mamx-int8 -mavx512bf16 -mamx-fp16 -o /tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6x
bm67j4hobbgxub2kjtsoqojma.main.so -ltorch -ltorch_cpu -ltorch_python -lgomp -L/root/.local/share/uv/python/cpython-3.12.12-linux-x86_
64-gnu/lib -L/opt/.venv/lib/python3.12/site-packages/torch/lib

...
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp: In function ‘void kernel(const int64_t*, bo
ol*, int64_t, int64_t)’:
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:62: error: ‘tmp2’ was not declared in thi
s scope; did you mean ‘tm’?
   14 |                     TORCH_CHECK((at::vec::VecMask<int64_t,2>(tmp2 < at::vec::VectorizedN<int64_t,2>(ks1))).all_masked(), "ind
ex out of bounds: tmp2 < ks1");
      |                                                              ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/torch/headeronly/macros/Macros.h:202:64: note: in definition of macro ‘C10_UNLI
KELY’
  202 | #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/c10/util/Exception.h:566:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  566 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {       \
      |       ^~~~~~~~~~~~~~~~~~~~~
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:21: note: in expansion of macro ‘TORCH_CH
ECK’
   14 |                     TORCH_CHECK((at::vec::VecMask<int64_t,2>(tmp2 < at::vec::VectorizedN<int64_t,2>(ks1))).all_masked(), "ind
ex out of bounds: tmp2 < ks1");
      |                     ^~~~~~~~~~~
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:36:134: error: ‘tmp2’ was not declared in th
is scope; did you mean ‘tm’?
...
...
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘
int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1866 |       0x80,
      |       ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘
int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1868 |       0x80,
      |       ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘
int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1870 |       0x80,
      |       ^~~~
...
...

extent analysis

Fix Plan

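1. Pick up a fix for the regression introduced by #42848

Use a transformers build that contains the changes from PR #44845 or PR #44850. Both were listed as open and unmerged when this page was fetched, so check their current state on GitHub first.

2. As a temporary workaround, run without torch.compile

# model.forward = torch.compile(model.forward)

The model then runs in eager mode, which avoids the inductor C++ build entirely at the cost of the compile speed-up.

3. Run the script again

python test_llama_vision.py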
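
If commenting the line out is inconvenient, the workaround can also be expressed as a wrapper that switches to eager after the first failure. This is a hedged sketch: `with_eager_fallback` is a hypothetical helper, not part of torch or transformers, and the two stand-in functions only simulate a working eager forward and a failing compiled forward so the example runs without loading the model.

```python
from typing import Any, Callable


def with_eager_fallback(
    compiled_fn: Callable[..., Any], eager_fn: Callable[..., Any]
) -> Callable[..., Any]:
    """Call compiled_fn until it raises once, then use eager_fn for all later calls."""
    state = {"use_eager": False}

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        if not state["use_eager"]:
            try:
                return compiled_fn(*args, **kwargs)
            except Exception as exc:  # deliberately broad: this is a stopgap
                print(f"compiled call failed ({type(exc).__name__}); falling back to eager")
                state["use_eager"] = True
        return eager_fn(*args, **kwargs)

    return wrapper


# Stand-ins so the sketch runs without torch or the model.
def eager_forward(x: int) -> int:
    return x + 1


def broken_compiled_forward(x: int) -> int:
    raise RuntimeError("simulated C++ compile error")


forward = with_eager_fallback(broken_compiled_forward, eager_forward)
result = forward(1)
print(result)
```

In the reproduction script this would look like `eager_forward = model.forward` followed by `model.forward = with_eager_fallback(torch.compile(eager_forward), eager_forward)`. The fallback has to sit at call time because `torch.compile` compiles lazily on the first call. A broad `except` can hide unrelated bugs, so it is best removed once a fixed build is installed.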

Verification

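Run the script and confirm that it prints the decoded haiku shown under "output before regression" and that no InductorError / CppCompileError traceback appears.
The torch._dynamo recompile_limit warnings appear in both the working and the failing logs, so on their own they do not indicate this failure.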

Extra Tips

Record the installed torch and transformers versions before running the script. The issue was reported with torch 2.10.0+cpu and a transformers source checkout that includes #42848.
The failing run used device_map="cpu", where inductor generates C++ and builds it with g++. The ‘tmp2’ was not declared error comes from that build step.
To see why dynamo keeps recompiling, set TORCH_LOGS="recompiles", as the warning in the log suggests.
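
One way to capture the versions before a run is sketched below, using only the standard library. The helper name `installed_version` is an assumption made for this example; it reports "not installed" instead of raising when a package is missing.

```python
from importlib import metadata


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return "not installed"


versions = {name: installed_version(name) for name in ("torch", "transformers")}
for name, version in versions.items():
    print(f"{name}: {version}")
```

Pasting this output next to a bug report makes it easier to tell whether a given environment is on the affected side of #42848.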

FAQ

Expected behavior

output before regression:

Loading weights: 100%|█| 906/906 [00:00<00:00, 1756.16it/s, Materializing param=model.vision_model.transformer.layers.31.self_attn.v_
The image processor of type `MllamaImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was sav
ed with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor,
 instantiate this class with `use_fast=False`.
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] torch._dynamo hit config.recompile_limit (8)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]    function: '__call__' (/home/jiqing/transformers/src/trans
formers/modeling_layers.py:59)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]    last reason: 8/3: ___check_type_id(self, 787516432), type
=<class 'transformers.models.mllama.modeling_mllama.MllamaCrossAttentionDecoderLayer'>  # if self.gradient_checkpointing and self.tra
ining:  # ome/jiqing/transformers/src/transformers/modeling_layers.py:60 in __call__ (HINT: type MllamaCrossAttentionDecoderLayer)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] User stack trace:
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]   File "/home/jiqing/transformers/src/transformers/modeling_
layers.py", line 60, in __call__
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]     if self.gradient_checkpointing and self.training:
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] To log all recompilation reasons, use TORCH_LOGS="recompiles
".
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] To diagnose recompilation issues, see https://docs.pytorch.o
rg/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html                                                  <|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>If I had to write a haiku for this one, it would be: <|eot_id|><|start_header_id|>assistant<|end_header_id|>
                                                                                                                                     Here is a haiku for the image:

A rabbit in a coat
Stands on a dirt path, smiling
Springtime's gentle charm<|eot_id|>

output after regression:

W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] torch._dynamo hit config.recompile_limit (8)
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8]    function: '__call__' (/home/jiqing/transformers/src/transformers/modeling_layers.py:59)
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8]    last reason: 8/3: ___check_type_id(self, 1250368528), type=<class 'transformers.models.mllama.modeling_mllama.MllamaCrossAttentionDecoderLayer'>  # if self.gradient_checkpointing and self.training:  # ome/jiqing/transformers/src/transformers/modeling_layers.py:60 in __call__
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/compile/programming_model.recompilation.html
Traceback (most recent call last):
  File "/home/jiqing/test_llama_vision.py", line 35, in <module>
    output = model.generate(**inputs, max_new_tokens=30)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/generation/utils.py", line 2555, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/generation/utils.py", line 2762, in _sample
    outputs = model_forward(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 967, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1766, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
    compiled_module = graph.compile_to_module()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2416, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2426, in _compile_to_module
    mod = self._compile_to_module_lines(wrapper_code)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2501, in _compile_to_module_lines
.......
......
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2966, in _worker_compile_cpp
    builder.build()
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 2144, in build
    run_compile_cmd(build_cmd, cwd=_build_tmp_dir)
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 636, in run_compile_cmd
    _run_compile_cmd(cmd_line, cwd)
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 631, in _run_compile_cmd
    raise exc.CppCompileError(cmd, output) from e
torch._inductor.exc.InductorError: CppCompileError: C++ compile error

Command:
g++ /tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDALONE_TORCH_HEADER -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_AVX512 -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize -march=native -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -pedantic -fopenmp -include /tmp/torchinductor_root/precompiled_headers/cimseuvkhk6u5tg72hhkbkur6zutyynuqzqik7rx7nziiylz223c.h -I/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/include/python3.12 -I/opt/.venv/lib/python3.12/site-packages/torch/include -I/opt/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -mavx512f -mavx512dq -mavx512vl -mavx512bw -mfma -mavx512vnni -mavx512vl -mamx-tile -mamx-bf16 -mamx-int8 -mavx512bf16 -mamx-fp16 -o /tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.so -ltorch -ltorch_cpu -ltorch_python -lgomp -L/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib -L/opt/.venv/lib/python3.12/site-packages/torch/lib

...
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp: In function ‘void kernel(const int64_t*, bool*, int64_t, int64_t)’:
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:62: error: ‘tmp2’ was not declared in this scope; did you mean ‘tm’?
   14 |                     TORCH_CHECK((at::vec::VecMask<int64_t,2>(tmp2 < at::vec::VectorizedN<int64_t,2>(ks1))).all_masked(), "index out of bounds: tmp2 < ks1");
      |                                                              ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/torch/headeronly/macros/Macros.h:202:64: note: in definition of macro ‘C10_UNLIKELY’
  202 | #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/c10/util/Exception.h:566:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  566 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {       \
      |       ^~~~~~~~~~~~~~~~~~~~~
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:21: note: in expansion of macro ‘TORCH_CHECK’
   14 |                     TORCH_CHECK((at::vec::VecMask<int64_t,2>(tmp2 < at::vec::VectorizedN<int64_t,2>(ks1))).all_masked(), "index out of bounds: tmp2 < ks1");
      |                     ^~~~~~~~~~~
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:36:134: error: ‘tmp2’ was not declared in this scope; did you mean ‘tm’?
...
...
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1866 |       0x80,
      |       ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1868 |       0x80,
      |       ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1870 |       0x80,
      |       ^~~~
...
...
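
In the compiler output above, the two `‘tmp2’ was not declared in this scope` errors are the actual failure, while the `-Woverflow` messages are only warnings. As a rough triage aid, the sketch below filters GCC-style diagnostics down to the `error` entries. It assumes each diagnostic header sits on a single line, so hard-wrapped terminal captures would need to be rejoined first. The function name `compile_errors` is hypothetical.

```python
import re

# GCC-style diagnostic header: <file>:<line>:<col>: <kind>: <message>
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<kind>error|warning|note): (?P<message>.*)$"
)


def compile_errors(compiler_output):
    """Return (file, line, col, message) for each 'error' diagnostic; skip warnings and notes."""
    found = []
    for raw in compiler_output.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match and match.group("kind") == "error":
            found.append(
                (
                    match.group("file"),
                    int(match.group("line")),
                    int(match.group("col")),
                    match.group("message"),
                )
            )
    return found
```

Source-excerpt lines (`14 | ...`), caret markers, and `In function ...` context lines do not match the pattern and are skipped.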


