transformers - ✅(Solved) Fix Mllama compile failed after new attn mask [2 pull requests, 3 comments, 2 participants]

GitHub stats
huggingface/transformers#44458 (fetched 2026-04-08 00:28:21)
Comments: 3
Participants: 2
Timeline events: 15
Reactions: 0
Timeline (top): mentioned ×4, subscribed ×4, commented ×3, cross-referenced ×3

Error Message

torch._inductor.exc.InductorError: CppCompileError: C++ compile error
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:62: error: ‘tmp2’ was not declared in this scope; did you mean ‘tm’?

Fix Action

Fixed

PR fix notes

PR #44845: Fix Mllama torch.compile failure caused by new attention mask logic

Description (problem / solution / changelog)

What does this PR do?

Fixes torch.compile failure for Mllama after #42848 introduced a new unified attention mask creation path.

The root cause is a torch inductor C++ codegen bug: when padding_mask_function uses advanced tensor indexing (padding_mask[batch_idx, kv_idx]), the generated C++ boundary-check code references an undeclared variable (tmp2), causing g++ compilation to fail with CppCompileError.

This PR applies two changes:

  1. masking_utils.py: In the non-vmap sdpa_mask path, apply the padding mask separately using slice-based indexing (padding_mask[:, kv_offset : kv_offset + kv_length]) instead of merging it into the mask_function with advanced tensor indexing. This avoids the inductor codegen bug while producing identical results.

  2. modeling_mllama.py: Replace torch.arange-based fancy indexing with simple slice indexing when extracting cross_attention_mask and full_text_row_masked_out_mask for the current sequence position. This is semantically equivalent but more torch.compile-friendly.
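
The equivalence both changes rely on can be checked on toy data. The sketch below is not the PR's code: plain Python lists stand in for tensors so it runs without PyTorch, and the helper names `gather_columns` and `slice_columns` and the sample mask are illustrative assumptions. It shows that selecting columns through an explicit index range (the arange-style advanced indexing) picks the same entries as a contiguous slice.

```python
# Toy stand-in for a 2-D padding mask of shape (batch, total_kv_length).
# Lists replace tensors so the sketch runs without PyTorch (assumption).
padding_mask = [
    [True, True, True, False, False, False],
    [True, True, True, True, True, False],
]


def gather_columns(mask, kv_offset, kv_length):
    """Index-based selection, analogous to mask[batch_idx, kv_idx] with an arange."""
    kv_idx = range(kv_offset, kv_offset + kv_length)
    return [[row[j] for j in kv_idx] for row in mask]


def slice_columns(mask, kv_offset, kv_length):
    """Slice-based selection, analogous to mask[:, kv_offset : kv_offset + kv_length]."""
    return [row[kv_offset : kv_offset + kv_length] for row in mask]


if __name__ == "__main__":
    # Both selections agree for every in-range window.
    for offset, length in [(0, 6), (2, 3), (5, 1)]:
        assert gather_columns(padding_mask, offset, length) == slice_columns(
            padding_mask, offset, length
        )
    print(slice_columns(padding_mask, 2, 3))
```

The two forms differ only when the window runs past the end of a row: a slice silently truncates, while explicit indices raise an error. For in-range windows the selected values are the same, which matches the PR's statement that the results are identical.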

Fixes #44458

Changed files

  • src/transformers/masking_utils.py (modified, +11/-1)
  • src/transformers/models/mllama/modeling_mllama.py (modified, +4/-4)

PR #44850: [Mllama] Fix workaround compile

Description (problem / solution / changelog)

See #44458

This is a deep issue tbh - the cross attentions are reshaped into a different shape than the text input leading to a mismatch between batch sizes. This only gets noticed during compile as it is more strict about the concrete shapes and indices. Tested locally that it works.

Changed files

  • src/transformers/models/mllama/modeling_mllama.py (modified, +5/-6)


System Info

torch 2.10.0+cpu

regression PR: #42848

Who can help?

@vasqu

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cpu",
)
processor = AutoProcessor.from_pretrained(model_id)

# apply torch compile
model.forward = torch.compile(model.forward)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image,
    input_text,
    add_special_tokens=False,
    return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))

Expected behavior

output before regression:

Loading weights: 100%|█| 906/906 [00:00<00:00, 1756.16it/s, Materializing param=model.vision_model.transformer.layers.31.self_attn.v_
The image processor of type `MllamaImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was saved with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor, instantiate this class with `use_fast=False`.
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] torch._dynamo hit config.recompile_limit (8)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]    function: '__call__' (/home/jiqing/transformers/src/transformers/modeling_layers.py:59)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]    last reason: 8/3: ___check_type_id(self, 787516432), type=<class 'transformers.models.mllama.modeling_mllama.MllamaCrossAttentionDecoderLayer'>  # if self.gradient_checkpointing and self.training:  # ome/jiqing/transformers/src/transformers/modeling_layers.py:60 in __call__ (HINT: type MllamaCrossAttentionDecoderLayer)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] User stack trace:
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]   File "/home/jiqing/transformers/src/transformers/modeling_layers.py", line 60, in __call__
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]     if self.gradient_checkpointing and self.training:
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] To diagnose recompilation issues, see https://docs.pytorch.org/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>If I had to write a haiku for this one, it would be: <|eot_id|><|start_header_id|>assistant<|end_header_id|>
Here is a haiku for the image:

A rabbit in a coat
Stands on a dirt path, smiling
Springtime's gentle charm<|eot_id|>

output after regression:

W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] torch._dynamo hit config.recompile_limit (8)
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8]    function: '__call__' (/home/jiqing/transformers/src/transformers/modeling_layers.py:59)
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8]    last reason: 8/3: ___check_type_id(self, 1250368528), type=<class 'transformers.models.mllama.modeling_mllama.MllamaCrossAttentionDecoderLayer'>  # if self.gradient_checkpointing and self.training:  # ome/jiqing/transformers/src/transformers/modeling_layers.py:60 in __call__
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/compile/programming_model.recompilation.html
Traceback (most recent call last):
  File "/home/jiqing/test_llama_vision.py", line 35, in <module>
    output = model.generate(**inputs, max_new_tokens=30)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/generation/utils.py", line 2555, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/generation/utils.py", line 2762, in _sample
    outputs = model_forward(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 967, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1766, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
    compiled_module = graph.compile_to_module()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2416, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2426, in _compile_to_module
    mod = self._compile_to_module_lines(wrapper_code)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2501, in _compile_to_module_lines
.......
......
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2966, in _worker_compile_cpp
    builder.build()
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 2144, in build
    run_compile_cmd(build_cmd, cwd=_build_tmp_dir)
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 636, in run_compile_cmd
    _run_compile_cmd(cmd_line, cwd)
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 631, in _run_compile_cmd
    raise exc.CppCompileError(cmd, output) from e
torch._inductor.exc.InductorError: CppCompileError: C++ compile error

Command:
g++ /tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDALONE_TORCH_HEADER -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_AVX512 -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize -march=native -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -pedantic -fopenmp -include /tmp/torchinductor_root/precompiled_headers/cimseuvkhk6u5tg72hhkbkur6zutyynuqzqik7rx7nziiylz223c.h -I/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/include/python3.12 -I/opt/.venv/lib/python3.12/site-packages/torch/include -I/opt/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -mavx512f -mavx512dq -mavx512vl -mavx512bw -mfma -mavx512vnni -mavx512vl -mamx-tile -mamx-bf16 -mamx-int8 -mavx512bf16 -mamx-fp16 -o /tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.so -ltorch -ltorch_cpu -ltorch_python -lgomp -L/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib -L/opt/.venv/lib/python3.12/site-packages/torch/lib

...
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp: In function ‘void kernel(const int64_t*, bool*, int64_t, int64_t)’:
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:62: error: ‘tmp2’ was not declared in this scope; did you mean ‘tm’?
   14 |                     TORCH_CHECK((at::vec::VecMask<int64_t,2>(tmp2 < at::vec::VectorizedN<int64_t,2>(ks1))).all_masked(), "index out of bounds: tmp2 < ks1");
      |                                                              ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/torch/headeronly/macros/Macros.h:202:64: note: in definition of macro ‘C10_UNLIKELY’
  202 | #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/c10/util/Exception.h:566:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  566 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {       \
      |       ^~~~~~~~~~~~~~~~~~~~~
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:21: note: in expansion of macro ‘TORCH_CHECK’
   14 |                     TORCH_CHECK((at::vec::VecMask<int64_t,2>(tmp2 < at::vec::VectorizedN<int64_t,2>(ks1))).all_masked(), "index out of bounds: tmp2 < ks1");
      |                     ^~~~~~~~~~~
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:36:134: error: ‘tmp2’ was not declared in this scope; did you mean ‘tm’?
...
...
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1866 |       0x80,
      |       ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1868 |       0x80,
      |       ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1870 |       0x80,
      |       ^~~~
...
...

extent analysis

Fix Plan

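1. Update transformers to a build that contains the fix for #44458 (see PR #44845 and PR #44850 above). Downgrading torch is not a fix: torch.compile was introduced in PyTorch 2.0, so an older release such as 1.12.1 cannot run the reproduction script at all. If no release with the fix is available yet, install from the main branch instead.

pip install --upgrade transformers

2. If you cannot update yet, disable torch.compile as a temporary workaround. The model then runs in eager mode, which skips the inductor C++ codegen step that fails but also gives up any speed-up from compilation.

# model.forward = torch.compile(model.forward)

3. Run the script again

python test_llama_vision.py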

Verification

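  • Run the script with the modified code and check that it prints the generated haiku, as in the "output before regression" log.
  • Check the console output for a CppCompileError. The torch._dynamo recompile_limit warnings also appear in the pre-regression output, so on their own they do not indicate this failure.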
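
The logs above name two debugging switches, TORCH_LOGS="recompiles" and TORCHDYNAMO_VERBOSE=1. A minimal sketch of setting both for a verification run is shown below; the helper names `debug_env` and `run_with_debug_logs` are assumptions made for illustration, and `test_llama_vision.py` is the script name taken from the traceback.

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of the environment with the two switches named in the log."""
    env = dict(os.environ if base is None else base)
    env["TORCH_LOGS"] = "recompiles"   # hint from the recompile_limit warning
    env["TORCHDYNAMO_VERBOSE"] = "1"   # hint from the traceback comment
    return env


def run_with_debug_logs(script):
    """Run `script` in a child process so the variables are set before torch is imported."""
    return subprocess.run([sys.executable, script], env=debug_env()).returncode


# Example (not executed here): run_with_debug_logs("test_llama_vision.py")
```

Running the script in a child process keeps the current shell environment untouched and makes sure the variables are in place before torch is imported.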

Extra Tips

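  • Check the installed torch and transformers versions before running the script; the issue was reported with torch 2.10.0+cpu.
  • The regression was introduced on the transformers side (PR #42848), so updating transformers, rather than changing the torch version, is the relevant fix.
  • Custom code that builds attention masks with advanced tensor indexing under torch.compile may hit the same inductor codegen error; slice-based indexing, as used in PR #44845, is the workaround described in the PR notes.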
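
A quick way to record the versions mentioned in the first tip, using only the standard library. This is a sketch: the helper name `installed_version` is an assumption, and it reads package metadata rather than importing torch or transformers, so it also works when one of them is missing.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("torch", "transformers"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Including this output in a bug report makes it easier to tell whether an environment predates or includes the fix.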

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

output before regression:

Loading weights: 100%|█| 906/906 [00:00<00:00, 1756.16it/s, Materializing param=model.vision_model.transformer.layers.31.self_attn.v_
The image processor of type `MllamaImageProcessor` is now loaded as a fast processor by default, even if the model checkpoint was sav
ed with a slow processor. This is a breaking change and may produce slightly different outputs. To continue using the slow processor,
 instantiate this class with `use_fast=False`.
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] torch._dynamo hit config.recompile_limit (8)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]    function: '__call__' (/home/jiqing/transformers/src/trans
formers/modeling_layers.py:59)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]    last reason: 8/3: ___check_type_id(self, 787516432), type
=<class 'transformers.models.mllama.modeling_mllama.MllamaCrossAttentionDecoderLayer'>  # if self.gradient_checkpointing and self.tra
ining:  # ome/jiqing/transformers/src/transformers/modeling_layers.py:60 in __call__ (HINT: type MllamaCrossAttentionDecoderLayer)
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] User stack trace:
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]   File "/home/jiqing/transformers/src/transformers/modeling_
layers.py", line 60, in __call__
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8]     if self.gradient_checkpointing and self.training:
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] To log all recompilation reasons, use TORCH_LOGS="recompiles
".
W0305 07:24:38.959000 1936280 torch/_dynamo/convert_frame.py:1767] [8/8] To diagnose recompilation issues, see https://docs.pytorch.o
rg/docs/main/user_guide/torch_compiler/compile/programming_model.recompilation.html                                                  <|begin_of_text|><|start_header_id|>user<|end_header_id|>

<|image|>If I had to write a haiku for this one, it would be: <|eot_id|><|start_header_id|>assistant<|end_header_id|>
                                                                                                                                     Here is a haiku for the image:

A rabbit in a coat
Stands on a dirt path, smiling
Springtime's gentle charm<|eot_id|>

output after regression:

W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] torch._dynamo hit config.recompile_limit (8)
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8]    function: '__call__' (/home/jiqing/transformers/src/transformers/modeling_layers.py:59)
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8]    last reason: 8/3: ___check_type_id(self, 1250368528), type=<class 'transformers.models.mllama.modeling_mllama.MllamaCrossAttentionDecoderLayer'>  # if self.gradient_checkpointing and self.training:  # ome/jiqing/transformers/src/transformers/modeling_layers.py:60 in __call__
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
W0305 06:53:48.493000 1917780 torch/_dynamo/convert_frame.py:1676] [8/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/compile/programming_model.recompilation.html
Traceback (most recent call last):
  File "/home/jiqing/test_llama_vision.py", line 35, in <module>
    output = model.generate(**inputs, max_new_tokens=30)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/generation/utils.py", line 2555, in generate
    result = decoding_method(
             ^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/generation/utils.py", line 2762, in _sample
    outputs = model_forward(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 967, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
                        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1766, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/compile_fx.py", line 1537, in codegen_and_compile
    compiled_module = graph.compile_to_module()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2416, in compile_to_module
    return self._compile_to_module()
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2426, in _compile_to_module
    mod = self._compile_to_module_lines(wrapper_code)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/graph.py", line 2501, in _compile_to_module_lines
.......
......
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/codecache.py", line 2966, in _worker_compile_cpp
    builder.build()
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 2144, in build
    run_compile_cmd(build_cmd, cwd=_build_tmp_dir)
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 636, in run_compile_cmd
    _run_compile_cmd(cmd_line, cwd)
  File "/opt/.venv/lib/python3.12/site-packages/torch/_inductor/cpp_builder.py", line 631, in _run_compile_cmd
    raise exc.CppCompileError(cmd, output) from e
torch._inductor.exc.InductorError: CppCompileError: C++ compile error

Command:
g++ /tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp -D TORCH_INDUCTOR_CPP_WRAPPER -D STANDALONE_TORCH_HEADER -D C10_USING_CUSTOM_GENERATED_MACROS -D CPU_CAPABILITY_AVX512 -O3 -DNDEBUG -fno-trapping-math -funsafe-math-optimizations -ffinite-math-only -fno-signed-zeros -fno-math-errno -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -fexcess-precision=fast -fno-tree-loop-vectorize -march=native -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -pedantic -fopenmp -include /tmp/torchinductor_root/precompiled_headers/cimseuvkhk6u5tg72hhkbkur6zutyynuqzqik7rx7nziiylz223c.h -I/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/include/python3.12 -I/opt/.venv/lib/python3.12/site-packages/torch/include -I/opt/.venv/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -mavx512f -mavx512dq -mavx512vl -mavx512bw -mfma -mavx512vnni -mavx512vl -mamx-tile -mamx-bf16 -mamx-int8 -mavx512bf16 -mamx-fp16 -o /tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.so -ltorch -ltorch_cpu -ltorch_python -lgomp -L/root/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib -L/opt/.venv/lib/python3.12/site-packages/torch/lib

...
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp: In function ‘void kernel(const int64_t*, bool*, int64_t, int64_t)’:
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:62: error: ‘tmp2’ was not declared in this scope; did you mean ‘tm’?
   14 |                     TORCH_CHECK((at::vec::VecMask<int64_t,2>(tmp2 < at::vec::VectorizedN<int64_t,2>(ks1))).all_masked(), "index out of bounds: tmp2 < ks1");
      |                                                              ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/torch/headeronly/macros/Macros.h:202:64: note: in definition of macro ‘C10_UNLIKELY’
  202 | #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
      |                                                                ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/c10/util/Exception.h:566:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
  566 |   if (C10_UNLIKELY_OR_CONST(!(cond))) {       \
      |       ^~~~~~~~~~~~~~~~~~~~~
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:14:21: note: in expansion of macro ‘TORCH_CHECK’
   14 |                     TORCH_CHECK((at::vec::VecMask<int64_t,2>(tmp2 < at::vec::VectorizedN<int64_t,2>(ks1))).all_masked(), "index out of bounds: tmp2 < ks1");
      |                     ^~~~~~~~~~~
/tmp/torchinductor_root/6q/c6qt5khkam3fycdonzojkruxp6xbm67j4hobbgxub2kjtsoqojma.main.cpp:36:134: error: ‘tmp2’ was not declared in this scope; did you mean ‘tm’?
...
...
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1866:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1866 |       0x80,
      |       ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1868:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1868 |       0x80,
      |       ^~~~
/opt/.venv/lib/python3.12/site-packages/torch/include/ATen/cpu/vec/vec512/vec512_int.h:1870:7: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘128’ to ‘-128’ [-Woverflow]
 1870 |       0x80,
      |       ^~~~
...
...
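
The log above names two diagnostics: `TORCH_LOGS="recompiles"` (from the recompile-limit warning) and `TORCHDYNAMO_VERBOSE=1` (from the comment in the `eval_frame.py` frame). A minimal sketch of re-running the reproduction with both set is shown here. It assumes the script is the `test_llama_vision.py` named in the traceback and that it sits in the current directory; the helper names `diagnostic_env` and `rerun_with_diagnostics` are hypothetical and not part of transformers or PyTorch.

```python
import os
import subprocess
import sys


def diagnostic_env(base=None):
    """Return a copy of the environment with both debug switches set.

    The two variable names are taken from the log output above;
    the helper itself is a hypothetical convenience wrapper.
    """
    env = dict(os.environ if base is None else base)
    env["TORCH_LOGS"] = "recompiles"    # log every recompilation reason
    env["TORCHDYNAMO_VERBOSE"] = "1"    # keep the Dynamo frames in tracebacks
    return env


def rerun_with_diagnostics(script="test_llama_vision.py"):
    """Run the script in a child process with the diagnostic environment.

    The script path is an assumption based on the traceback above.
    """
    return subprocess.run([sys.executable, script], env=diagnostic_env(), check=False)
```

Running the script in a child process keeps the variables out of the current shell and ensures they are in place before `torch` is imported.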
