transformers: AttentionInterface.register changes behavior of registered function

System Info

Running transformers env fails with NameError: name 'CompletionCreateParamsStreaming' is not defined

I am running Ubuntu 25.10, Python 3.13.7, and PyTorch 2.11.0.

Who can help?

@Cyrilvallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

In the following code, model1 (using attn_implementation="sdpa") produces plausible prose, while model2 (using attn_implementation="reregistered_sdpa") produces text somewhere between nonsense and gibberish.

The bug only reproduces with cache_implementation="static", but with that setting the behavior seems consistent across various models.

from transformers import AutoModelForCausalLM, AttentionInterface, pipeline
from transformers.integrations.sdpa_attention import sdpa_attention_forward

# Baseline: the built-in "sdpa" attention implementation.
model1 = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", attn_implementation="sdpa")
pipeline1 = pipeline(task="text-generation", model=model1, tokenizer="meta-llama/Llama-3.2-1B",
                     cache_implementation="static")
print(pipeline1("It was a bright cold day in April, and the clocks were striking thirteen."))

# The same function, registered again under a new name.
AttentionInterface.register("reregistered_sdpa", sdpa_attention_forward)
model2 = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", attn_implementation="reregistered_sdpa")
pipeline2 = pipeline(task="text-generation", model=model2, tokenizer="meta-llama/Llama-3.2-1B",
                     cache_implementation="static")
print(pipeline2("It was a bright cold day in April, and the clocks were striking thirteen."))
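Sampling can make two runs differ even when nothing is wrong, so it may help to compare the two attention paths under greedy decoding. The following is a minimal sketch, not part of the original report: the helper names first_divergence, generate_greedy and main are hypothetical, and do_sample=False and max_new_tokens=40 are illustrative choices. The transformers imports are deferred into the functions so that the pure-Python helper can be used on its own.

```python
from itertools import zip_longest
from typing import Optional

MODEL_ID = "meta-llama/Llama-3.2-1B"
PROMPT = "It was a bright cold day in April, and the clocks were striking thirteen."


def first_divergence(a: str, b: str) -> Optional[int]:
    """Return the index of the first differing character, or None if the strings are equal."""
    for i, (x, y) in enumerate(zip_longest(a, b)):
        if x != y:
            return i
    return None


def generate_greedy(attn_implementation: str) -> str:
    """Generate a continuation of PROMPT with greedy decoding and a static cache."""
    # Imported lazily so that first_divergence works without transformers installed.
    from transformers import AutoModelForCausalLM, pipeline

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, attn_implementation=attn_implementation
    )
    pipe = pipeline(task="text-generation", model=model, tokenizer=MODEL_ID,
                    cache_implementation="static")
    # do_sample=False removes sampling noise; max_new_tokens=40 is an arbitrary cap.
    return pipe(PROMPT, do_sample=False, max_new_tokens=40)[0]["generated_text"]


def main() -> None:
    """Register sdpa_attention_forward under a new name and compare both paths."""
    from transformers import AttentionInterface
    from transformers.integrations.sdpa_attention import sdpa_attention_forward

    AttentionInterface.register("reregistered_sdpa", sdpa_attention_forward)
    baseline = generate_greedy("sdpa")
    reregistered = generate_greedy("reregistered_sdpa")
    print("baseline:    ", baseline)
    print("reregistered:", reregistered)
    print("first divergence at character:", first_divergence(baseline, reregistered))
```

Calling main() downloads the model and prints both continuations. If the two attention paths really were equivalent, the reported divergence index would be expected to be None.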

Expected behavior

sdpa_attention_forward should behave the same whether it is called through the pre-registered name of "sdpa" or is re-registered with a new name.
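One possible explanation, offered as an assumption rather than a confirmed diagnosis: the transformers attention interface documentation describes a separate AttentionMaskInterface and says that when no mask function is registered alongside a custom attention function, mask creation is skipped and attention_mask=None is passed to the attention layers. With a static cache the key/value tensors span the full cache length, so a missing mask could let queries attend to unfilled slots. If that is what happens here, registering a mask function under the same name might restore the "sdpa" behavior. The sketch below is untested against this issue; register_with_mask is a hypothetical helper, and it assumes AttentionMaskInterface and transformers.masking_utils.sdpa_mask are available in the installed version.

```python
ATTN_NAME = "reregistered_sdpa"


def register_with_mask(name: str = ATTN_NAME) -> str:
    """Register the SDPA attention function and an SDPA-style mask under one name."""
    # Imported lazily; these imports assume a transformers release that ships
    # AttentionMaskInterface and masking_utils.sdpa_mask.
    from transformers import AttentionInterface, AttentionMaskInterface
    from transformers.integrations.sdpa_attention import sdpa_attention_forward
    from transformers.masking_utils import sdpa_mask

    AttentionInterface.register(name, sdpa_attention_forward)
    # Assumption: without this line no mask is built for the custom name.
    AttentionMaskInterface.register(name, sdpa_mask)
    return name
```

After calling register_with_mask(), loading the model with attn_implementation="reregistered_sdpa" as in the reproduction would show whether the output still degrades.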
