transformers - 💡(How to fix) Fix Bug, Generation [2 comments, 2 participants]

transformers · 2026-04-08 04:07:37

GitHub issue: huggingface/transformers#45307 (fetched 2026-04-09 07:50:51)
Author: Regata3010
Participants: Regata3010, zucchini-nlp
Timeline: commented ×2, closed ×1, reopened ×1

When using assisted generation (model.generate(assistant_model=...)) with models that have different vocabulary sizes but share the same tokenizer family, the AssistantToTargetTranslator crashes because map_input_embeddings is never initialized.

This affects model pairs like Qwen2.5-7B (vocab=152,064) + Qwen2.5-0.5B (vocab=151,936), which share the same Qwen2.5 tokenizer but have different vocab padding.

Title

AssistantToTargetTranslator crashes with AttributeError: 'map_input_embeddings' when using assisted generation with cross-vocab models

Description

This affects model pairs like Qwen2.5-7B (vocab=152,064) + Qwen2.5-0.5B (vocab=151,936), which share the same Qwen2.5 tokenizer but have different vocab padding.
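
The size of the mismatch follows directly from the two figures quoted above. A minimal sketch of the arithmetic; the function name is illustrative, and the sizes are the ones reported in this issue rather than values read from the model configs:

```python
def vocab_size_gap(target_vocab_size: int, assistant_vocab_size: int) -> int:
    """Return how many more vocabulary rows the target has than the assistant."""
    return target_vocab_size - assistant_vocab_size


# Sizes as reported in this issue.
QWEN25_7B_VOCAB = 152_064
QWEN25_05B_VOCAB = 151_936

gap = vocab_size_gap(QWEN25_7B_VOCAB, QWEN25_05B_VOCAB)
print(f"target has {gap} more vocabulary rows than the assistant")
```

On loaded models, the same numbers should be available as target.config.vocab_size and draft.config.vocab_size.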

Reproduction

from transformers import AutoModelForCausalLM, AutoTokenizer

target = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B", device_map="auto")
draft = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")

input_ids = tokenizer.encode("Hello world", return_tensors="pt").to("cuda")

# This crashes:
output = target.generate(
    input_ids,
    assistant_model=draft,
    tokenizer=tokenizer,
    assistant_tokenizer=tokenizer,
    max_new_tokens=32,
)

Without tokenizer and assistant_tokenizer, it raises:

ValueError: The main and assistant models have different tokenizers.

With tokenizer and assistant_tokenizer, it raises:

AttributeError: 'AssistantToTargetTranslator' object has no attribute 'map_input_embeddings'
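
To tell the two failure modes apart, it can help to confirm that both checkpoints really map tokens to the same ids, so that any difference is confined to padded rows. A rough sketch, working on plain dict vocabularies such as those returned by tokenizer.get_vocab(); the helper name and the toy data are hypothetical:

```python
def vocab_mismatches(vocab_a: dict, vocab_b: dict) -> dict:
    """Return tokens whose id differs between the vocabularies or that exist in only one."""
    mismatches = {}
    for token in vocab_a.keys() | vocab_b.keys():
        id_a = vocab_a.get(token)  # None when the token is missing
        id_b = vocab_b.get(token)
        if id_a != id_b:
            mismatches[token] = (id_a, id_b)
    return mismatches


# Toy vocabularies standing in for tokenizer.get_vocab() results.
same = vocab_mismatches({"a": 0, "b": 1}, {"a": 0, "b": 1})
different = vocab_mismatches({"a": 0, "b": 1}, {"a": 0, "c": 1})
print(len(same), len(different))
```

An empty result suggests the tokenizers agree token for token, in which case a size difference between the models would come from the embedding matrices rather than from the tokenizers.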

Traceback

File "transformers/generation/utils.py", line 2521, in generate
    result = decoding_method(...)
File "transformers/generation/utils.py", line 3514, in _assisted_decoding
    candidate_input_ids, candidate_logits = candidate_generator.get_candidates(input_ids)
File "transformers/generation/candidate_generator.py", line 933, in get_candidates
    assistant_input_ids, num_added_tokens = self._prepare_assistant_input_ids(target_input_ids)
File "transformers/generation/candidate_generator.py", line 1009, in _prepare_assistant_input_ids
    self._atm_translator.unmap_input_ids()
File "transformers/generation/candidate_generator.py", line 754, in unmap_input_ids
    self.map_input_embeddings.map = False
AttributeError: 'AssistantToTargetTranslator' object has no attribute 'map_input_embeddings'

Expected Behavior

Assisted generation should work with models from the same family that have slightly different vocab sizes, either by:

- Properly initializing map_input_embeddings in AssistantToTargetTranslator.__init__, or
- Handling the case where the tokenizer is the same but vocab sizes differ (padding tokens)
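
One way to read the second option: if both models share token ids and the extra rows are only padding, the assistant's per-token scores could be extended to the target's width with a value that can never be selected. The sketch below illustrates that idea on plain Python lists. It is not how transformers implements it, and pad_scores is a hypothetical helper:

```python
import math


def pad_scores(scores: list, target_vocab_size: int) -> list:
    """Extend a row of scores to target_vocab_size, filling new slots with -inf."""
    if len(scores) > target_vocab_size:
        raise ValueError("assistant row is wider than the target vocabulary")
    return scores + [-math.inf] * (target_vocab_size - len(scores))


# A 3-entry assistant row widened to a 5-entry target vocabulary.
row = pad_scores([0.5, 1.5, -0.2], target_vocab_size=5)

# The padded slots hold -inf, so they never win an argmax.
best = max(range(len(row)), key=row.__getitem__)
print(len(row), best)
```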

Environment

transformers version: 5.4.0
torch version: 2.11.0+cu128
Python: 3.13.5
GPU: NVIDIA H200
OS: Linux (RHEL 9, HPC cluster)

Context

Found while benchmarking a from-scratch speculative decoding implementation against HF's assisted generation across multiple model pairs. The bug only triggers with cross-vocab model pairs (e.g., Qwen2.5 family). Same-vocab pairs (e.g., Llama-3.1-8B + Llama-3.2-1B) work correctly.

Workaround

Catching the error and falling back:

try:
    output = target.generate(input_ids, assistant_model=draft, ...)
except (ValueError, AttributeError):
    # Fall back to standard generation
    output = target.generate(input_ids, max_new_tokens=32)
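
The same workaround can be wrapped in a helper that logs the failure, so the fallback does not happen silently. This is a sketch under assumptions: generate_with_fallback and _StubTarget are hypothetical names, and the stub only imitates a model whose assisted path raises as described in this issue.

```python
import logging

logger = logging.getLogger(__name__)


def generate_with_fallback(target, input_ids, draft, **generate_kwargs):
    """Try assisted generation; on the errors from this issue, retry without the assistant."""
    try:
        return target.generate(input_ids, assistant_model=draft, **generate_kwargs)
    except (ValueError, AttributeError) as exc:
        logger.warning("Assisted generation failed (%s); using standard generation.", exc)
        # As a precaution, drop the assistant-only argument before retrying.
        plain_kwargs = {k: v for k, v in generate_kwargs.items() if k != "assistant_tokenizer"}
        return target.generate(input_ids, **plain_kwargs)


class _StubTarget:
    """Stand-in for a model whose assisted path raises, as in this issue."""

    def generate(self, input_ids, assistant_model=None, **kwargs):
        if assistant_model is not None:
            raise AttributeError("no attribute 'map_input_embeddings'")
        return list(input_ids) + [0]


result = generate_with_fallback(_StubTarget(), [1, 2, 3], draft=object(), max_new_tokens=32)
print(result)
```

With real models, the call would look like generate_with_fallback(target, input_ids, draft, tokenizer=tokenizer, assistant_tokenizer=tokenizer, max_new_tokens=32). The fallback path does not use the draft model at all.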

extent analysis

TL;DR

The most likely fix is to properly initialize map_input_embeddings in AssistantToTargetTranslator.__init__ to handle models with different vocabulary sizes but the same tokenizer family.

Guidance

Check where AssistantToTargetTranslator assigns map_input_embeddings and confirm that the assignment also runs when the tokenizer is shared but the vocab sizes differ.
If it does not, consider initializing map_input_embeddings in AssistantToTargetTranslator.__init__ for that case, or guarding the access in unmap_input_ids.
Until a fix is available, use the workaround of catching the error and falling back to standard generation.
Test any fix with both cross-vocab pairs (e.g., Qwen2.5-7B + Qwen2.5-0.5B) and same-vocab pairs (e.g., Llama-3.1-8B + Llama-3.2-1B).

Example

# Example of how map_input_embeddings could be initialized
class AssistantToTargetTranslator:
    def __init__(self, target_tokenizer, assistant_tokenizer):
        if target_tokenizer == assistant_tokenizer:
            # Handle the case where the tokenizer is the same but vocab sizes differ
            self.map_input_embeddings = ...
        else:
            # Handle the case where the tokenizers are different
            self.map_input_embeddings = ...
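
The failure in the traceback is an attribute that is read before it was ever assigned. A general way to avoid that class of bug is to assign the attribute unconditionally in __init__ and guard the later access. The toy classes below are hypothetical stand-ins that only illustrate the pattern; they do not reproduce the real AssistantToTargetTranslator or its constructor signature.

```python
class EmbeddingMapStub:
    """Hypothetical stand-in for an object with a boolean `map` switch."""

    def __init__(self):
        self.map = True


class TranslatorSketch:
    def __init__(self, needs_mapping: bool):
        # Always define the attribute, even when no mapping is needed.
        self.map_input_embeddings = EmbeddingMapStub() if needs_mapping else None

    def unmap_input_ids(self) -> None:
        # Only touch the mapping when one was actually created.
        if self.map_input_embeddings is not None:
            self.map_input_embeddings.map = False


with_map = TranslatorSketch(needs_mapping=True)
without_map = TranslatorSketch(needs_mapping=False)
with_map.unmap_input_ids()
without_map.unmap_input_ids()  # no AttributeError: the attribute exists and is None
```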

Notes

The workaround is not ideal: the fallback runs standard generation, so the draft model is not used and the speed-up from assisted generation is lost. A proper fix would involve initializing map_input_embeddings correctly in AssistantToTargetTranslator.__init__.

Recommendation

Apply the workaround of catching the error and falling back to standard generation until a proper fix is implemented. This keeps generation from crashing for same-family model pairs with slightly different vocab sizes, but it does not make assisted generation work for them: the fallback path runs without the assistant model.
