transformers: [Bug] XOR logic for `input_ids`/`inputs_embeds` validation produces wrong or misleading error messages across multiple models

GitHub stats
huggingface/transformers#45183 (fetched 2026-04-08 02:33:20)
Comments: 0 · Participants: 1 · Timeline events: 3 (labeled ×2, closed ×1) · Reactions: 0

Multiple models across the library use an XOR (^) condition to validate input_ids and inputs_embeds inputs. This pattern has two distinct issues depending on the model:

  1. Severe (11 models): XOR is paired with the error message "You cannot specify both ...", which is factually wrong when neither input is provided — the opposite situation.
  2. Minor (most other models, e.g. BART): XOR is paired with the generic message "You must specify exactly one of ...", which is functionally correct but conflates two opposite error cases into one vague message.

Error Message

In the 11 models that still carry the severe variant, passing neither input raises a message that describes the opposite situation:

ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time

Background and History

Before PR #43590

BartEncoder used a correct explicit 4-branch check:

if input_ids is not None and inputs_embeds is not None:
    raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
elif input_ids is not None:
    ...  # use input_ids
elif inputs_embeds is not None:
    ...  # use inputs_embeds
else:
    raise ValueError("You have to specify either input_ids or inputs_embeds")

BartDecoder already had the severe bug (XOR + wrong message):

# Pre-#43590 BartDecoder — already broken
if (input_ids is None) ^ (inputs_embeds is not None):
    raise ValueError(
        "You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time"
    )

After PR #43590

The refactor unified both encoder and decoder to:

if (input_ids is None) ^ (inputs_embeds is not None):
    raise ValueError("You must specify exactly one of input_ids or inputs_embeds")

This fixed the wrong message in BartDecoder, but the XOR logic itself still conflates two opposite error cases. Meanwhile, 11 other older models were not updated and still carry the original severe bug.


Why XOR Is Problematic Here

The truth table for (input_ids is None) ^ (inputs_embeds is not None):

| input_ids | inputs_embeds | XOR result | Correct action |
| --- | --- | --- | --- |
| provided | provided | True | ✅ raise error: "cannot specify both" |
| provided | None | False | ✅ use input_ids |
| None | provided | False | ✅ use inputs_embeds |
| None | None | True | ✅ raise error: "must specify one" |

XOR fires for both the "too many" and "too few" cases, but these require different, opposite error messages. Using a single message for both makes it impossible for users to know whether they passed too many or too few inputs.
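
The truth table can be reproduced with a few lines of Python. This is a minimal sketch: `xor_check` is a hypothetical stand-in for the condition quoted in the issue, and the string `"provided"` is a placeholder for a real tensor, since only whether each argument is `None` matters.

```python
from itertools import product


def xor_check(input_ids, inputs_embeds):
    """Return True when the XOR condition would raise (hypothetical stand-in)."""
    return (input_ids is None) ^ (inputs_embeds is not None)


# "provided" is a placeholder for a real tensor; only None-ness matters here.
rows = [
    (input_ids, inputs_embeds, xor_check(input_ids, inputs_embeds))
    for input_ids, inputs_embeds in product(["provided", None], repeat=2)
]

for input_ids, inputs_embeds, fires in rows:
    print(f"input_ids={input_ids!s:<9} inputs_embeds={inputs_embeds!s:<9} raises={fires}")
```

The condition is true for both the first and the last row, so whatever single message follows the `raise` has to cover two opposite mistakes.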


Issue 1 (Severe): XOR + Wrong Message — 11 Models Not Updated by #43590

These models still use XOR paired with "You cannot specify both ...", which is factually incorrect when neither input is provided:

| Model | File |
| --- | --- |
| BigBird-Pegasus | modeling_bigbird_pegasus.py |
| BioGPT | modeling_biogpt.py |
| Blenderbot | modeling_blenderbot.py |
| Blenderbot Small | modeling_blenderbot_small.py |
| M2M-100 | modeling_m2m_100.py |
| Marian | modeling_marian.py |
| mBART | modeling_mbart.py |
| Pegasus | modeling_pegasus.py |
| Pegasus-X | modeling_pegasus_x.py |
| Speech-to-Text | modeling_speech_to_text.py |
| Whisper | modeling_whisper.py |
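
To check a local checkout for remaining occurrences of the pattern, a small scan along these lines may help. It is only a sketch: `find_xor_checks` and `scan_tree` are hypothetical helper names, the root directory passed to `scan_tree` depends on your checkout, and the regex only matches the condition when it sits on a single line.

```python
import re
from pathlib import Path

# Matches the XOR validation condition, tolerating whitespace differences.
XOR_PATTERN = re.compile(
    r"\(\s*input_ids\s+is\s+None\s*\)\s*\^\s*\(\s*inputs_embeds\s+is\s+not\s+None\s*\)"
)


def find_xor_checks(source: str) -> list[int]:
    """Return 1-based line numbers where the XOR condition appears."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if XOR_PATTERN.search(line)
    ]


def scan_tree(root: str) -> dict[str, list[int]]:
    """Scan every modeling_*.py under root (the root path is an assumption)."""
    hits = {}
    for path in Path(root).rglob("modeling_*.py"):
        lines = find_xor_checks(path.read_text(encoding="utf-8"))
        if lines:
            hits[str(path)] = lines
    return hits


# Small in-memory sample so the sketch runs without a checkout.
sample = '''
if (input_ids is None) ^ (inputs_embeds is not None):
    raise ValueError("You cannot specify both ...")
'''
print(find_xor_checks(sample))
```

Each hit still needs a manual look at the message that follows the condition, since the scan cannot tell the severe variant from the minor one.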

Suggested Fix

Replace the XOR with two explicit conditions, which is unambiguous and consistent with the original BartEncoder logic before #43590:

if input_ids is not None and inputs_embeds is not None:
    raise ValueError(
        "You cannot specify both input_ids and inputs_embeds at the same time"
    )
if input_ids is None and inputs_embeds is None:
    raise ValueError(
        "You have to specify either input_ids or inputs_embeds"
    )
# No elif needed — the embed_tokens call below handles the remaining cases cleanly
if inputs_embeds is None:
    inputs_embeds = self.embed_tokens(input_ids)

This preserves the structural simplification introduced in #43590 while restoring precise, actionable error messages for each failure case.
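
The two explicit conditions can be tried in isolation, outside any model class. The sketch below uses the messages from the suggested fix; `check_exactly_one` and `message_for` are hypothetical names introduced for illustration and are not part of transformers.

```python
from typing import Any, Optional


def check_exactly_one(input_ids: Optional[Any], inputs_embeds: Optional[Any]) -> None:
    """Raise a distinct ValueError for 'both given' and for 'neither given'."""
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    if input_ids is None and inputs_embeds is None:
        raise ValueError("You have to specify either input_ids or inputs_embeds")


def message_for(input_ids: Optional[Any], inputs_embeds: Optional[Any]) -> Optional[str]:
    """Return the error message, or None when the inputs are valid."""
    try:
        check_exactly_one(input_ids, inputs_embeds)
    except ValueError as err:
        return str(err)
    return None


# Plain lists stand in for tensors; only None-ness is checked.
print(message_for([1, 2], [[0.1]]))
print(message_for(None, None))
print(message_for([1, 2], None))
```

The comparisons use `is None` rather than truthiness, which matters for real tensors: evaluating a multi-element tensor in a boolean context raises an error.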

Reproduction

Example with Blenderbot, passing neither input to the decoder:

from transformers import BlenderbotModel, BlenderbotConfig

config = BlenderbotConfig()
model = BlenderbotModel(config)

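# BlenderbotModel exposes its decoder directly as `model.decoder`
# (`.model.decoder` exists only on BlenderbotForConditionalGeneration)
model.decoder(input_ids=None, inputs_embeds=None)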

Expected behavior

When neither input is passed, the error should say that one of the inputs must be specified. Instead, the decoder reports the opposite problem:

ValueError: You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time

Issue 2 (Minor): XOR + Vague Message — BART and Most Other Models

Models updated by #43590 (including BART) now use:

if (input_ids is None) ^ (inputs_embeds is not None):
    raise ValueError("You must specify exactly one of input_ids or inputs_embeds")

This message is technically acceptable for the "neither provided" case, but is misleading for the "both provided" case, where the user would expect to be told they passed too many inputs, not that they need to pick one.

extent analysis

TL;DR

Replace the XOR condition with two explicit conditions to provide accurate and actionable error messages for both "too many" and "too few" input cases.

Guidance

  • Identify the models using the XOR condition with incorrect or vague error messages, such as the 11 models listed in the issue.
  • Replace the XOR condition with two explicit conditions: one to check for both inputs being provided and another to check for neither input being provided.
  • Use distinct error messages for each case, such as "You cannot specify both input_ids and inputs_embeds at the same time" and "You have to specify either input_ids or inputs_embeds".
  • Verify the fix by testing the models with different input combinations, including passing both inputs, neither input, and only one input.
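
The verification step can be sketched as a small case matrix that any validator is run against. Everything here is illustrative: `xor_validator` mimics the severe pattern quoted in the issue, `explicit_validator` mimics the suggested fix, and `failing_cases` is a hypothetical helper. None of them is transformers code.

```python
def xor_validator(input_ids, inputs_embeds):
    # Severe pattern from the issue: XOR paired with the "cannot specify both" message.
    if (input_ids is None) ^ (inputs_embeds is not None):
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")


def explicit_validator(input_ids, inputs_embeds):
    # Suggested fix: one condition and one message per failure case.
    if input_ids is not None and inputs_embeds is not None:
        raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
    if input_ids is None and inputs_embeds is None:
        raise ValueError("You have to specify either input_ids or inputs_embeds")


# Expected message fragment per input combination (None means "no error").
EXPECTED = {
    ("ids", "embeds"): "cannot specify both",
    ("ids", None): None,
    (None, "embeds"): None,
    (None, None): "have to specify either",
}


def failing_cases(validator):
    """Return the input combinations whose outcome differs from EXPECTED."""
    failures = []
    for (input_ids, inputs_embeds), expected in EXPECTED.items():
        try:
            validator(input_ids, inputs_embeds)
            message = None
        except ValueError as err:
            message = str(err)
        if expected is None:
            ok = message is None
        else:
            ok = message is not None and expected in message
        if not ok:
            failures.append((input_ids, inputs_embeds))
    return failures


print(failing_cases(xor_validator))
print(failing_cases(explicit_validator))
```

In a real test suite the same matrix would be run against each model's encoder or decoder, with actual tensors in place of the placeholder strings.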

Example

if input_ids is not None and inputs_embeds is not None:
    raise ValueError("You cannot specify both input_ids and inputs_embeds at the same time")
if input_ids is None and inputs_embeds is None:
    raise ValueError("You have to specify either input_ids or inputs_embeds")

Notes

  • The suggested fix preserves the structural simplification introduced in PR #43590 while restoring precise error messages.
  • The issue affects multiple models, including BigBird-Pegasus, BioGPT, Blenderbot, and others, which should be updated with the suggested fix.

Recommendation

Apply the fix in each affected model: replace the XOR condition with two explicit conditions, so that the "too many inputs" and "too few inputs" cases each raise their own accurate message.
