transformers - LlamaConfig rejects explicit head_dim when hidden_size is not divisible by num_attention_heads [1 comment, 2 participants]

GitHub stats
huggingface/transformers#46082 · Fetched 2026-05-20 03:39:19
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): subscribed ×3, mentioned ×2, commented ×1


System Info

  • transformers version: 5.8.0.dev0 / current main
  • Checked against main commit a0fb01c6cda2301cb54de0efde5fac405836c4fe
  • Python version: 3.13.12

Who can help?

@ArthurZucker @Cyrilvallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

LlamaConfig exposes head_dim, but validation still rejects configs where hidden_size is not divisible by num_attention_heads, even when head_dim is explicitly provided.

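from transformers import LlamaConfig, LlamaForCausalLM

# 512 is not divisible by 9, so head_dim cannot be derived as
# hidden_size // num_attention_heads; it is passed explicitly instead.
config = LlamaConfig(
    vocab_size=99,
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=1,
    num_attention_heads=9,
    num_key_value_heads=1,
    head_dim=56,  # 9 heads * 56 = 504-wide query projection
)

# Per this report, a validation error is raised before the model is built.
model = LlamaForCausalLM(config)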

This raises a validation error before the model can be constructed, even though the attention projection dimensions are well-defined from num_attention_heads * head_dim.
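To illustrate why the projection dimensions are well-defined here, the following minimal sketch computes the expected (in_features, out_features) pair for each attention projection from the values in the reproduction. The helper name attention_projection_shapes is hypothetical and is not part of transformers; it assumes the usual Llama-style layout in which the query and output projections use num_attention_heads * head_dim and the key/value projections use num_key_value_heads * head_dim.

```python
def attention_projection_shapes(hidden_size, num_attention_heads,
                                num_key_value_heads, head_dim):
    """Hypothetical helper: (in_features, out_features) per projection.

    Assumes a Llama-style attention layout; not a transformers API.
    """
    q_out = num_attention_heads * head_dim      # query width
    kv_out = num_key_value_heads * head_dim     # key/value width
    return {
        "q_proj": (hidden_size, q_out),
        "k_proj": (hidden_size, kv_out),
        "v_proj": (hidden_size, kv_out),
        "o_proj": (q_out, hidden_size),         # maps back to hidden_size
    }


# Values taken from the reproduction above.
shapes = attention_projection_shapes(
    hidden_size=512,
    num_attention_heads=9,
    num_key_value_heads=1,
    head_dim=56,
)
for name, shape in shapes.items():
    print(name, shape)
```

None of these shapes depends on hidden_size being divisible by num_attention_heads; hidden_size only appears as the input width of q_proj, k_proj, and v_proj and as the output width of o_proj.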

Expected behavior

If head_dim is explicitly provided, LlamaConfig should accept a hidden_size that is not divisible by num_attention_heads and construct the attention projections from num_attention_heads * head_dim.

If head_dim is not provided, the existing validation error should still be raised because head_dim must be derived from hidden_size // num_attention_heads.
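The two rules above can be sketched as a small standalone function. This is only an illustration of the requested behavior, not the validation code in transformers; the name resolve_head_dim and the error message are assumptions made for this example.

```python
from typing import Optional


def resolve_head_dim(hidden_size: int, num_attention_heads: int,
                     head_dim: Optional[int] = None) -> int:
    """Hypothetical sketch of the requested validation rule."""
    # An explicit head_dim is used as-is; divisibility is not required.
    if head_dim is not None:
        return head_dim
    # Without head_dim it must be derived, so divisibility is required.
    if hidden_size % num_attention_heads != 0:
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by "
            f"num_attention_heads ({num_attention_heads}) "
            "when head_dim is not provided"
        )
    return hidden_size // num_attention_heads


# Explicit head_dim: accepted even though 512 % 9 != 0.
explicit = resolve_head_dim(512, 9, head_dim=56)

# No head_dim, divisible case: derived as 512 // 8.
derived = resolve_head_dim(512, 8)

# No head_dim, non-divisible case: still rejected.
try:
    resolve_head_dim(512, 9)
    rejected = False
except ValueError:
    rejected = True
```

Checking head_dim first keeps the existing error for configs that rely on the derived value while letting configs with an explicit value through.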

Related custom-head_dim support has come up for other model families, for example #36659 and #37187.
