transformers - 💡(How to fix) Fix LlamaConfig rejects explicit head_dim when hidden_size is not divisible by num_attention_heads [1 comments, 2 participants]

Q: Expected behavior

If `head_dim` is explicitly provided, `LlamaConfig` should allow non-divisible `hidden_size` / `num_attention_heads` values and construct projections from `num_attention_heads * head_dim`. If `head_dim` is not provided, the existing validation error should still be raised because `head_dim` must be derived from `hidden_size // num_attention_heads`. Related custom-`head_dim` support has come up for other model families, for example #36659 and #37187.

transformers · 2026-05-19 19:12:17

GitHub stats

huggingface/transformers#46082•Fetched 2026-05-20 03:39:19

Author

Anri-Lombard

Participants

Anri-Lombard

matdou

Timeline (top)

subscribed ×3, mentioned ×2, commented ×1

Error Message

Constructing a LlamaConfig with an explicit head_dim and a hidden_size that is not divisible by num_attention_heads raises a validation error before the model can be constructed, even though the attention projection dimensions are well-defined from num_attention_heads * head_dim.

Root Cause

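LlamaConfig exposes head_dim, but validation still rejects configs where hidden_size is not divisible by num_attention_heads, even when head_dim is explicitly provided. The divisibility requirement is only needed when head_dim has to be derived from hidden_size // num_attention_heads.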

Code Example

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=99,
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=1,
    num_attention_heads=9,
    num_key_value_heads=1,
    head_dim=56,
)

model = LlamaForCausalLM(config)

System Info

transformers version: 5.8.0.dev0 / current main
Checked against main commit a0fb01c6cda2301cb54de0efde5fac405836c4fe
Python version: 3.13.12

Who can help?

@ArthurZucker @Cyrilvallez

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

LlamaConfig exposes head_dim, but validation still rejects configs where hidden_size is not divisible by num_attention_heads, even when head_dim is explicitly provided.

from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    vocab_size=99,
    hidden_size=512,
    intermediate_size=1024,
    num_hidden_layers=1,
    num_attention_heads=9,
    num_key_value_heads=1,
    head_dim=56,
)

model = LlamaForCausalLM(config)

This raises a validation error before the model can be constructed, even though the attention projection dimensions are well-defined from num_attention_heads * head_dim.
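To make the "well-defined" claim concrete, here is a minimal arithmetic sketch in plain Python. It assumes the usual Llama attention layout, where the query projection maps hidden_size to num_attention_heads * head_dim, the key and value projections map hidden_size to num_key_value_heads * head_dim, and the output projection maps back to hidden_size. The helper name llama_projection_shapes is hypothetical and is not part of transformers.

```python
def llama_projection_shapes(hidden_size, num_attention_heads,
                            num_key_value_heads, head_dim):
    """Return (in_features, out_features) per projection.

    Hypothetical helper for illustration only; it mirrors the assumed
    Llama attention layout and does not call into transformers.
    """
    q_out = num_attention_heads * head_dim    # total width of all query heads
    kv_out = num_key_value_heads * head_dim   # total width of all key/value heads
    return {
        "q_proj": (hidden_size, q_out),
        "k_proj": (hidden_size, kv_out),
        "v_proj": (hidden_size, kv_out),
        "o_proj": (q_out, hidden_size),
    }


# Values taken from the reproduction above.
shapes = llama_projection_shapes(
    hidden_size=512,
    num_attention_heads=9,
    num_key_value_heads=1,
    head_dim=56,
)
remainder = 512 % 9  # non-zero, which is what trips the divisibility check

for name, shape in shapes.items():
    print(name, shape)
print("hidden_size % num_attention_heads =", remainder)
```

Every shape depends only on hidden_size, the head counts, and head_dim, so none of them requires hidden_size to be a multiple of num_attention_heads.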

Expected behavior

If head_dim is explicitly provided, LlamaConfig should allow non-divisible hidden_size / num_attention_heads values and construct projections from num_attention_heads * head_dim.

If head_dim is not provided, the existing validation error should still be raised because head_dim must be derived from hidden_size // num_attention_heads.

Related custom-head_dim support has come up for other model families, for example #36659 and #37187.
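The two cases described above can be sketched as a small standalone function. This is only an illustration of the requested behavior, not the actual validation code in transformers; the function name resolve_head_dim and the error message are assumptions made for this example.

```python
from typing import Optional


def resolve_head_dim(hidden_size: int,
                     num_attention_heads: int,
                     head_dim: Optional[int] = None) -> int:
    """Hypothetical sketch of the expected validation behavior."""
    if head_dim is not None:
        # Explicit head_dim: divisibility of hidden_size is irrelevant,
        # since projections are sized from num_attention_heads * head_dim.
        return head_dim
    if hidden_size % num_attention_heads != 0:
        # No explicit head_dim: it must be derived, so keep the existing error.
        raise ValueError(
            f"hidden_size ({hidden_size}) must be divisible by "
            f"num_attention_heads ({num_attention_heads}) when head_dim is not set"
        )
    return hidden_size // num_attention_heads


# Explicit head_dim with a non-divisible hidden_size is accepted.
explicit = resolve_head_dim(512, 9, head_dim=56)

# Derived head_dim still works when hidden_size is divisible.
derived = resolve_head_dim(512, 8)

# Derived head_dim with a non-divisible hidden_size is still rejected.
try:
    resolve_head_dim(512, 9)
    rejected = False
except ValueError:
    rejected = True
```

Checking head_dim first keeps the existing error path untouched for configs that rely on the derived value.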
