transformers - 💡(How to fix) Fix NemotronH implementation can't load NemotronH checkpoints! [7 comments, 6 participants]

transformers2026-03-19 15:15:06

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

huggingface/transformers#44863•Fetched 2026-04-08 01:03:25

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

commented ×7mentioned ×6subscribed ×6cross-referenced ×1

Error Message

Traceback (most recent call last): File "/Users/thomas/Documents/GitHub/local-lm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 275, in getattr return self.data[item] ~~~~~~~~~^^^^^^ KeyError: 'shape'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/path/main.py", line 17, in <module> outputs = model.generate(tokenized_chat) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/path/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/path/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2390, in generate batch_size = inputs_tensor.shape[0] ^^^^^^^^^^^^^^^^^^^ File "/path/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 277, in getattr raise AttributeError AttributeError

Root Cause

If I add a check to ignore the - character I instead see something like this:

num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Fetching 4 files: 100%|██████████| 4/4 [05:01<00:00, 75.32s/it] 
The fast path is not available because one of `(selective_state_update, causal_conv1d_fn, causal_conv1d_update)` is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|██████████| 119/119 [00:00<00:00, 26544.82it/s]
NemotronHForCausalLM LOAD REPORT from: nvidia/Nemotron-H-8B-Reasoning-128K
Key                                              | Status     | 
-------------------------------------------------+------------+-
model.layers.{4...50}.mixer.A_log                | UNEXPECTED | 
model.layers.{1...51}.mixer.down_proj.weight     | UNEXPECTED | 
model.layers.{4...50}.mixer.norm.weight          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.v_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.out_proj.weight      | UNEXPECTED | 
model.layers.{4...50}.mixer.in_proj.weight       | UNEXPECTED | 
model.layers.{28...51}.norm.weight               | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.o_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.dt_bias              | UNEXPECTED | 
model.layers.{1...51}.mixer.up_proj.weight       | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.k_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.D                    | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.bias          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.q_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.weight        | UNEXPECTED | 
model.layers.{1...27}.mixer.out_proj.weight      | MISSING    | 
model.layers.{1...27}.mixer.norm.weight          | MISSING    | 
model.layers.{1...27}.mixer.A_log                | MISSING    | 
model.layers.{1...27}.mixer.dt_bias              | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.o_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.conv1d.bias          | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.k_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.in_proj.weight       | MISSING    | 
model.layers.{1...27}.mixer.D                    | MISSING    | 
model.layers.{1...27}.mixer.conv1d.weight        | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.q_proj.weight | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.v_proj.weight | MISSING    |

Code Example

- `transformers` version: 5.3.0
- Platform: macOS-15.7.3-arm64-arm-64bit
- Python version: 3.12.11
- Huggingface_hub version: 1.7.1
- Safetensors version: 0.7.0
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.10.0 (NA)
- Using distributed or parallel set-up in script?: NO

---

tokenizer  = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K", torch_dtype=torch.bfloat16)

---

tokenizer  = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-4B-Instruct-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-4B-Instruct-128K", torch_dtype=torch.bfloat16)

---

num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Fetching 4 files: 100%|██████████| 4/4 [05:01<00:00, 75.32s/it] 
The fast path is not available because one of `(selective_state_update, causal_conv1d_fn, causal_conv1d_update)` is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|██████████| 119/119 [00:00<00:00, 26544.82it/s]
NemotronHForCausalLM LOAD REPORT from: nvidia/Nemotron-H-8B-Reasoning-128K
Key                                              | Status     | 
-------------------------------------------------+------------+-
model.layers.{4...50}.mixer.A_log                | UNEXPECTED | 
model.layers.{1...51}.mixer.down_proj.weight     | UNEXPECTED | 
model.layers.{4...50}.mixer.norm.weight          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.v_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.out_proj.weight      | UNEXPECTED | 
model.layers.{4...50}.mixer.in_proj.weight       | UNEXPECTED | 
model.layers.{28...51}.norm.weight               | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.o_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.dt_bias              | UNEXPECTED | 
model.layers.{1...51}.mixer.up_proj.weight       | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.k_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.D                    | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.bias          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.q_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.weight        | UNEXPECTED | 
model.layers.{1...27}.mixer.out_proj.weight      | MISSING    | 
model.layers.{1...27}.mixer.norm.weight          | MISSING    | 
model.layers.{1...27}.mixer.A_log                | MISSING    | 
model.layers.{1...27}.mixer.dt_bias              | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.o_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.conv1d.bias          | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.k_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.in_proj.weight       | MISSING    | 
model.layers.{1...27}.mixer.D                    | MISSING    | 
model.layers.{1...27}.mixer.conv1d.weight        | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.q_proj.weight | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.v_proj.weight | MISSING    | 

Notes:
- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING    :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.

---

Traceback (most recent call last):
  File "/Users/thomas/Documents/GitHub/local-lm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 275, in __getattr__
    return self.data[item]
           ~~~~~~~~~^^^^^^
KeyError: 'shape'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/main.py", line 17, in <module>
    outputs = model.generate(tokenized_chat)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2390, in generate
    batch_size = inputs_tensor.shape[0]
                 ^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 277, in __getattr__
    raise AttributeError
AttributeError

RAW_BUFFERClick to expand / collapse

System Info

- `transformers` version: 5.3.0
- Platform: macOS-15.7.3-arm64-arm-64bit
- Python version: 3.12.11
- Huggingface_hub version: 1.7.1
- Safetensors version: 0.7.0
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.10.0 (NA)
- Using distributed or parallel set-up in script?: NO

Who can help?

@ArthurZucker @Cyrilvallez @liding-nv

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

either of:

tokenizer  = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K", torch_dtype=torch.bfloat16)

tokenizer  = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-4B-Instruct-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-4B-Instruct-128K", torch_dtype=torch.bfloat16)

fails. The first error appears to be that these functions: https://github.com/huggingface/transformers/blob/884333368ff329090c73bd00e57996727f301de3/src/transformers/models/nemotron_h/configuration_nemotron_h.py#L259-L269

do not correctly handle the - character in layer type patterns, but the saved models on the hub use them.

If I add a check to ignore the - character I instead see something like this:

num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Fetching 4 files: 100%|██████████| 4/4 [05:01<00:00, 75.32s/it] 
The fast path is not available because one of `(selective_state_update, causal_conv1d_fn, causal_conv1d_update)` is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|██████████| 119/119 [00:00<00:00, 26544.82it/s]
NemotronHForCausalLM LOAD REPORT from: nvidia/Nemotron-H-8B-Reasoning-128K
Key                                              | Status     | 
-------------------------------------------------+------------+-
model.layers.{4...50}.mixer.A_log                | UNEXPECTED | 
model.layers.{1...51}.mixer.down_proj.weight     | UNEXPECTED | 
model.layers.{4...50}.mixer.norm.weight          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.v_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.out_proj.weight      | UNEXPECTED | 
model.layers.{4...50}.mixer.in_proj.weight       | UNEXPECTED | 
model.layers.{28...51}.norm.weight               | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.o_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.dt_bias              | UNEXPECTED | 
model.layers.{1...51}.mixer.up_proj.weight       | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.k_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.D                    | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.bias          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.q_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.weight        | UNEXPECTED | 
model.layers.{1...27}.mixer.out_proj.weight      | MISSING    | 
model.layers.{1...27}.mixer.norm.weight          | MISSING    | 
model.layers.{1...27}.mixer.A_log                | MISSING    | 
model.layers.{1...27}.mixer.dt_bias              | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.o_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.conv1d.bias          | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.k_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.in_proj.weight       | MISSING    | 
model.layers.{1...27}.mixer.D                    | MISSING    | 
model.layers.{1...27}.mixer.conv1d.weight        | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.q_proj.weight | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.v_proj.weight | MISSING    | 

Notes:
- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING    :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.

and the model then crashes when calling model.generate:

Traceback (most recent call last):
  File "/Users/thomas/Documents/GitHub/local-lm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 275, in __getattr__
    return self.data[item]
           ~~~~~~~~~^^^^^^
KeyError: 'shape'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/main.py", line 17, in <module>
    outputs = model.generate(tokenized_chat)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2390, in generate
    batch_size = inputs_tensor.shape[0]
                 ^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 277, in __getattr__
    raise AttributeError
AttributeError

Expected behavior

The model should output a prediction instead of crashing.

extent analysis

Fix Plan

The issue arises from the model's weights not being loaded correctly, resulting in missing parameters. To fix this, we need to ensure that the model architecture matches the one used to save the weights.

Here are the steps to fix the issue:

Update the configuration_nemotron_h.py file to correctly handle the - character in layer type patterns.
Modify the model loading code to ignore the num_hidden_layers deprecation warning and use the layers_block_type length instead.
Initialize the missing parameters using the model.init_weights() method.

Code Changes

from transformers import AutoTokenizer, NemotronHForCausalLM

# Update the configuration to handle the '-' character
class NemotronHConfig:
    def __init__(self, **kwargs):
        # ... other configurations ...
        self.layers_block_type = kwargs.get("layers_block_type", [])

    def get_layer_type(self, layer_id):
        layer_type = self.layers_block_type[layer_id // 4]
        # Handle the '-' character
        if "-" in layer_type:
            layer_type = layer_type.replace("-", "")
        return layer_type

# Load the model with the updated configuration
tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K", torch_dtype=torch.bfloat16, config=NemotronHConfig)

# Initialize the missing parameters
model.init_weights()

# Use the model for prediction
outputs = model.generate(tokenized_chat)

Verification

To verify that the fix worked, you can check the model's output for a given input. If the model generates a prediction without crashing, the fix is successful.

# Test the model with a sample input
input_text = "Hello, how are you?"
tokenized_input = tokenizer(input_text, return_tensors="pt")
output = model.generate(tokenized_input)
print(output)

Extra Tips

To prevent similar issues in the future, make sure to:

Keep your model configurations up-to-date with the latest changes in the library.
Verify that the model architecture matches the one used to save the weights.
Initialize missing parameters using the model.init_weights() method.
Test your model with sample inputs to ensure it generates predictions correctly.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

The model should output a prediction instead of crashing.

#api #ssr #installation #tensor shape #autograd error #integration issue #index setup #retrieval issue #search optimization

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

transformers - 💡(How to fix) Fix NemotronH implementation can't load NemotronH checkpoints! [7 comments, 6 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

extent analysis

Fix Plan

Code Changes

Verification

Extra Tips

FAQ

Expected behavior

Still need to ship something?

TRENDING

transformers - 💡(How to fix) Fix NemotronH implementation can't load NemotronH checkpoints! [7 comments, 6 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

extent analysis

Fix Plan

Code Changes

Verification

Extra Tips

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING