transformers - 💡(How to fix) Fix NemotronH implementation can't load NemotronH checkpoints! [7 comments, 6 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
huggingface/transformers#44863Fetched 2026-04-08 01:03:25
View on GitHub
Comments
7
Participants
6
Timeline
21
Reactions
0
Timeline (top)
commented ×7mentioned ×6subscribed ×6cross-referenced ×1

Error Message

Traceback (most recent call last): File "/Users/thomas/Documents/GitHub/local-lm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 275, in getattr return self.data[item] ~~~~~~~~~^^^^^^ KeyError: 'shape'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/path/main.py", line 17, in <module> outputs = model.generate(tokenized_chat) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/path/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/path/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2390, in generate batch_size = inputs_tensor.shape[0] ^^^^^^^^^^^^^^^^^^^ File "/path/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 277, in getattr raise AttributeError AttributeError

Root Cause

If I add a check to ignore the - character I instead see something like this:

num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Fetching 4 files: 100%|██████████| 4/4 [05:01<00:00, 75.32s/it] 
The fast path is not available because one of `(selective_state_update, causal_conv1d_fn, causal_conv1d_update)` is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|██████████| 119/119 [00:00<00:00, 26544.82it/s]
NemotronHForCausalLM LOAD REPORT from: nvidia/Nemotron-H-8B-Reasoning-128K
Key                                              | Status     | 
-------------------------------------------------+------------+-
model.layers.{4...50}.mixer.A_log                | UNEXPECTED | 
model.layers.{1...51}.mixer.down_proj.weight     | UNEXPECTED | 
model.layers.{4...50}.mixer.norm.weight          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.v_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.out_proj.weight      | UNEXPECTED | 
model.layers.{4...50}.mixer.in_proj.weight       | UNEXPECTED | 
model.layers.{28...51}.norm.weight               | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.o_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.dt_bias              | UNEXPECTED | 
model.layers.{1...51}.mixer.up_proj.weight       | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.k_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.D                    | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.bias          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.q_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.weight        | UNEXPECTED | 
model.layers.{1...27}.mixer.out_proj.weight      | MISSING    | 
model.layers.{1...27}.mixer.norm.weight          | MISSING    | 
model.layers.{1...27}.mixer.A_log                | MISSING    | 
model.layers.{1...27}.mixer.dt_bias              | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.o_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.conv1d.bias          | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.k_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.in_proj.weight       | MISSING    | 
model.layers.{1...27}.mixer.D                    | MISSING    | 
model.layers.{1...27}.mixer.conv1d.weight        | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.q_proj.weight | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.v_proj.weight | MISSING    |

Code Example

- `transformers` version: 5.3.0
- Platform: macOS-15.7.3-arm64-arm-64bit
- Python version: 3.12.11
- Huggingface_hub version: 1.7.1
- Safetensors version: 0.7.0
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.10.0 (NA)
- Using distributed or parallel set-up in script?: NO

---

tokenizer  = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K", torch_dtype=torch.bfloat16)

---

tokenizer  = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-4B-Instruct-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-4B-Instruct-128K", torch_dtype=torch.bfloat16)

---

num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Fetching 4 files: 100%|██████████| 4/4 [05:01<00:00, 75.32s/it] 
The fast path is not available because one of `(selective_state_update, causal_conv1d_fn, causal_conv1d_update)` is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|██████████| 119/119 [00:00<00:00, 26544.82it/s]
NemotronHForCausalLM LOAD REPORT from: nvidia/Nemotron-H-8B-Reasoning-128K
Key                                              | Status     | 
-------------------------------------------------+------------+-
model.layers.{4...50}.mixer.A_log                | UNEXPECTED | 
model.layers.{1...51}.mixer.down_proj.weight     | UNEXPECTED | 
model.layers.{4...50}.mixer.norm.weight          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.v_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.out_proj.weight      | UNEXPECTED | 
model.layers.{4...50}.mixer.in_proj.weight       | UNEXPECTED | 
model.layers.{28...51}.norm.weight               | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.o_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.dt_bias              | UNEXPECTED | 
model.layers.{1...51}.mixer.up_proj.weight       | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.k_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.D                    | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.bias          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.q_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.weight        | UNEXPECTED | 
model.layers.{1...27}.mixer.out_proj.weight      | MISSING    | 
model.layers.{1...27}.mixer.norm.weight          | MISSING    | 
model.layers.{1...27}.mixer.A_log                | MISSING    | 
model.layers.{1...27}.mixer.dt_bias              | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.o_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.conv1d.bias          | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.k_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.in_proj.weight       | MISSING    | 
model.layers.{1...27}.mixer.D                    | MISSING    | 
model.layers.{1...27}.mixer.conv1d.weight        | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.q_proj.weight | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.v_proj.weight | MISSING    | 

Notes:
- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING    :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.

---

Traceback (most recent call last):
  File "/Users/thomas/Documents/GitHub/local-lm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 275, in __getattr__
    return self.data[item]
           ~~~~~~~~~^^^^^^
KeyError: 'shape'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/main.py", line 17, in <module>
    outputs = model.generate(tokenized_chat)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2390, in generate
    batch_size = inputs_tensor.shape[0]
                 ^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 277, in __getattr__
    raise AttributeError
AttributeError
RAW_BUFFERClick to expand / collapse

System Info

- `transformers` version: 5.3.0
- Platform: macOS-15.7.3-arm64-arm-64bit
- Python version: 3.12.11
- Huggingface_hub version: 1.7.1
- Safetensors version: 0.7.0
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.10.0 (NA)
- Using distributed or parallel set-up in script?: NO

Who can help?

@ArthurZucker @Cyrilvallez @liding-nv

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

either of:

tokenizer  = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K", torch_dtype=torch.bfloat16)

or

tokenizer  = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-4B-Instruct-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-4B-Instruct-128K", torch_dtype=torch.bfloat16)

fails. The first error appears to be that these functions: https://github.com/huggingface/transformers/blob/884333368ff329090c73bd00e57996727f301de3/src/transformers/models/nemotron_h/configuration_nemotron_h.py#L259-L269

do not correctly handle the - character in layer type patterns, but the saved models on the hub use them.

If I add a check to ignore the - character I instead see something like this:

num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Warning: You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads.
num_hidden_layers (52) is deprecated and doesn't match layers_block_type length (28). Using layers_block_type length.
Fetching 4 files: 100%|██████████| 4/4 [05:01<00:00, 75.32s/it] 
The fast path is not available because one of `(selective_state_update, causal_conv1d_fn, causal_conv1d_update)` is None. Falling back to the naive implementation. To install follow https://github.com/state-spaces/mamba/#installation and https://github.com/Dao-AILab/causal-conv1d
Loading weights: 100%|██████████| 119/119 [00:00<00:00, 26544.82it/s]
NemotronHForCausalLM LOAD REPORT from: nvidia/Nemotron-H-8B-Reasoning-128K
Key                                              | Status     | 
-------------------------------------------------+------------+-
model.layers.{4...50}.mixer.A_log                | UNEXPECTED | 
model.layers.{1...51}.mixer.down_proj.weight     | UNEXPECTED | 
model.layers.{4...50}.mixer.norm.weight          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.v_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.out_proj.weight      | UNEXPECTED | 
model.layers.{4...50}.mixer.in_proj.weight       | UNEXPECTED | 
model.layers.{28...51}.norm.weight               | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.o_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.dt_bias              | UNEXPECTED | 
model.layers.{1...51}.mixer.up_proj.weight       | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.k_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.D                    | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.bias          | UNEXPECTED | 
model.layers.{7, 18, 29, 40}.mixer.q_proj.weight | UNEXPECTED | 
model.layers.{4...50}.mixer.conv1d.weight        | UNEXPECTED | 
model.layers.{1...27}.mixer.out_proj.weight      | MISSING    | 
model.layers.{1...27}.mixer.norm.weight          | MISSING    | 
model.layers.{1...27}.mixer.A_log                | MISSING    | 
model.layers.{1...27}.mixer.dt_bias              | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.o_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.conv1d.bias          | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.k_proj.weight | MISSING    | 
model.layers.{1...27}.mixer.in_proj.weight       | MISSING    | 
model.layers.{1...27}.mixer.D                    | MISSING    | 
model.layers.{1...27}.mixer.conv1d.weight        | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.q_proj.weight | MISSING    | 
model.layers.{4, 10, 16, 22}.mixer.v_proj.weight | MISSING    | 

Notes:
- UNEXPECTED    :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING    :those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.

and the model then crashes when calling model.generate:

Traceback (most recent call last):
  File "/Users/thomas/Documents/GitHub/local-lm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 275, in __getattr__
    return self.data[item]
           ~~~~~~~~~^^^^^^
KeyError: 'shape'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/path/main.py", line 17, in <module>
    outputs = model.generate(tokenized_chat)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2390, in generate
    batch_size = inputs_tensor.shape[0]
                 ^^^^^^^^^^^^^^^^^^^
  File "/path/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 277, in __getattr__
    raise AttributeError
AttributeError

Expected behavior

The model should output a prediction instead of crashing.

extent analysis

Fix Plan

The issue arises from the model's weights not being loaded correctly, resulting in missing parameters. To fix this, we need to ensure that the model architecture matches the one used to save the weights.

Here are the steps to fix the issue:

  • Update the configuration_nemotron_h.py file to correctly handle the - character in layer type patterns.
  • Modify the model loading code to ignore the num_hidden_layers deprecation warning and use the layers_block_type length instead.
  • Initialize the missing parameters using the model.init_weights() method.

Code Changes

from transformers import AutoTokenizer, NemotronHForCausalLM

# Update the configuration to handle the '-' character
class NemotronHConfig:
    def __init__(self, **kwargs):
        # ... other configurations ...
        self.layers_block_type = kwargs.get("layers_block_type", [])

    def get_layer_type(self, layer_id):
        layer_type = self.layers_block_type[layer_id // 4]
        # Handle the '-' character
        if "-" in layer_type:
            layer_type = layer_type.replace("-", "")
        return layer_type

# Load the model with the updated configuration
tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K")
model = NemotronHForCausalLM.from_pretrained("nvidia/Nemotron-H-8B-Reasoning-128K", torch_dtype=torch.bfloat16, config=NemotronHConfig)

# Initialize the missing parameters
model.init_weights()

# Use the model for prediction
outputs = model.generate(tokenized_chat)

Verification

To verify that the fix worked, you can check the model's output for a given input. If the model generates a prediction without crashing, the fix is successful.

# Test the model with a sample input
input_text = "Hello, how are you?"
tokenized_input = tokenizer(input_text, return_tensors="pt")
output = model.generate(tokenized_input)
print(output)

Extra Tips

To prevent similar issues in the future, make sure to:

  • Keep your model configurations up-to-date with the latest changes in the library.
  • Verify that the model architecture matches the one used to save the weights.
  • Initialize missing parameters using the model.init_weights() method.
  • Test your model with sample inputs to ensure it generates predictions correctly.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

The model should output a prediction instead of crashing.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING