ollama: Add MLX prequantized import support for Nemotron-H architecture [1 comment, 2 participants]

GitHub stats
ollama/ollama#15175 · Fetched 2026-04-08 01:58:19
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): subscribed ×4, commented ×1, labeled ×1

ollama create fails when importing MLX-quantized SafeTensors for Nemotron-H models with Error: unknown data type: U32.

PR #14878 added the tensorImportTransform framework with Qwen3.5 support. Requesting the same for NemotronHForCausalLM (model_type: nemotron_h). The architecture class for the registry would be NemotronHForCausalLM.

This also highlights that any MLX-quantized model outside of Qwen3.5 currently hits this same U32 error, since MLX quantization universally packs weights into U32 containers.
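
To illustrate why the tensors show up as U32: a quantizer that stores n-bit integers can pack several of them into each 32-bit word, so the file records the container type (U32) rather than a floating-point type. The sketch below is a generic illustration using 4-bit values stored low bits first. Both choices, and the helper names pack_u32 and unpack_u32, are assumptions made for this example; they are not taken from MLX or Ollama and may not match the real on-disk layout.

```python
def pack_u32(values, bits=4):
    """Pack small unsigned integers into 32-bit words, low bits first.

    Illustrative only: the bit order and width are assumptions,
    not the layout used by MLX.
    """
    per_word = 32 // bits          # how many values fit in one word
    mask = (1 << bits) - 1         # largest value representable in `bits`
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for i, v in enumerate(values[start:start + per_word]):
            if not 0 <= v <= mask:
                raise ValueError(f"{v} does not fit in {bits} bits")
            word |= v << (i * bits)
        words.append(word)
    return words


def unpack_u32(words, count, bits=4):
    """Inverse of pack_u32: recover the first `count` values."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    out = []
    for word in words:
        for i in range(per_word):
            out.append((word >> (i * bits)) & mask)
    return out[:count]


values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
words = pack_u32(values)
print([hex(w) for w in words])                   # ['0x87654321', '0x9']
print(unpack_u32(words, len(values)) == values)  # True
```

An importer that only knows floating-point dtypes has no way to interpret such words without also knowing the bit width and the accompanying scales, which is the kind of gap the tensorImportTransform framework is described as filling. For bit widths that do not divide 32, such as the 6-bit model in this issue, a real packer may straddle word boundaries; this sketch does not.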

Error Message

Ollama v0.19.0, macOS Apple Silicon

Error: unknown data type: U32

Root Cause

According to the report, ollama create rejects the checkpoint because the SafeTensors import path does not recognize the U32 dtype. MLX quantization packs quantized weights into U32 containers, so every MLX-quantized model contains tensors of this type.

PR #14878 added the tensorImportTransform framework, but with Qwen3.5 support only. NemotronHForCausalLM (model_type: nemotron_h) has no entry in the registry, so the import stops at the first U32 tensor. Any other MLX-quantized architecture outside Qwen3.5 currently hits the same error.

Code Example

# Ollama v0.19.0, macOS Apple Silicon

cat > Modelfile <<EOF2
FROM /path/to/Nemotron-3-Super-120B-A12B-MLX-6bit

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "</s>"
PARAMETER num_ctx 8192
EOF2

ollama create nemotron-120b -f Modelfile
# Error: unknown data type: U32

Model

Architecture

  • Hybrid Mamba-2 + Transformer Attention + Latent MoE
  • 120B total params, 12B active per token
  • 512 routed experts, 22 active, 1 shared
  • 88 layers alternating Mamba (M) and Attention+MoE (E)
  • Tensor types in SafeTensors: BF16 (weights), F32 (norms), U32 (quantized packed weights)
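
One way to confirm which tensor types a checkpoint contains before running ollama create is to read the SafeTensors header directly. The format starts with an 8-byte little-endian header length, followed by a JSON header in which each tensor entry carries a dtype field. The following is a minimal sketch; the function name dtype_counts is hypothetical, and shard paths are supplied on the command line.

```python
import json
import struct
import sys
from collections import Counter


def dtype_counts(path):
    """Count tensors per dtype in one .safetensors file, reading only the header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return Counter(
        info["dtype"] for name, info in header.items() if name != "__metadata__"
    )


if __name__ == "__main__":
    # Pass one or more .safetensors shard paths as arguments.
    for shard in sys.argv[1:]:
        print(shard, dict(dtype_counts(shard)))
```

Running this over the shards of an MLX-quantized model should list U32 next to the BF16 and F32 entries described above, which is the dtype ollama create reports as unknown.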

Notes

  • The GGUF path works — llama.cpp added Nemotron 3 Super support in ggml-org/llama.cpp#20411
  • Native MLX import would let Apple Silicon users skip GGUF conversion
  • The tensorImportTransform framework from #14878 should make this straightforward to add

extent analysis

TL;DR

The most likely fix is to register a tensorImportTransform for NemotronHForCausalLM so that ollama create can import the U32-packed MLX weights.

Guidance

  • The error Error: unknown data type: U32 indicates that ollama create does not handle the U32 dtype that MLX quantization uses for packed weights in the Nemotron-H SafeTensors files.
  • The tensorImportTransform framework from PR #14878 already covers this case for Qwen3.5; the request is to add an equivalent transform for Nemotron-H.
  • The architecture class to register is NemotronHForCausalLM (model_type: nemotron_h).
  • To verify a fix, rerun the ollama create command shown above against a build that includes the new transform and confirm that the U32 error no longer appears.

Example

The issue does not include implementation code. The Qwen3.5 transform added in PR #14878 is the closest reference for adding Nemotron-H support.

Notes

As of Ollama v0.19.0, ollama create cannot import MLX-quantized SafeTensors for Nemotron-H, or for any MLX-quantized architecture other than Qwen3.5, because the U32 dtype is rejected.

Recommendation

Until a Nemotron-H transform is added through the tensorImportTransform framework, use the GGUF path noted in the issue as a workaround. The upstream fix should follow the approach already implemented for Qwen3.5.
