ollama: Add MLX prequantized import support for Nemotron-H architecture [1 comment, 2 participants]

ollama, 2026-03-31 14:30:37

GitHub stats

ollama/ollama#15175 • Fetched 2026-04-08 01:58:19

Author

FaisalFehad

Participants

CJSen

FaisalFehad

Timeline (top)

subscribed ×4, commented ×1, labeled ×1

ollama create fails when importing MLX-quantized SafeTensors for Nemotron-H models with Error: unknown data type: U32.

PR #14878 added the tensorImportTransform framework with Qwen3.5 support. Requesting the same for NemotronHForCausalLM (model_type: nemotron_h). The architecture class for the registry would be NemotronHForCausalLM.
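Before filing or testing against a similar model, it can help to confirm which architecture class the checkpoint declares. The sketch below assumes the model directory follows the usual Hugging Face layout with a config.json at its root; the helper names read_architecture and is_nemotron_h are hypothetical and not part of Ollama.

```python
import json
from pathlib import Path


def read_architecture(model_dir):
    """Return (architectures, model_type) from a Hugging Face style config.json.

    Assumes config.json sits at the root of model_dir.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("architectures", []), config.get("model_type")


def is_nemotron_h(model_dir):
    """True when the checkpoint declares the Nemotron-H names quoted in the issue."""
    architectures, model_type = read_architecture(model_dir)
    return "NemotronHForCausalLM" in architectures or model_type == "nemotron_h"
```

For the model in this issue, is_nemotron_h("/path/to/Nemotron-3-Super-120B-A12B-MLX-6bit") would be expected to return True if the config matches the names given above.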

This also highlights that any MLX-quantized model outside of Qwen3.5 currently hits this same U32 error, since MLX quantization universally packs weights into U32 containers.

Error Message

Ollama v0.19.0, macOS Apple Silicon

Error: unknown data type: U32

Root Cause

ollama create rejects the import because its SafeTensors import path does not recognize the U32 dtype, and stops with Error: unknown data type: U32.

MLX quantization packs quantized weights into U32 containers, and per the issue only Qwen3.5 currently has an import transform (PR #14878). Any other MLX-quantized model, Nemotron-H included, therefore hits the same error.
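To make the U32 point concrete, here is a minimal sketch of how low-bit quantized codes can be packed into 32-bit words and recovered again. The bit order (least significant bits first) and the single scale/bias pair are assumptions chosen for illustration; MLX's actual layout and grouping may differ, and the function names are hypothetical.

```python
def pack_u32(values, bits):
    """Pack small unsigned ints into 32-bit words, least significant bits first."""
    mask = (1 << bits) - 1
    stream = 0
    for i, v in enumerate(values):
        if not 0 <= v <= mask:
            raise ValueError(f"value {v} does not fit in {bits} bits")
        stream |= v << (i * bits)
    n_words = (len(values) * bits + 31) // 32  # round up to whole words
    return [(stream >> (32 * w)) & 0xFFFFFFFF for w in range(n_words)]


def unpack_u32(words, bits, count):
    """Recover `count` codes of width `bits` from the packed 32-bit words."""
    stream = 0
    for w, word in enumerate(words):
        stream |= word << (32 * w)
    mask = (1 << bits) - 1
    return [(stream >> (i * bits)) & mask for i in range(count)]


def dequantize(codes, scale, bias):
    """Affine dequantization: weight is approximately scale * code + bias."""
    return [scale * q + bias for q in codes]


codes = list(range(16))            # sixteen 6-bit codes
words = pack_u32(codes, bits=6)    # 16 * 6 = 96 bits, so three U32 words
restored = unpack_u32(words, bits=6, count=len(codes))
```

An importer that only sees the U32 words cannot treat them as ordinary weights; it needs the bit width plus the accompanying scales and biases, which is why a per-architecture transform is involved.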

Code Example

# Ollama v0.19.0, macOS Apple Silicon

cat > Modelfile <<EOF2
FROM /path/to/Nemotron-3-Super-120B-A12B-MLX-6bit

TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

PARAMETER stop "<|im_end|>"
PARAMETER stop "</s>"
PARAMETER num_ctx 8192
EOF2

ollama create nemotron-120b -f Modelfile
# Error: unknown data type: U32

Model

FF-01/Nemotron-3-Super-120B-A12B-MLX-6bit
Base: nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
6-bit affine quantization, ~92GB, 20 SafeTensors shards
Runs at ~43.6 tok/s on M5 Pro Max via mlx-lm

Architecture

Hybrid Mamba-2 + Transformer Attention + Latent MoE
120B total params, 12B active per token
512 routed experts, 22 active, 1 shared
88 layers alternating Mamba (M) and Attention+MoE (E)
Tensor types in SafeTensors: BF16 (weights), F32 (norms), U32 (quantized packed weights)
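The tensor types listed above can be checked without loading any weights, because a SafeTensors file starts with an 8-byte little-endian header length followed by a JSON header that records each tensor's dtype. The following is a minimal sketch using only the standard library; the helper names and the assumption that all shards match *.safetensors in one directory are illustrative.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def safetensors_dtypes(path):
    """Count tensors per dtype by reading only the SafeTensors JSON header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        if name != "__metadata__"  # optional free-form metadata, not a tensor
    )


def model_dtypes(model_dir):
    """Sum dtype counts over every *.safetensors shard in a directory."""
    total = Counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        total += safetensors_dtypes(shard)
    return total
```

A non-zero U32 count from model_dtypes on an MLX-quantized checkpoint indicates the packed tensors that trigger the error on import.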

Notes

The GGUF path works — llama.cpp added Nemotron 3 Super support in ggml-org/llama.cpp#20411
Native MLX import would let Apple Silicon users skip GGUF conversion
The tensorImportTransform framework from #14878 should make this straightforward to add

extent analysis

TL;DR

The most likely fix is to register a Nemotron-H import transform in the tensorImportTransform framework so that ollama create can handle the U32-packed quantized tensors instead of rejecting them.

Guidance

The error Error: unknown data type: U32 indicates that ollama create does not currently accept the U32 dtype that MLX-quantized SafeTensors use for packed weights in Nemotron-H models.
The tensorImportTransform framework from PR #14878 already handles this case for Qwen3.5, so the same mechanism can be extended to Nemotron-H.
The architecture class to register is NemotronHForCausalLM (model_type: nemotron_h).
To verify the fix, rerun ollama create against a build that includes the new transform and confirm that Error: unknown data type: U32 no longer appears.
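The verification step can be scripted. The sketch below reruns the ollama create command from the reproduction and reports whether the U32 error still appears; it assumes the ollama binary is on PATH and a Modelfile exists in the working directory, and the function names are hypothetical.

```python
import subprocess

U32_ERROR = "unknown data type: U32"


def has_u32_error(output):
    """True when the captured CLI output contains the error from the issue."""
    return U32_ERROR in output


def create_model(name, modelfile="Modelfile"):
    """Run `ollama create NAME -f MODELFILE`; return (succeeded, hit_u32_error)."""
    result = subprocess.run(
        ["ollama", "create", name, "-f", modelfile],
        capture_output=True,
        text=True,
    )
    combined = result.stdout + result.stderr
    return result.returncode == 0, has_u32_error(combined)
```

Calling create_model("nemotron-120b") on an affected build would be expected to report the U32 error; after a fix, the second value should be False.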

Example

The issue does not include an implementation. The Qwen3.5 transform added in PR #14878 is the closest reference for writing a Nemotron-H transform.

Notes

The current limitation is that ollama create does not accept the U32 dtype, which MLX-quantized SafeTensors use for packed weights. This affects Nemotron-H and, per the issue, any MLX-quantized model outside Qwen3.5.

Recommendation

Add a Nemotron-H transform through the tensorImportTransform framework, following the Qwen3.5 precedent from PR #14878. Until that lands, the GGUF path noted in the issue remains the working route.
