transformers - ✅(Solved) Fix from_pretrained no longer uses mmap for CPU weights in transformers 5.x causing full materialization [1 pull requests, 5 comments, 4 participants]

transformers 2026-02-24 16:20:18


GitHub stats

huggingface/transformers#44262 • Fetched 2026-04-08 00:29:30

Timeline (top)

commented ×5, subscribed ×4, mentioned ×2, closed ×1

Fix Action

Fixed

Fixed by PR: Add Mistral 4 (https://github.com/huggingface/transformers/pull/44760)

PR fix notes

PR #44760: Add Mistral 4

Repository: huggingface/transformers
Author: juliendenize
State: closed | merged: True
Link: https://github.com/huggingface/transformers/pull/44760

Description (problem / solution / changelog)

What does this PR do?

This PR adds support for the Mistral 4 family.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@ArthurZucker @Cyrilvallez @patrickvonplaten

Changed files

docs/source/en/_toctree.yml (modified, +2/-0)
docs/source/en/model_doc/mistral4.md (added, +116/-0)
src/transformers/models/__init__.py (modified, +1/-0)
src/transformers/models/auto/configuration_auto.py (modified, +2/-0)
src/transformers/models/auto/modeling_auto.py (modified, +5/-0)
src/transformers/models/ministral3/modeling_ministral3.py (modified, +2/-2)
src/transformers/models/ministral3/modular_ministral3.py (modified, +2/-2)
src/transformers/models/mistral4/__init__.py (added, +27/-0)
src/transformers/models/mistral4/configuration_mistral4.py (added, +149/-0)
src/transformers/models/mistral4/convert_mistral4_weight_to_hf.py (added, +608/-0)
src/transformers/models/mistral4/modeling_mistral4.py (added, +730/-0)
src/transformers/models/mistral4/modular_mistral4.py (added, +291/-0)
src/transformers/processing_utils.py (modified, +5/-0)
tests/models/mistral4/__init__.py (added, +17/-0)
tests/models/mistral4/test_modeling_mistral4.py (added, +133/-0)

RAW_BUFFER

I have the following piece of code:

import psutil
from transformers import AutoModel


rss_before = psutil.Process().memory_info().rss
model = AutoModel.from_pretrained(
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    dtype="auto",
    device_map="auto",
    max_memory={"cpu": "1024GB", 0: 0},
)
rss_after = psutil.Process().memory_info().rss
print(f"Memory: {(rss_after - rss_before) >> 20} MiB")

When running it on transformers 4.57, I get:

Loading checkpoint shards: 100%|██████████| 16/16 [00:00<00:00, 17.03it/s]
Memory: 1321 MiB

When running it on transformers 5.2, I get:

Loading weights: 100%|██████████| 530/530 [00:13<00:00, 39.48it/s, Materializing param=norm.weight]
Memory: 110705 MiB

I see that model loading was reworked. It seems that the old transformers version loads the model weights with memory mapping, while the new version materializes all the weights in process memory.

How can I get the old behavior with the new transformers version?
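The difference between memory-mapped and fully materialized loading described above can be illustrated without transformers. The following is a minimal sketch using only the Python standard library, not the loader that transformers uses: a 32 MiB temporary file stands in for a weight shard, and `peak_python_alloc` is a hypothetical helper name. It uses `tracemalloc`, which counts Python-level allocations rather than RSS, so the numbers only approximate what `psutil` reports in the issue.

```python
import mmap
import os
import tempfile
import tracemalloc

SIZE = 32 * 1024 * 1024  # 32 MiB stand-in for a weight shard (assumption)


def peak_python_alloc(fn, path):
    """Run fn(path) and return (result, peak bytes allocated by Python)."""
    tracemalloc.start()
    try:
        result = fn(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def read_fully(path):
    # Materializes the whole file as one bytes object in process memory.
    with open(path, "rb") as f:
        data = f.read()
    return data[:16]


def read_mapped(path):
    # Maps the file; only the pages that are actually touched get loaded.
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:16]


# Create the stand-in file before tracing starts.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(16))
    tmp.write(b"\0" * (SIZE - 16))
    path = tmp.name

try:
    head_full, peak_full = peak_python_alloc(read_fully, path)
    head_mapped, peak_mapped = peak_python_alloc(read_mapped, path)
finally:
    os.remove(path)

print(f"full read peak: {peak_full >> 20} MiB")
print(f"mmap peak:      {peak_mapped >> 20} MiB")
```

Both functions return the same first 16 bytes, but only the full read allocates a buffer the size of the file. The same effect, at the scale of a 30B-parameter checkpoint, is consistent with the gap between the two measurements in the issue.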

extent analysis

Fix Plan

Achieve Old Behavior with New Transformers

This issue is marked as fixed upstream (see the Fix Action section above), so the expected route is to upgrade transformers rather than to change the from_pretrained call.

Adding map_location does not help. map_location is a keyword argument of torch.load that controls which device the loaded tensors are placed on; it does not control memory mapping. It cannot be imported from torch (from torch import map_location raises an ImportError), and it is not a documented argument of from_pretrained.

Step-by-Step Solution

1. Check the installed version: print transformers.__version__ so each measurement can be tied to a release.

2. Upgrade transformers to a release that includes the fix:

pip install --upgrade transformers

3. Re-run the original measurement with the from_pretrained call unchanged.

Example Code

import psutil
import transformers
from transformers import AutoModel

# Record the version so the measurement can be compared across releases.
print(f"transformers {transformers.__version__}")

rss_before = psutil.Process().memory_info().rss
model = AutoModel.from_pretrained(
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    dtype="auto",
    device_map="auto",
    max_memory={"cpu": "1024GB", 0: 0},  # keep all weights on the CPU
)
rss_after = psutil.Process().memory_info().rss
print(f"Memory: {(rss_after - rss_before) >> 20} MiB")

Verification

Run the code and compare the reported memory with the two figures from the issue: 1321 MiB on transformers 4.57 and 110705 MiB on transformers 5.2. A result close to the 4.57 figure indicates that the CPU weights are memory mapped again. The description of the linked PR does not mention memory mapping, so treat the measurement, not the "Fixed" label, as the confirmation.
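Because the reported behavior differs between the 4.x and 5.x series, it can help to log which series is installed next to every memory measurement. The sketch below is one way to do that with the standard library; `major_minor` and `installed_transformers_version` are hypothetical helper names, and the "5.x" label only reflects the version boundary reported in the issue, not a documented switch inside transformers.

```python
import re
from importlib import metadata


def major_minor(version: str) -> tuple:
    """Parse the leading 'X.Y' of a version string such as '5.2.0' or '5.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed")
else:
    series = "5.x" if major_minor(version) >= (5, 0) else "4.x or older"
    print(f"transformers {version} ({series} series)")
```

Comparing tuples rather than raw strings avoids the pitfall that "4.57" sorts after "4.6" as text.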
