vllm - ✅(Solved) Fix [Bug]: Loading of subdirectory safetensors get dispatched to `pt_weights_iterator` [1 pull requests, 3 comments, 2 participants]

GitHub stats
vllm-project/vllm#39699 · Fetched 2026-04-15 06:20:53
Comments: 3 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): commented ×3, closed ×1, cross-referenced ×1, labeled ×1

Error Message

  File "/repos/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/serialization.py", line 1855, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/repos/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_weights_only_unpickler.py", line 590, in load
    return Unpickler(file, encoding=encoding).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/repos/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_weights_only_unpickler.py", line 544, in load
    self.append(self.memo[idx])
                ~~~~~~~~~^^^^^
KeyError: 231

Fix Action

Fix / Workaround

When a caller passes allow_patterns_overrides with a subdirectory-scoped pattern such as "talker/model*.safetensors", the pattern matches real .safetensors files on disk, but the corresponding pattern check (an exact comparison against "*.safetensors") silently fails, leaving use_safetensors = False. The downstream dispatch then selects pt_weights_iterator for those safetensors files, causing a pickle deserialization error.
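
The failure mode described above can be reproduced outside vLLM. The following is a minimal sketch, not vLLM's actual code: `prepare`, `is_exact_match`, and `has_safetensors_suffix` are hypothetical names, and the suffix check is only one plausible way to relax the comparison, since the source does not show the actual patch.

```python
import glob
import os
import tempfile


def is_exact_match(pattern: str) -> bool:
    # Exact comparison: only the bare root-level pattern counts as safetensors.
    return pattern == "*.safetensors"


def has_safetensors_suffix(pattern: str) -> bool:
    # Assumed relaxation: any pattern ending in ".safetensors" counts.
    return pattern.endswith(".safetensors")


def prepare(folder, patterns, is_safetensors_pattern):
    """Return (files, use_safetensors) for the first pattern that matches."""
    for pattern in patterns:
        files = sorted(glob.glob(os.path.join(folder, pattern)))
        if files:
            return files, is_safetensors_pattern(pattern)
    return [], False


def demo():
    """Return the use_safetensors flag under each check for a subdir pattern."""
    with tempfile.TemporaryDirectory() as folder:
        os.makedirs(os.path.join(folder, "talker"))
        open(os.path.join(folder, "talker", "model.safetensors"), "wb").close()
        patterns = ["talker/model*.safetensors"]
        _, exact = prepare(folder, patterns, is_exact_match)
        _, suffix = prepare(folder, patterns, has_safetensors_suffix)
        return exact, suffix


if __name__ == "__main__":
    print(demo())
```

In both calls the glob finds the same file; only the flag differs. `demo()` returns `(False, True)`: the exact comparison leaves the flag unset, while the suffix check sets it.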

PR fix notes

PR #39700: [Bugfix] Fix model loader compare pattern for safetensors to enable subdir loading

Description (problem / solution / changelog)

Purpose

Resolves #39699

This PR fixes the exact-match comparison pattern == "*.safetensors" and updates the condition guarding the filter_duplicate_safetensors_files branch.

Without the updated branch condition, a subdirectory model whose weights are not covered by the root-level model index file still fails to load. When a caller explicitly sets allow_patterns_overrides, skipping the filter is reasonable: the caller has fully specified which files to load, so deduplicating against an unrelated root-level index does not apply.

Please correct me if I have misunderstood, or if any existing usage sets allow_patterns_overrides while also relying on filter_duplicate_safetensors_files.
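
The interaction between a root-level index and subdirectory weights can be illustrated with a simplified stand-in. This sketch assumes the filter keeps only files named in the `weight_map` of a root `model.safetensors.index.json`; `filter_by_root_index` and `select_files` are hypothetical names, not vLLM's filter_duplicate_safetensors_files, and the guard on `allow_patterns_overrides` mirrors the PR description rather than the actual diff.

```python
import json
import os
import tempfile

INDEX_NAME = "model.safetensors.index.json"


def filter_by_root_index(folder, files):
    """Keep only files listed in the root-level index, if one exists."""
    index_path = os.path.join(folder, INDEX_NAME)
    if not os.path.isfile(index_path):
        return files
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    listed = {os.path.join(folder, name) for name in weight_map.values()}
    return [path for path in files if path in listed]


def select_files(folder, files, allow_patterns_overrides=None):
    # Assumed guard: skip deduplication when the caller named the files.
    if allow_patterns_overrides is not None:
        return files
    return filter_by_root_index(folder, files)


def demo():
    """Return how many files survive without and with explicit overrides."""
    with tempfile.TemporaryDirectory() as folder:
        # The root index only knows about a root-level shard.
        with open(os.path.join(folder, INDEX_NAME), "w") as f:
            json.dump({"weight_map": {"w": "model-00001-of-00001.safetensors"}}, f)
        talker = os.path.join(folder, "talker", "model.safetensors")
        filtered = select_files(folder, [talker])
        kept = select_files(folder, [talker], ["talker/model*.safetensors"])
        return len(filtered), len(kept)


if __name__ == "__main__":
    print(demo())
```

Under these assumptions the subdirectory file is dropped when the root index is consulted and kept when the overrides bypass the filter, so `demo()` returns `(0, 1)`.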

Test Plan

I used the following script for dev testing:

from vllm.config import LoadConfig
from vllm.model_executor.model_loader.default_loader import DefaultModelLoader

loader = DefaultModelLoader(LoadConfig(load_format="auto"))

source = DefaultModelLoader.Source(
    "inclusionAI/Ming-flash-omni-2.0",
    revision=None,
    prefix="",
    allow_patterns_overrides=["talker/model*.safetensors"],
)

hf_folder, weight_files, use_safetensors = loader._prepare_weights(
    "inclusionAI/Ming-flash-omni-2.0",
    subfolder=None,
    revision=None,
    fall_back_to_pt=False,
    allow_patterns_overrides=["talker/model*.safetensors"],
)
# The weight file exists on disk
print(f" >>> weight_files: {weight_files}")
# On main, use_safetensors is returned as False
print(f" >>> use_safetensors: {use_safetensors}")

# On main, iterating crashes with the pickle deserialization error
for name, tensor in loader._get_weights_iterator(source):
    print(name, tensor.shape)

Test Result

On the main branch, the script fails with the error from the linked issue:

INFO 04-13 12:20:17 [weight_utils.py:615] Time spent downloading weights for inclusionAI/Ming-flash-omni-2.0: 46.467259 seconds
 >>> weight_files: ['/root/.cache/huggingface/hub/models--inclusionAI--Ming-flash-omni-2.0/snapshots/6a2e1dec07066d20f62a743ac7c34284e4a3932d/talker/model.safetensors']
 >>> use_safetensors: False
Loading pt checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading pt checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]

...
  File "/repos/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_weights_only_unpickler.py", line 590, in load
    return Unpickler(file, encoding=encoding).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/repos/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_weights_only_unpickler.py", line 544, in load
    self.append(self.memo[idx])
                ~~~~~~~~~^^^^^
KeyError: 231

With the changes on this branch, the script successfully loads the subdirectory-scoped model weights:

 >>> weight_files: ['/root/.cache/huggingface/hub/models--inclusionAI--Ming-flash-omni-2.0/snapshots/6a2e1dec07066d20f62a743ac7c34284e4a3932d/talker/model.safetensors']
 >>> use_safetensors: True
INFO 04-13 12:46:22 [weight_utils.py:904] Filesystem type for checkpoints: OVERLAY. Checkpoint size: 1.30 GiB. Available RAM: 1819.53 GiB.
INFO 04-13 12:46:22 [weight_utils.py:927] Auto-prefetch is disabled because the filesystem (OVERLAY) is not a recognized network FS (NFS/Lustre). If you want to force prefetching, start vLLM with --safetensors-load-strategy=prefetch.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
aggregator.blocks.0.attn.to_k.bias torch.Size([1024])
aggregator.blocks.0.attn.to_k.weight torch.Size([1024, 1024])
aggregator.blocks.0.attn.to_out.0.bias torch.Size([1024])
aggregator.blocks.0.attn.to_out.0.weight torch.Size([1024, 1024])
aggregator.blocks.0.attn.to_q.bias torch.Size([1024])
...


Changed files

  • vllm/model_executor/model_loader/default_loader.py (modified, +2/-2)



🐛 Describe the bug

When using DefaultModelLoader.load_weights to load a model whose subdirectories contain weights for individual model components, I set allow_patterns_overrides = ["talker/model*.safetensors"] to select the specific weights to load (I confirmed that the .safetensors files exist locally), but got a pickle deserialization error.

  File "/repos/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/serialization.py", line 1855, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/repos/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_weights_only_unpickler.py", line 590, in load
    return Unpickler(file, encoding=encoding).load()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/repos/vllm-project/vllm/.venv/lib/python3.12/site-packages/torch/_weights_only_unpickler.py", line 544, in load
    self.append(self.memo[idx])
                ~~~~~~~~~^^^^^
KeyError: 231

After triaging, I found that _prepare_weights checks against a hardcoded safetensors pattern and therefore ignores allow_patterns_overrides when deciding use_safetensors.

https://github.com/vllm-project/vllm/blob/8d825b87d6590ca971823890f9705988b8709add/vllm/model_executor/model_loader/default_loader.py#L169-L174

When a caller passes allow_patterns_overrides with a subdirectory-scoped pattern such as "talker/model*.safetensors", the pattern matches real .safetensors files on disk, but the corresponding pattern check silently fails, leaving use_safetensors = False. The downstream dispatch then selects pt_weights_iterator for those safetensors files, causing a pickle deserialization error.


extent analysis

TL;DR

The most likely fix is to modify the _prepare_weights function to correctly handle allow_patterns_overrides for safetensors patterns.

Guidance

  • Review the _prepare_weights function in default_loader.py to ensure it properly checks for allow_patterns_overrides patterns, especially for subdirectory-scoped patterns like "talker/model*.safetensors".
  • Verify that the use_safetensors flag is being set correctly based on the provided allow_patterns_overrides patterns.
  • Check the pattern matching logic to ensure it correctly identifies safetensors files on disk, preventing the downstream iterator from attempting to use pt_weights_iterator on these files.
  • Consider adding logging or debugging statements to track the values of allow_patterns_overrides, use_safetensors, and the matched patterns to better understand the issue.
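
As a debugging aid for the checks listed above, a file's format can be inferred from its contents rather than from the pattern that matched it. A safetensors file begins with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header of that length. The helper below is a hypothetical sketch built on that layout; `looks_like_safetensors` is not part of vLLM or the safetensors library, and it is a heuristic rather than a full validator.

```python
import json
import os
import struct


def looks_like_safetensors(path: str) -> bool:
    """Heuristic: 8-byte little-endian header length, then a JSON object."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False
        (header_len,) = struct.unpack("<Q", prefix)
        # A header longer than the rest of the file cannot be valid.
        if header_len == 0 or header_len > size - 8:
            return False
        header = f.read(header_len)
    try:
        return isinstance(json.loads(header), dict)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return False
```

Logging the result of such a check next to use_safetensors for each entry in weight_files would make a mismatch like the one in this issue visible before the weights iterator is chosen.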

Example

No code snippet is provided as the issue is more related to the logic and functionality of the _prepare_weights function rather than a specific code error.

Notes

The provided stacktrace indicates a KeyError exception, which suggests an issue with the deserialization process. However, the root cause seems to be the incorrect handling of allow_patterns_overrides patterns in the _prepare_weights function.

Recommendation

Apply the fix from PR #39700, or as a workaround modify _prepare_weights locally so that it handles allow_patterns_overrides for safetensors patterns; the implementation on main ignores these patterns when setting use_safetensors, which leads to the deserialization error.
