vllm - 💡(How to fix) Fix [Bug]: runai_streamer + MTP drafter fails to load weights from model_streamer local cache



Your current environment

Docker image `vllm-openai:v0.20.1`

🐛 Describe the bug

  • Model: `s3://models/Qwen_Qwen3.6-35B-A3B-FP8/`
  • Loader: `runai_streamer`
  • All model files are present in S3 (see the listing below).

Behavior:

  • If I start vLLM (`vllm-openai:v0.20.1`) without MTP (speculative decoding), the model loads from S3 via `runai_streamer` and serves correctly.
  • If I add `speculative_config={"method": "mtp", ...}` (e.g., for Qwen3 MTP), the engine fails with:
    RuntimeError: Cannot find any safetensors model weights with `/root/.cache/vllm/assets/model_streamer/<hash>`

(See the log excerpt below.)

  • A listing of the S3 path shows that all expected weight files (`layers-*.safetensors`, `mtp.safetensors`, `outside.safetensors`, `model.safetensors.index.json`, etc.) are present.
  • The error occurs only when MTP is enabled, and only while the drafter model is being loaded.
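To reproduce the comparison above, a minimal sketch that builds the two `vllm serve` command lines, identical except for the speculative config, could look like this. Assumptions: `--tensor-parallel-size 2` is inferred from the `Worker_TP0`/`Worker_TP1` log lines, and `num_speculative_tokens=1` stands in for the part of the config the report elides with `...`. Check `vllm serve --help` inside the v0.20.1 image for the exact flag spellings.

```python
import json
import shlex

# Model URI from the report; the other values below are assumptions.
MODEL_URI = "s3://models/Qwen_Qwen3.6-35B-A3B-FP8/"


def build_serve_cmd(enable_mtp: bool, num_spec_tokens: int = 1) -> list[str]:
    """Return a `vllm serve` argv that streams MODEL_URI with runai_streamer.

    Per the report, enable_mtp=False loads fine and enable_mtp=True fails
    while the drafter weights are being loaded.
    """
    cmd = [
        "vllm", "serve", MODEL_URI,
        "--load-format", "runai_streamer",
        # TP=2 is inferred from the Worker_TP0/Worker_TP1 log lines (assumption).
        "--tensor-parallel-size", "2",
    ]
    if enable_mtp:
        # Assumed value: the report elides the rest of the config with "...".
        spec = {"method": "mtp", "num_speculative_tokens": num_spec_tokens}
        cmd += ["--speculative-config", json.dumps(spec)]
    return cmd


if __name__ == "__main__":
    # Print both variants so they can be pasted into the container shell.
    for mtp in (False, True):
        print(shlex.join(build_serve_cmd(mtp)))
```

Keeping both commands identical apart from `--speculative-config` isolates the MTP drafter as the only variable.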

Theory / Tracing:

  • vLLM loads the main model weights directly from S3 via `runai_streamer`.
  • For the MTP drafter, vLLM attempts to load the weights again from a local cache path such as `/root/.cache/vllm/assets/model_streamer/<hash>`, using the `runai_streamer` loader's `list_safetensors()` function.
  • By design, this function is non-recursive: it only looks for `*.safetensors` files at the top level of the given directory.
  • The model_streamer cache does NOT expose the weights at its root, so the loader finds none, even though the S3 source is correct.
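The suspected failure mode can be illustrated without vLLM. The sketch below builds a temporary directory whose weights sit one level down (the `nested/` subdirectory is a hypothetical stand-in, since the report does not show the real layout under `model_streamer/<hash>`) and compares a top-level glob, which mimics the non-recursive behavior attributed to `list_safetensors()`, with a recursive one. The helper names are illustrative, not vLLM functions.

```python
import glob
import os
import tempfile


def top_level_safetensors(path: str) -> list[str]:
    # Non-recursive: only files directly inside `path` match.
    return sorted(glob.glob(os.path.join(path, "*.safetensors")))


def all_safetensors(path: str) -> list[str]:
    # Recursive: "**" with recursive=True matches zero or more subdirectories.
    return sorted(glob.glob(os.path.join(path, "**", "*.safetensors"), recursive=True))


with tempfile.TemporaryDirectory() as cache_dir:
    # Hypothetical layout: weights one level below the cache root.
    nested = os.path.join(cache_dir, "nested")
    os.makedirs(nested)
    for name in ("layers-0.safetensors", "mtp.safetensors", "outside.safetensors"):
        open(os.path.join(nested, name), "wb").close()

    top = top_level_safetensors(cache_dir)
    deep = all_safetensors(cache_dir)
    print(len(top), len(deep))  # 0 3
```

If the theory holds, an empty top-level list like `top` is what would surface as "Cannot find any safetensors model weights", even though the files exist further down.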

Log excerpt (abridged):

(Worker_TP1 pid=170) ERROR ... RuntimeError: Cannot find any safetensors model weights with `/root/.cache/vllm/assets/model_streamer/d7905c16`
(Worker_TP0 pid=165) ERROR ... RuntimeError: Cannot find any safetensors model weights with `/root/.cache/vllm/assets/model_streamer/d7905c16`
(EngineCore pid=150) ERROR ... Exception: WorkerProc initialization failed due to an exception in a background process ...
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above.

Sample S3 path listing (abridged):

layers-0.safetensors
layers-1.safetensors
...
layers-39.safetensors
mtp.safetensors
outside.safetensors
model.safetensors.index.json
config.json
README.md
...
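To confirm that the listing is complete, one can check that every shard named in `model.safetensors.index.json` is actually present. The sketch below assumes the standard Hugging Face sharded-index format, where `weight_map` maps each tensor name to the shard file that stores it; the function names are hypothetical. `missing_shards_from()` takes any iterable of file names or paths, so it can be fed an S3 listing as well as a local directory.

```python
import json
import os
from collections.abc import Iterable


def missing_shards_from(index: dict, present_files: Iterable[str]) -> set[str]:
    """Shard files referenced by a sharded safetensors index but absent from present_files."""
    # weight_map: tensor name -> shard file name.
    referenced = set(index["weight_map"].values())
    # Accept bare names or full paths/URIs by comparing basenames only.
    present = {os.path.basename(name) for name in present_files}
    return referenced - present


def missing_shards(model_dir: str) -> set[str]:
    """Same check against a local directory, e.g. a downloaded copy of the S3 prefix."""
    with open(os.path.join(model_dir, "model.safetensors.index.json")) as f:
        index = json.load(f)
    return missing_shards_from(index, os.listdir(model_dir))
```

An empty result means every shard the index references, including `mtp.safetensors`, is present.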

Summary:

  • The model works with the `runai_streamer` loader unless MTP is enabled.
  • With MTP, the drafter load fails because no weights are found in the local streamer cache that vLLM uses for the draft model.
  • The root cause seems to be that vLLM points the loader at a local cache directory whose layout doesn't expose safetensors files at the root, and `list_safetensors()` does not recurse into subdirectories.
  • This may be a bug in the vLLM cache layout or in how the loader is used, or the loader may need a recursive search for this scenario.

Would appreciate insight if this is a known bug or wrong configuration, or if a workaround exists.

