vllm - ✅(Solved) Fix [Bug]: HFValidationError when loading model from cloud storage (s3://) with `HF_HUB_OFFLINE=1` [1 pull requests, 1 comments, 2 participants]

Q: Expected behavior

```bash
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
vllm serve s3://my-bucket/my-model
# → config/tokenizer files pulled from S3 via runai_model_streamer
# → weights streamed from S3 directly
# → no HF Hub calls made
```

vllm · 2026-04-06 21:05:21

GitHub stats

vllm-project/vllm#39112 • Fetched 2026-04-08 03:01:52

Author

JustinPerlman

Participants

JustinPerlman

vadimkantorov

Timeline (top)

referenced ×2, commented ×1, cross-referenced ×1, labeled ×1

Error Message

huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 's3://my-bucket/my-model' is not a valid repo ID.

Root Cause

EngineArgs.__post_init__ unconditionally calls get_model_path() for all model paths when HF_HUB_OFFLINE=1. get_model_path() forwards the path straight to huggingface_hub.snapshot_download(repo_id=...), which validates the string as a HF repo ID before even touching the cache — causing the crash.

This happens before ModelConfig is ever constructed, so vLLM's existing cloud-storage machinery (maybe_pull_model_tokenizer_for_runai) never gets a chance to run.

# vllm/engine/arg_utils.py  ← crash site
if huggingface_hub.constants.HF_HUB_OFFLINE:
    self.model = get_model_path(self.model, self.revision)
    #            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # snapshot_download(repo_id="s3://...") → HFValidationError
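
To see why the string is rejected, here is a minimal, standard-library-only sketch of the repo ID shape described in the error message. `looks_like_repo_id` is a hypothetical helper written for illustration; it approximates the shape check and is not the validator inside huggingface_hub, whose exact rules may differ.

```python
import re

# Simplified approximation of the 'repo_name' or 'namespace/repo_name' shape
# named in the error message. This is NOT huggingface_hub's actual validator.
_REPO_ID_SHAPE = re.compile(r"(?:[A-Za-z0-9._-]+/)?[A-Za-z0-9._-]+")


def looks_like_repo_id(value: str) -> bool:
    """Return True if `value` has the shape 'repo_name' or 'namespace/repo_name'."""
    return _REPO_ID_SHAPE.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("my-model", "my-org/my-model", "s3://my-bucket/my-model"):
        print(f"{candidate!r}: {looks_like_repo_id(candidate)}")
```

The S3 URI fails this shape check because it contains a colon and more than one slash, which is consistent with the validation error being raised before any cache lookup happens.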

Fix Action

Fixed

Fixed by PR: [BugFix] HFValidationError with cloud storage URIs when HF_HUB_OFFLINE=1 (https://github.com/vllm-project/vllm/pull/39155). The PR state recorded on this page is open and not yet merged.

PR fix notes

PR #39155: [BugFix] HFValidationError with cloud storage URIs when HF_HUB_OFFLINE=1

Repository: vllm-project/vllm
Author: sts07142
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39155

Description (problem / solution / changelog)

Purpose

Fixes #39112

Skip cloud storage URIs (s3://, gs://, az://) in get_model_path() when HF_HUB_OFFLINE=1
Fix maybe_pull_model_tokenizer_for_runai() passing model URI instead of tokenizer URI to pull_files() when model and tokenizer are different cloud URIs

Test Plan

export HF_HUB_OFFLINE=1
vllm serve s3://my-bucket/my-model

Test Result

<details> <summary>before </summary>

INFO 04-07 14:59:24 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]        █     █     █▄   ▄█
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.1rc1.dev65+g2f8a934e9
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]   █▄█▀ █     █     █     █  model   s3://my-bucket/my-model
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:233] non-default args: {'model_tag': 's3://my-bucket/my-model', 'model': 's3://my-bucket/my-model'}
(APIServer pid=877) Traceback (most recent call last):
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/bin/vllm", line 10, in <module>
(APIServer pid=877)     sys.exit(main())
(APIServer pid=877)              ^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=877)     args.dispatch_function(args)
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=877)     uvloop.run(run_server(args))
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=877)     return __asyncio.run(
(APIServer pid=877)            ^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=877)     return runner.run(main)
(APIServer pid=877)            ^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=877)     return self._loop.run_until_complete(task)
(APIServer pid=877)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=877)     return await main
(APIServer pid=877)            ^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=877)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=877)     async with build_async_engine_client(
(APIServer pid=877)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=877)     return await anext(self.gen)
(APIServer pid=877)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 95, in build_async_engine_client
(APIServer pid=877)     engine_args = AsyncEngineArgs.from_cli_args(args)
(APIServer pid=877)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/engine/arg_utils.py", line 1403, in from_cli_args
(APIServer pid=877)     engine_args = cls(
(APIServer pid=877)                   ^^^^
(APIServer pid=877)   File "<string>", line 193, in __init__
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/engine/arg_utils.py", line 679, in __post_init__
(APIServer pid=877)     self.model = get_model_path(self.model, self.revision)
(APIServer pid=877)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/transformers_utils/repo_utils.py", line 220, in get_model_path
(APIServer pid=877)     return snapshot_download(repo_id=model, **common_kwargs)
(APIServer pid=877)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
(APIServer pid=877)     validate_repo_id(arg_value)
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
(APIServer pid=877)     raise HFValidationError(
(APIServer pid=877) huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 's3://my-bucket/my-model'. Use `repo_type` argument if needed.

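</details> <details> <summary>after (HFValidationError is no longer raised; startup now reaches maybe_pull_model_tokenizer_for_runai and stops only because runai_model_streamer is not installed) </summary>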

INFO 04-07 15:01:29 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]        █     █     █▄   ▄█
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.1rc1.dev65+g2f8a934e9
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]   █▄█▀ █     █     █     █  model   s3://my-bucket/my-model
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:233] non-default args: {'model_tag': 's3://my-bucket/my-model', 'model': 's3://my-bucket/my-model'}
(APIServer pid=2586) Traceback (most recent call last):
(APIServer pid=2586)   File "/Users/name/Personal/vllm/.venv/bin/vllm", line 10, in <module>
(APIServer pid=2586)     sys.exit(main())
(APIServer pid=2586)              ^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=2586)     args.dispatch_function(args)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=2586)     uvloop.run(run_server(args))
(APIServer pid=2586)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=2586)     return __asyncio.run(
(APIServer pid=2586)            ^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=2586)     return runner.run(main)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=2586)     return self._loop.run_until_complete(task)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2586)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=2586)     return await main
(APIServer pid=2586)            ^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=2586)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=2586)     async with build_async_engine_client(
(APIServer pid=2586)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=2586)     return await anext(self.gen)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=2586)     async with build_async_engine_client_from_engine_args(
(APIServer pid=2586)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=2586)     return await anext(self.gen)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 124, in build_async_engine_client_from_engine_args
(APIServer pid=2586)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=2586)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/engine/arg_utils.py", line 1581, in create_engine_config
(APIServer pid=2586)     model_config = self.create_model_config()
(APIServer pid=2586)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/engine/arg_utils.py", line 1429, in create_model_config
(APIServer pid=2586)     return ModelConfig(
(APIServer pid=2586)            ^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=2586)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/config/model.py", line 485, in __post_init__
(APIServer pid=2586)     self.maybe_pull_model_tokenizer_for_runai(self.model, self.tokenizer)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/config/model.py", line 797, in maybe_pull_model_tokenizer_for_runai
(APIServer pid=2586)     object_storage_model.pull_files(
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/transformers_utils/runai_utils.py", line 100, in pull_files
(APIServer pid=2586)     runai_pull_files(model_path, self.dir, allow_pattern, ignore_pattern)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/utils/import_utils.py", line 179, in __call__
(APIServer pid=2586)     return self.__getattr__("__call__")
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/utils/import_utils.py", line 335, in __getattr__
(APIServer pid=2586)     getattr(self.__module, f"{self.__attr_path}.{key}")
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/utils/import_utils.py", line 315, in __getattr__
(APIServer pid=2586)     raise exc
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/utils/import_utils.py", line 308, in __getattr__
(APIServer pid=2586)     importlib.import_module(name)
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/importlib/__init__.py", line 90, in import_module
(APIServer pid=2586)     return _bootstrap._gcd_import(name[level:], package, level)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
(APIServer pid=2586)   File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
(APIServer pid=2586)   File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked
(APIServer pid=2586) ModuleNotFoundError: No module named 'runai_model_streamer'

</details>

Changed files

tests/engine/test_arg_utils.py (modified, +30/-0)
tests/test_config.py (modified, +22/-0)
vllm/config/model.py (modified, +1/-1)
vllm/engine/arg_utils.py (modified, +16/-9)

Original issue text

Your current environment

vLLM version: 0.19.x (and earlier)
runai_model_streamer installed

🐛 Describe the bug

When HF_HUB_OFFLINE=1 is set and the model path is a cloud storage URI (s3://), vLLM raises an HFValidationError immediately on startup before any model loading begins:

huggingface_hub.errors.HFValidationError: Repo id must be in the form
  'repo_name' or 'namespace/repo_name': 's3://my-bucket/my-model' is not a valid repo ID.

📋 Steps to reproduce

export HF_HUB_OFFLINE=1
vllm serve s3://my-bucket/my-model

Root cause

The crash happens in EngineArgs.__post_init__, before ModelConfig is ever constructed, so vLLM's existing cloud-storage machinery (maybe_pull_model_tokenizer_for_runai) never gets a chance to run.

# vllm/engine/arg_utils.py  ← crash site
if huggingface_hub.constants.HF_HUB_OFFLINE:
    self.model = get_model_path(self.model, self.revision)
    #            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # snapshot_download(repo_id="s3://...") → HFValidationError

Why HF_HUB_OFFLINE=1 is the right setup for S3 models

Users loading weights from S3 set HF_HUB_OFFLINE=1 precisely because they don't want vLLM touching the HF Hub — the model lives in object storage, not there. The flag shouldn't interfere with cloud URI resolution.

Expected behavior

export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
vllm serve s3://my-bucket/my-model
# → config/tokenizer files pulled from S3 via runai_model_streamer
# → weights streamed from S3 directly
# → no HF Hub calls made

Additional bug found

A secondary bug exists in ModelConfig.maybe_pull_model_tokenizer_for_runai: when model and tokenizer are different cloud URIs, the tokenizer pull_files() call incorrectly passes the model URI instead of the tokenizer URI.

# vllm/config/model.py  ← wrong argument
if is_runai_obj_uri(tokenizer):
    object_storage_tokenizer = ObjectStorageModel(url=tokenizer)
    object_storage_tokenizer.pull_files(
        model,      # ← bug: should be `tokenizer`
        ignore_pattern=[...]
    )
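
The effect of the wrong argument can be reproduced with a small stand-in. `FakeObjectStorage` below is a hypothetical class that only records which URI it was asked to pull; it does not mirror the real ObjectStorageModel API, and the bucket names and ignore pattern are made up for the example.

```python
from typing import Optional


class FakeObjectStorage:
    """Hypothetical stand-in that records the URI passed to pull_files()."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.pulled_from: list[str] = []

    def pull_files(self, uri: str, ignore_pattern: Optional[list[str]] = None) -> None:
        # A real implementation would download files; here we only record the source.
        self.pulled_from.append(uri)


def pull_tokenizer(model: str, tokenizer: str, *, fixed: bool) -> str:
    """Return the URI the tokenizer files were pulled from."""
    storage = FakeObjectStorage(url=tokenizer)
    source = tokenizer if fixed else model  # the buggy path passes the model URI
    storage.pull_files(source, ignore_pattern=["*.safetensors"])
    return storage.pulled_from[-1]


if __name__ == "__main__":
    model_uri = "s3://my-bucket/my-model"
    tokenizer_uri = "s3://other-bucket/my-tokenizer"
    print("buggy:", pull_tokenizer(model_uri, tokenizer_uri, fixed=False))
    print("fixed:", pull_tokenizer(model_uri, tokenizer_uri, fixed=True))
```

With identical model and tokenizer URIs both paths pull from the same location, so the difference only shows up when the two URIs differ, as the issue describes.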

Related issues

#12437 — HFValidationError when loading model from S3
#23684 — HF_HUB_OFFLINE + local path regression (introduced get_model_path call)
#23236, #24313, #26600 — S3 / RunAI breakage across minor vLLM releases

extent analysis

TL;DR

To fix the bug, change EngineArgs.__post_init__ so that it calls get_model_path() only when the model path is not a cloud storage URI.

Guidance

Check whether the model path is a cloud storage URI (s3://, gs://, or az://) before calling get_model_path(), so that the path is never validated as an HF repo ID.
Update ModelConfig.maybe_pull_model_tokenizer_for_runai to pass the tokenizer URI, not the model URI, to the tokenizer's pull_files() call.
Verify the fix by setting HF_HUB_OFFLINE=1 and serving a model from a cloud storage URI (e.g., s3://my-bucket/my-model).
Test with the other cloud storage schemes and with the model and tokenizer at different URIs.

Example

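# vllm/engine/arg_utils.py
# Skip Hub resolution for cloud URIs (schemes per PR #39155); a sketch, not the PR's actual diff.
cloud_prefixes = ("s3://", "gs://", "az://")
if huggingface_hub.constants.HF_HUB_OFFLINE and not self.model.startswith(cloud_prefixes):
    self.model = get_model_path(self.model, self.revision)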
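
A more general variant of the guard above can be sketched with the three schemes named in PR #39155 (s3://, gs://, az://). `is_cloud_storage_uri` and `resolve_offline_model_path` are hypothetical names used for illustration, and `resolver` stands in for get_model_path(); the actual change in the PR may be structured differently.

```python
from typing import Callable
from urllib.parse import urlparse

# Schemes listed in the PR description; treat this set as an assumption.
CLOUD_SCHEMES = frozenset({"s3", "gs", "az"})


def is_cloud_storage_uri(path: str) -> bool:
    """Return True when `path` uses one of the cloud storage schemes above."""
    return urlparse(path).scheme.lower() in CLOUD_SCHEMES


def resolve_offline_model_path(
    model: str,
    offline: bool,
    resolver: Callable[[str], str],
) -> str:
    """Call `resolver` only for offline, non-cloud paths.

    `resolver` is a stand-in for get_model_path(); cloud URIs are returned
    unchanged so that later cloud-storage handling can deal with them.
    """
    if offline and not is_cloud_storage_uri(model):
        return resolver(model)
    return model


if __name__ == "__main__":
    def fake_resolver(name: str) -> str:
        # Hypothetical local cache lookup, for demonstration only.
        return f"/cache/{name}"

    print(resolve_offline_model_path("s3://my-bucket/my-model", True, fake_resolver))
    print(resolve_offline_model_path("my-org/my-model", True, fake_resolver))
```

Parsing the scheme rather than matching a single prefix keeps the list of supported schemes in one place, which makes it easier to cover gs:// and az:// alongside s3://.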

Notes

The fix assumes that the get_model_path() function is not necessary for cloud storage URIs. If this is not the case, additional modifications may be required.

Recommendation

Apply both changes: guard the get_model_path() call in EngineArgs.__post_init__ and correct the argument in ModelConfig.maybe_pull_model_tokenizer_for_runai. Together they let users load models from cloud storage URIs with HF_HUB_OFFLINE=1 set, without HF Hub repo ID validation getting in the way.
