vllm - ✅(Solved) Fix [Bug]: HFValidationError when loading model from cloud storage (s3://) with `HF_HUB_OFFLINE=1` [1 pull requests, 1 comments, 2 participants]

GitHub stats
vllm-project/vllm#39112 (fetched 2026-04-08 03:01:52)
Comments: 1
Participants: 2
Timeline events: 6
Reactions: 0
Timeline (top): referenced ×2, commented ×1, cross-referenced ×1, labeled ×1

Error Message

huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 's3://my-bucket/my-model' is not a valid repo ID.

Root Cause

EngineArgs.__post_init__ unconditionally calls get_model_path() for all model paths when HF_HUB_OFFLINE=1. get_model_path() forwards the path straight to huggingface_hub.snapshot_download(repo_id=...), which validates the string as a HF repo ID before even touching the cache — causing the crash.

This happens before ModelConfig is ever constructed, so vLLM's existing cloud-storage machinery (maybe_pull_model_tokenizer_for_runai) never gets a chance to run.

# vllm/engine/arg_utils.py  ← crash site
if huggingface_hub.constants.HF_HUB_OFFLINE:
    self.model = get_model_path(self.model, self.revision)
    #            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # snapshot_download(repo_id="s3://...") → HFValidationError
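The ordering problem can be reproduced in isolation. The sketch below is a simplified stand-in, not the library's code: `RepoIdError`, `check_repo_id`, and `offline_resolve` are hypothetical names, and the slash-count rule is only an assumption that approximates the check behind the error message above.

```python
class RepoIdError(ValueError):
    """Hypothetical stand-in for huggingface_hub's HFValidationError."""


def check_repo_id(repo_id: str) -> str:
    # Simplified assumption: a repo ID is 'repo_name' or 'namespace/repo_name',
    # so a string with more than one '/' cannot be valid.
    if repo_id.count("/") > 1:
        raise RepoIdError(f"{repo_id!r} is not a valid repo ID")
    return repo_id


def offline_resolve(model: str) -> str:
    # Same shape as the crash site: validation runs first, before any
    # cache lookup and before any cloud-storage handling can happen.
    return check_repo_id(model)


for candidate in ("my-org/my-model", "s3://my-bucket/my-model"):
    try:
        offline_resolve(candidate)
        print(candidate, "accepted")
    except RepoIdError as exc:
        print("rejected:", exc)
```

An S3 URI always contains at least three slashes, so under this rule it can never pass as a repo ID, whatever the cache holds.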

Fix Action

Fixed

PR fix notes

PR #39155: [BugFix] HFValidationError with cloud storage URIs when HF_HUB_OFFLINE=1

Description (problem / solution / changelog)

Purpose

Fixes #39112

  • Skip cloud storage URIs (s3://, gs://, az://) in get_model_path() when HF_HUB_OFFLINE=1
  • Fix maybe_pull_model_tokenizer_for_runai() passing model URI instead of tokenizer URI to pull_files() when model and tokenizer are different cloud URIs
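The first bullet can be sketched as a small guard. This is an illustration under stated assumptions rather than the PR's actual diff: `is_cloud_uri` and `resolve_model` are hypothetical helpers, the prefix tuple simply mirrors the three schemes listed above, and the cache lookup is injected as a callable so the example runs without vLLM or huggingface_hub.

```python
from typing import Callable

# Schemes named in the PR description; treating exactly these three as
# "cloud storage" is an assumption of this sketch.
CLOUD_URI_PREFIXES = ("s3://", "gs://", "az://")


def is_cloud_uri(path: str) -> bool:
    # str.startswith accepts a tuple and matches any of its entries.
    return path.startswith(CLOUD_URI_PREFIXES)


def resolve_model(
    model: str,
    offline: bool,
    resolve_from_cache: Callable[[str], str],
) -> str:
    # Only consult the local HF cache when offline AND the path is not a
    # cloud URI; cloud URIs pass through untouched for later handling.
    if offline and not is_cloud_uri(model):
        return resolve_from_cache(model)
    return model


def fake_cache_lookup(repo_id: str) -> str:
    # Hypothetical stand-in for the real cache resolution step.
    return "/cache/" + repo_id


print(resolve_model("s3://my-bucket/my-model", True, fake_cache_lookup))
print(resolve_model("my-org/my-model", True, fake_cache_lookup))
```

Injecting the lookup keeps the guard testable on its own; in vLLM the equivalent decision sits next to the `get_model_path()` call.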

Test Plan

export HF_HUB_OFFLINE=1
vllm serve s3://my-bucket/my-model

Test Result

<details> <summary>before </summary>
INFO 04-07 14:59:24 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]        █     █     █▄   ▄█
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.1rc1.dev65+g2f8a934e9
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]   █▄█▀ █     █     █     █  model   s3://my-bucket/my-model
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:299]
(APIServer pid=877) INFO 04-07 14:59:25 [utils.py:233] non-default args: {'model_tag': 's3://my-bucket/my-model', 'model': 's3://my-bucket/my-model'}
(APIServer pid=877) Traceback (most recent call last):
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/bin/vllm", line 10, in <module>
(APIServer pid=877)     sys.exit(main())
(APIServer pid=877)              ^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=877)     args.dispatch_function(args)
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=877)     uvloop.run(run_server(args))
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=877)     return __asyncio.run(
(APIServer pid=877)            ^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=877)     return runner.run(main)
(APIServer pid=877)            ^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=877)     return self._loop.run_until_complete(task)
(APIServer pid=877)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=877)     return await main
(APIServer pid=877)            ^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=877)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=877)     async with build_async_engine_client(
(APIServer pid=877)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=877)     return await anext(self.gen)
(APIServer pid=877)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 95, in build_async_engine_client
(APIServer pid=877)     engine_args = AsyncEngineArgs.from_cli_args(args)
(APIServer pid=877)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/engine/arg_utils.py", line 1403, in from_cli_args
(APIServer pid=877)     engine_args = cls(
(APIServer pid=877)                   ^^^^
(APIServer pid=877)   File "<string>", line 193, in __init__
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/engine/arg_utils.py", line 679, in __post_init__
(APIServer pid=877)     self.model = get_model_path(self.model, self.revision)
(APIServer pid=877)                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/vllm/transformers_utils/repo_utils.py", line 220, in get_model_path
(APIServer pid=877)     return snapshot_download(repo_id=model, **common_kwargs)
(APIServer pid=877)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
(APIServer pid=877)     validate_repo_id(arg_value)
(APIServer pid=877)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
(APIServer pid=877)     raise HFValidationError(
(APIServer pid=877) huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': 's3://my-bucket/my-model'. Use `repo_type` argument if needed.
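</details> <details> <summary>after (the HFValidationError is gone; startup now reaches ModelConfig and stops only because runai_model_streamer is not installed in this test environment) </summary>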
INFO 04-07 15:01:29 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available.
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]        █     █     █▄   ▄█
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.1rc1.dev65+g2f8a934e9
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]   █▄█▀ █     █     █     █  model   s3://my-bucket/my-model
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:299]
(APIServer pid=2586) INFO 04-07 15:01:30 [utils.py:233] non-default args: {'model_tag': 's3://my-bucket/my-model', 'model': 's3://my-bucket/my-model'}
(APIServer pid=2586) Traceback (most recent call last):
(APIServer pid=2586)   File "/Users/name/Personal/vllm/.venv/bin/vllm", line 10, in <module>
(APIServer pid=2586)     sys.exit(main())
(APIServer pid=2586)              ^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=2586)     args.dispatch_function(args)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=2586)     uvloop.run(run_server(args))
(APIServer pid=2586)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=2586)     return __asyncio.run(
(APIServer pid=2586)            ^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=2586)     return runner.run(main)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=2586)     return self._loop.run_until_complete(task)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=2586)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=2586)     return await main
(APIServer pid=2586)            ^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 684, in run_server
(APIServer pid=2586)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 698, in run_server_worker
(APIServer pid=2586)     async with build_async_engine_client(
(APIServer pid=2586)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=2586)     return await anext(self.gen)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=2586)     async with build_async_engine_client_from_engine_args(
(APIServer pid=2586)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=2586)     return await anext(self.gen)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/entrypoints/openai/api_server.py", line 124, in build_async_engine_client_from_engine_args
(APIServer pid=2586)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=2586)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/engine/arg_utils.py", line 1581, in create_engine_config
(APIServer pid=2586)     model_config = self.create_model_config()
(APIServer pid=2586)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/engine/arg_utils.py", line 1429, in create_model_config
(APIServer pid=2586)     return ModelConfig(
(APIServer pid=2586)            ^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/.venv/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=2586)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/config/model.py", line 485, in __post_init__
(APIServer pid=2586)     self.maybe_pull_model_tokenizer_for_runai(self.model, self.tokenizer)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/config/model.py", line 797, in maybe_pull_model_tokenizer_for_runai
(APIServer pid=2586)     object_storage_model.pull_files(
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/transformers_utils/runai_utils.py", line 100, in pull_files
(APIServer pid=2586)     runai_pull_files(model_path, self.dir, allow_pattern, ignore_pattern)
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/utils/import_utils.py", line 179, in __call__
(APIServer pid=2586)     return self.__getattr__("__call__")
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/utils/import_utils.py", line 335, in __getattr__
(APIServer pid=2586)     getattr(self.__module, f"{self.__attr_path}.{key}")
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/utils/import_utils.py", line 315, in __getattr__
(APIServer pid=2586)     raise exc
(APIServer pid=2586)   File "/Users/name/Personal/vllm/vllm/utils/import_utils.py", line 308, in __getattr__
(APIServer pid=2586)     importlib.import_module(name)
(APIServer pid=2586)   File "/Users/name/.local/share/uv/python/cpython-3.12.13-macos-aarch64-none/lib/python3.12/importlib/__init__.py", line 90, in import_module
(APIServer pid=2586)     return _bootstrap._gcd_import(name[level:], package, level)
(APIServer pid=2586)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=2586)   File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
(APIServer pid=2586)   File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
(APIServer pid=2586)   File "<frozen importlib._bootstrap>", line 1324, in _find_and_load_unlocked
(APIServer pid=2586) ModuleNotFoundError: No module named 'runai_model_streamer'
</details>

Changed files

  • tests/engine/test_arg_utils.py (modified, +30/-0)
  • tests/test_config.py (modified, +22/-0)
  • vllm/config/model.py (modified, +1/-1)
  • vllm/engine/arg_utils.py (modified, +16/-9)


Your current environment

  • vLLM version: 0.19.x (and earlier)
  • runai_model_streamer installed

🐛 Describe the bug

When HF_HUB_OFFLINE=1 is set and the model path is a cloud storage URI (s3://), vLLM raises an HFValidationError immediately on startup before any model loading begins:

huggingface_hub.errors.HFValidationError: Repo id must be in the form
  'repo_name' or 'namespace/repo_name': 's3://my-bucket/my-model' is not a valid repo ID.

📋 Steps to reproduce

export HF_HUB_OFFLINE=1
vllm serve s3://my-bucket/my-model

Why HF_HUB_OFFLINE=1 is the right setup for S3 models

Users loading weights from S3 set HF_HUB_OFFLINE=1 precisely because they don't want vLLM touching the HF Hub — the model lives in object storage, not there. The flag shouldn't interfere with cloud URI resolution.

Expected behavior

export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
vllm serve s3://my-bucket/my-model
# → config/tokenizer files pulled from S3 via runai_model_streamer
# → weights streamed from S3 directly
# → no HF Hub calls made

Additional bug found

A secondary bug exists in ModelConfig.maybe_pull_model_tokenizer_for_runai: when model and tokenizer are different cloud URIs, the tokenizer pull_files() call incorrectly passes the model URI instead of the tokenizer URI.

# vllm/config/model.py  ← wrong argument
if is_runai_obj_uri(tokenizer):
    object_storage_tokenizer = ObjectStorageModel(url=tokenizer)
    object_storage_tokenizer.pull_files(
        model,      # ← bug: should be `tokenizer`
        ignore_pattern=[...]
    )
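A minimal sketch of the corrected behaviour follows. `FakeObjectStorage` and `pull_model_and_tokenizer` are hypothetical stand-ins that only record which URI each pull receives; they are not vLLM's `ObjectStorageModel` or its real `pull_files()` signature.

```python
class FakeObjectStorage:
    """Hypothetical stand-in for ObjectStorageModel; records pulled URIs."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.pulled = []

    def pull_files(self, path: str) -> None:
        # A real implementation would download files; this one only logs.
        self.pulled.append(path)


def pull_model_and_tokenizer(model: str, tokenizer: str):
    model_store = FakeObjectStorage(url=model)
    model_store.pull_files(model)

    tokenizer_store = FakeObjectStorage(url=tokenizer)
    # The fix: pass the tokenizer URI here, not the model URI.
    tokenizer_store.pull_files(tokenizer)
    return model_store, tokenizer_store


model_store, tokenizer_store = pull_model_and_tokenizer(
    "s3://my-bucket/my-model", "s3://my-bucket/my-tokenizer"
)
print(model_store.pulled, tokenizer_store.pulled)
```

The bug only shows up when the two URIs differ, which is why a test for it needs distinct model and tokenizer locations.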

Related issues

  • #12437 — HFValidationError when loading model from S3
  • #23684 — HF_HUB_OFFLINE + local path regression (introduced get_model_path call)
  • #23236, #24313, #26600 — S3 / RunAI breakage across minor vLLM releases

Extent analysis

TL;DR

Guard the get_model_path() call in EngineArgs.__post_init__ so that it runs only when the model path is not a cloud storage URI.

Guidance

  • Before calling get_model_path(), check whether the model path starts with a cloud storage scheme (s3://, gs://, or az://); if it does, skip the call so the path is never validated as a HF repo ID.
  • Update ModelConfig.maybe_pull_model_tokenizer_for_runai so that the tokenizer's pull_files() call receives the tokenizer URI rather than the model URI.
  • Verify the fix by setting HF_HUB_OFFLINE=1 and serving a model from a cloud storage URI (e.g., s3://my-bucket/my-model).
  • Repeat the test with the other URI schemes and with model and tokenizer at different URIs, since the tokenizer bug only appears when the two differ.

Example

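# vllm/engine/arg_utils.py
# Illustrative guard, not the exact PR diff: skip the HF cache lookup
# for every cloud storage scheme the PR lists, not just s3://.
if huggingface_hub.constants.HF_HUB_OFFLINE and not self.model.startswith(
    ("s3://", "gs://", "az://")
):
    self.model = get_model_path(self.model, self.revision)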

Notes

The fix assumes that the get_model_path() function is not necessary for cloud storage URIs. If this is not the case, additional modifications may be required.

Recommendation

Apply both changes from PR #39155: guard the get_model_path() call in EngineArgs.__post_init__, and pass the tokenizer URI in ModelConfig.maybe_pull_model_tokenizer_for_runai. Together they let users load models from cloud storage URIs with HF_HUB_OFFLINE=1 set, without any HF Hub lookups.
