vllm - [Usage]: We are using vLLM version 0.19.1. When attempting to run DeepSeek-V4-Flash with a 32k context window across eight RTX 4090 GPUs, we encountered an error indicating that the `transformers` library needed to be updated. We then updated the library using the command `uv pip install --no-cache-dir git+https://github.com/huggingface/transformers.git`, but the error persisted as shown below: [1 comment, 2 participants]

vllm • 2026-04-27 02:19:29

GitHub stats

vllm-project/vllm#40954 • Fetched 2026-04-28 06:26:13

Author: wuxiaohui0

Participants: bash99, wuxiaohui0

Timeline (top): commented ×1, labeled ×1, renamed ×1

Error Message

(APIServer pid=55338) Traceback (most recent call last):
(APIServer pid=55338) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=55338)   Value error, The checkpoint you are trying to load has model type deepseek_v4 but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Root Cause

(vllm) root@e81911163178:~/xinglin-data/vllm# vllm serve /root/xinglin-data/api_model/DeepSeek-V4-Flash --trust-remote-code --tensor-parallel-size 8 --enable-expert-parallel --max-model-len 32768 --gpu-memory-utilization 0.90 --dtype auto --api-key "sk-xingluan.cn" --host 0.0.0.0 --port 12800
(APIServer pid=55338) INFO 04-27 09:48:21 [utils.py:299]
(APIServer pid=55338) INFO 04-27 09:48:21 [utils.py:299] █ █ █▄ ▄█
(APIServer pid=55338) INFO 04-27 09:48:21 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.19.1
(APIServer pid=55338) INFO 04-27 09:48:21 [utils.py:299] █▄█▀ █ █ █ █ model /root/xinglin-data/api_model/DeepSeek-V4-Flash
(APIServer pid=55338) INFO 04-27 09:48:21 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=55338) INFO 04-27 09:48:21 [utils.py:299]
(APIServer pid=55338) INFO 04-27 09:48:21 [utils.py:233] non-default args: {'model_tag': '/root/xinglin-data/api_model/DeepSeek-V4-Flash', 'host': '0.0.0.0', 'port': 12800, 'api_key': ['sk-xingluan.cn'], 'model': '/root/xinglin-data/api_model/DeepSeek-V4-Flash', 'trust_remote_code': True, 'max_model_len': 32768, 'tensor_parallel_size': 8, 'enable_expert_parallel': True}
(APIServer pid=55338) Traceback (most recent call last):
(APIServer pid=55338)   File "/miniconda/envs/vllm/bin/vllm", line 10, in <module>
(APIServer pid=55338)     sys.exit(main())
(APIServer pid=55338)              ^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=55338)     args.dispatch_function(args)
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=55338)     uvloop.run(run_server(args))
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=55338)     return __asyncio.run(
(APIServer pid=55338)            ^^^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=55338)     return runner.run(main)
(APIServer pid=55338)            ^^^^^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=55338)     return self._loop.run_until_complete(task)
(APIServer pid=55338)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=55338)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=55338)     return await main
(APIServer pid=55338)            ^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 672, in run_server
(APIServer pid=55338)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 686, in run_server_worker
(APIServer pid=55338)     async with build_async_engine_client(
(APIServer pid=55338)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=55338)     return await anext(self.gen)
(APIServer pid=55338)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=55338)     async with build_async_engine_client_from_engine_args(
(APIServer pid=55338)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=55338)     return await anext(self.gen)
(APIServer pid=55338)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 124, in build_async_engine_client_from_engine_args
(APIServer pid=55338)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=55338)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1549, in create_engine_config
(APIServer pid=55338)     model_config = self.create_model_config()
(APIServer pid=55338)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 1398, in create_model_config
(APIServer pid=55338)     return ModelConfig(
(APIServer pid=55338)            ^^^^^^^^^^^^
(APIServer pid=55338)   File "/miniconda/envs/vllm/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=55338)     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=55338) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=55338)   Value error, The checkpoint you are trying to load has model type deepseek_v4 but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
(APIServer pid=55338)
(APIServer pid=55338) You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git [type=value_error, input_value=ArgsKwargs((), {'model': ...nderer_num_workers': 1}), input_type=ArgsKwargs]
(APIServer pid=55338)     For further information visit https://errors.pydantic.dev/2.13/v/value_error

Fix Action

Fix / Workaround

Your current environment

How would you like to use vllm

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

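The error shows that the installed Transformers does not recognize the deepseek_v4 model type, so updating Transformers is the first thing to try. The reporter states that a source install (uv pip install --no-cache-dir git+https://github.com/huggingface/transformers.git) did not clear the error, so the next step is to confirm that the upgrade actually reached the environment vLLM runs in.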
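
Because the report says the error persisted after a source install, it may help to confirm which Transformers build the vLLM interpreter actually imports. The sketch below is a hypothetical diagnostic (the helper name `inspect_package` is not part of vLLM or Transformers). It uses only the Python standard library and is meant to be run with the same interpreter that launches `vllm`, which the traceback shows under /miniconda/envs/vllm.

```python
import importlib.metadata
import importlib.util
import sys


def inspect_package(name: str) -> dict:
    """Report how `name` resolves in the interpreter running this script.

    `inspect_package` is a hypothetical helper written for this page.
    """
    # find_spec locates the package without importing it.
    spec = importlib.util.find_spec(name)
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "interpreter": sys.executable,
        "installed": spec is not None,
        "version": version,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for package in ("transformers", "vllm"):
        print(package, inspect_package(package))
```

If the reported location lies outside the environment that runs vLLM, or the version is older than expected, the upgrade probably went to a different environment.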

Guidance

Update Transformers with pip install --upgrade transformers so that the newest release, which may support the deepseek_v4 model type, is installed.
If the release does not help, install from source with pip install git+https://github.com/huggingface/transformers.git. The reporter already tried this with uv pip install and the error persisted, so check that the install targeted the same environment that runs vLLM (/miniconda/envs/vllm in the traceback).
Verify that the model path is correct and that the checkpoint, in particular its config.json, is complete and not corrupted.
Check the vLLM documentation and FAQs for model-specific instructions, including whether vLLM 0.19.1 supports this model.
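
For the checkpoint check in the list above, a quick look at the checkpoint's config.json shows what Transformers is being asked to resolve. This is a minimal sketch that assumes the usual Hugging Face layout with a config.json at the checkpoint root. `read_model_identity` is a hypothetical helper, and the `auto_map` key matters only when custom code is loaded via --trust-remote-code.

```python
import json
from pathlib import Path


def read_model_identity(checkpoint_dir: str) -> dict:
    """Return the fields of config.json that decide how the model is resolved.

    Hypothetical helper; assumes config.json sits at the checkpoint root.
    """
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return {
        # The value the error message reports as unrecognized.
        "model_type": config.get("model_type"),
        "architectures": config.get("architectures", []),
        # Present when the checkpoint ships its own config/model classes.
        "has_auto_map": "auto_map" in config,
    }


# Example call, using the path from the report:
# print(read_model_identity("/root/xinglin-data/api_model/DeepSeek-V4-Flash"))
```

A missing or unparsable config.json points at a damaged download, while a readable file with model_type set to deepseek_v4 points back at library support.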

Example

No code snippet is provided in the issue beyond the launch command, because the failure comes from library versions and model support rather than from code syntax.

Notes

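The error text comes from Transformers and names two possible causes: a problem with the checkpoint, or a Transformers version that predates the deepseek_v4 model type. Updating or installing from source addresses only the second cause. Since the reporter says the source install did not help, either the new build is not the one vLLM imports, or the development branch did not register deepseek_v4 at the time, in which case reinstalling Transformers cannot resolve the error on its own.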

Recommendation

Update Transformers (latest release first, then from source) and confirm the version inside the vLLM environment. If the error remains with a current build, follow the upstream issue vllm-project/vllm#40954 rather than reinstalling again.

#api #installation #tensor shape #autograd error #model save/load
