vllm - [Bug]: max-num-partial-prefills fails on V1 engine start

GitHub stats
vllm-project/vllm#39737, fetched 2026-04-15 06:20:40
Comments: 0, Participants: 1, Timeline events: 1 (labeled ×1), Reactions: 0

Error Message

NotImplementedError: Concurrent Partial Prefill is not supported. We recommend to remove Concurrent Partial Prefill from your config.

Code Example

vllm serve /tmp/model-poc --host 127.0.0.1 --port 8000 --gpu-memory-utilization 0.90 --max-model-len 1000 --dtype float16 --kv-cache-dtype auto --tensor-parallel-size 1 --pipeline-parallel-size 1 --max-num-batched-tokens 4096 --max-num-seqs 128 --max-num-partial-prefills 4 --max-long-partial-prefills 1 --long-prefill-token-threshold 256 --performance-mode interactivity

2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]        █     █     █▄   ▄█
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.0
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]   █▄█▀ █     █     █     █  model   /tmp/model-poc
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]
2026-04-13 21:12:29,531 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:233] non-default args: {'model_tag': '/tmp/model-load-testing-poc/model-poc', 'host': '127.0.0.1', 'model': '/tmp/model-poc', 'dtype': 'float16', 'max_model_len': 1000, 'max_num_batched_tokens': 4096, 'max_num_seqs': 128, 'max_num_partial_prefills': 4, 'long_prefill_token_threshold': 256, 'performance_mode': 'interactivity'}
2026-04-13 21:12:29,540 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [model.py:549] Resolved architecture: Qwen3ForCausalLM
2026-04-13 21:12:29,540 INFO [server] (APIServer pid=6907) WARNING 04-13 21:12:29 [model.py:2016] Casting torch.bfloat16 to torch.float16.
2026-04-13 21:12:29,540 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [model.py:1678] Using max model len 1000
2026-04-13 21:12:29,541 INFO [server] (APIServer pid=6907) Traceback (most recent call last):
2026-04-13 21:12:29,541 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/bin/vllm", line 6, in <module>
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     sys.exit(main())
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)              ^^^^^^
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 75, in main
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     args.dispatch_function(args)
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     uvloop.run(run_server(args))
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     return runner.run(wrapper())
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/asyncio/runners.py", line 118, in run
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     return self._loop.run_until_complete(task)
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     return await main
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server_worker
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     async with build_async_engine_client(
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/contextlib.py", line 210, in __aenter__
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     return await anext(self.gen)
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     async with build_async_engine_client_from_engine_args(
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/contextlib.py", line 210, in __aenter__
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     return await anext(self.gen)
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 124, in build_async_engine_client_from_engine_args
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1554, in create_engine_config
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)     self._check_feature_supported()
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 2004, in _check_feature_supported
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)     _raise_unsupported_error(feature_name="Concurrent Partial Prefill")
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 2283, in _raise_unsupported_error
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)     raise NotImplementedError(msg)
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907) NotImplementedError: Concurrent Partial Prefill is not supported. We recommend to remove Concurrent Partial Prefill from your config.

🐛 Describe the bug

The V1 engine fails at startup when the --max-num-partial-prefills parameter is set.

extent analysis

TL;DR

The server fails to start because --max-num-partial-prefills is set to a non-default value (4), which requests Concurrent Partial Prefill, a feature that this vLLM version (0.19.0) rejects while building the engine config. Removing the flag from the command should resolve the error.

Guidance

  • The error NotImplementedError: Concurrent Partial Prefill is not supported is raised by _check_feature_supported in vllm/engine/arg_utils.py during create_engine_config, so the server exits before the engine starts.
  • To fix the issue, remove --max-num-partial-prefills 4 from the vllm serve command.
  • Verify that max-num-partial-prefills is not also set in any configuration files or environment variables.
  • If the error persists, check the vLLM documentation for the parameters and configurations supported by your version.
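
The error line referenced in the guidance has a regular shape, so a startup log can be scanned for it mechanically. The following is a minimal Python sketch, not part of vLLM: the function name unsupported_features and the regular expression are assumptions made for illustration, based only on the message text quoted in this issue.

```python
import re

# Assumed pattern, derived from the final traceback line quoted in this issue.
UNSUPPORTED = re.compile(
    r"NotImplementedError: (?P<feature>.+?) is not supported\."
)


def unsupported_features(log_text):
    """Return feature names reported as unsupported, in order, without duplicates."""
    seen = []
    for match in UNSUPPORTED.finditer(log_text):
        feature = match.group("feature")
        if feature not in seen:
            seen.append(feature)
    return seen


# Sample line copied from the log in this issue.
sample = (
    "2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907) "
    "NotImplementedError: Concurrent Partial Prefill is not supported. "
    "We recommend to remove Concurrent Partial Prefill from your config."
)
print(unsupported_features(sample))
```

A launcher script could run a check like this on captured output to report which feature to remove, rather than surfacing the full traceback.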

Example

No code change is needed: the fix applies to the vllm serve command-line arguments, not to application code.

Notes

Removing the flag lets the server start, but the deployment then runs without the concurrent partial prefill setting that the original command requested, which may change performance or scheduling behavior.

Recommendation

Apply workaround: Remove the --max-num-partial-prefills parameter from the command line arguments to resolve the issue.
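
As an illustration of this workaround, the following Python sketch removes the offending flag from a command string before launch. The helper strip_flags and the UNSUPPORTED_FLAGS set are hypothetical names, not vLLM APIs. The sketch assumes that only --max-num-partial-prefills (the flag named in this issue) has to be dropped, and the command is abbreviated from the one reported above.

```python
import shlex

# Flag named in the issue as triggering the error. Extending this set with
# other flags is an assumption the reader would need to verify.
UNSUPPORTED_FLAGS = frozenset({"--max-num-partial-prefills"})


def strip_flags(argv, flags=UNSUPPORTED_FLAGS):
    """Return argv without the given flags and their values.

    Handles both the "--flag value" and "--flag=value" spellings.
    """
    cleaned = []
    skip_next = False
    for token in argv:
        if skip_next:
            # This token is the value of a flag that was just dropped.
            skip_next = False
            continue
        name, sep, _ = token.partition("=")
        if name in flags:
            # With "--flag value" the value is the next token, so drop it too.
            skip_next = not sep
            continue
        cleaned.append(token)
    return cleaned


# Abbreviated form of the command from the issue.
command = (
    "vllm serve /tmp/model-poc --host 127.0.0.1 --port 8000 "
    "--max-num-batched-tokens 4096 --max-num-seqs 128 "
    "--max-num-partial-prefills 4 --max-long-partial-prefills 1 "
    "--long-prefill-token-threshold 256"
)
fixed = shlex.join(strip_flags(shlex.split(command)))
print(fixed)
```

Every other argument is passed through unchanged. If another flag turns out to trigger the same check, it can be added to the set passed as flags.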
