vllm - 💡(How to fix) Fix [Bug]: max-num-partial-prefills fails on V1 engine start [1 participant]

vllm · 2026-04-13 21:21:00


GitHub stats

vllm-project/vllm#39737 • Fetched 2026-04-15 06:20:40

Author: DmitriiShubin

Participants: DmitriiShubin

Timeline (top): labeled ×1

Error Message

vllm serve /tmp/model-poc --host 127.0.0.1 --port 8000 --gpu-memory-utilization 0.90 --max-model-len 1000 --dtype float16 --kv-cache-dtype auto --tensor-parallel-size 1 --pipeline-parallel-size 1 --max-num-batched-tokens 4096 --max-num-seqs 128 --max-num-partial-prefills 4 --max-long-partial-prefills 1 --long-prefill-token-threshold 256 --performance-mode interactivity

NotImplementedError: Concurrent Partial Prefill is not supported. We recommend to remove Concurrent Partial Prefill from your config.
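The reporter's startup log includes a "non-default args" line, which shows exactly which options vLLM registered as changed from their defaults. A minimal sketch using only the standard library (re, ast.literal_eval) to pull that dict out of the log; parse_non_default_args is a hypothetical helper, and the log line is copied from this issue with the aggregator prefix trimmed.

```python
import ast
import re

# The "non-default args" line from the reporter's startup log (aggregator prefix trimmed).
LOG_LINE = (
    "INFO 04-13 21:12:29 [utils.py:233] non-default args: "
    "{'model_tag': '/tmp/model-load-testing-poc/model-poc', 'host': '127.0.0.1', "
    "'model': '/tmp/model-poc', 'dtype': 'float16', 'max_model_len': 1000, "
    "'max_num_batched_tokens': 4096, 'max_num_seqs': 128, "
    "'max_num_partial_prefills': 4, 'long_prefill_token_threshold': 256, "
    "'performance_mode': 'interactivity'}"
)


def parse_non_default_args(line: str) -> dict:
    """Extract the dict printed after 'non-default args:' and evaluate it safely."""
    match = re.search(r"non-default args: (\{.*\})", line)
    if match is None:
        raise ValueError("no 'non-default args' dict found in line")
    # literal_eval only accepts Python literals, so no arbitrary code is executed.
    return ast.literal_eval(match.group(1))


args = parse_non_default_args(LOG_LINE)
# Keep only the options belonging to the partial-prefill feature.
partial_prefill_keys = sorted(k for k in args if "partial_prefill" in k)
print(partial_prefill_keys)
```

Only max_num_partial_prefills appears; max_long_partial_prefills is absent because the reporter passed its default value (1).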


Your current environment

<details> <summary> Output logs</summary>

vllm serve /tmp/model-poc --host 127.0.0.1 --port 8000 --gpu-memory-utilization 0.90 --max-model-len 1000 --dtype float16 --kv-cache-dtype auto --tensor-parallel-size 1 --pipeline-parallel-size 1 --max-num-batched-tokens 4096 --max-num-seqs 128 --max-num-partial-prefills 4 --max-long-partial-prefills 1 --long-prefill-token-threshold 256 --performance-mode interactivity

2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]        █     █     █▄   ▄█
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.19.0
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]   █▄█▀ █     █     █     █  model   /tmp/model-poc
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
2026-04-13 21:12:29,527 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:299]
2026-04-13 21:12:29,531 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [utils.py:233] non-default args: {'model_tag': '/tmp/model-load-testing-poc/model-poc', 'host': '127.0.0.1', 'model': '/tmp/model-poc', 'dtype': 'float16', 'max_model_len': 1000, 'max_num_batched_tokens': 4096, 'max_num_seqs': 128, 'max_num_partial_prefills': 4, 'long_prefill_token_threshold': 256, 'performance_mode': 'interactivity'}
2026-04-13 21:12:29,540 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [model.py:549] Resolved architecture: Qwen3ForCausalLM
2026-04-13 21:12:29,540 INFO [server] (APIServer pid=6907) WARNING 04-13 21:12:29 [model.py:2016] Casting torch.bfloat16 to torch.float16.
2026-04-13 21:12:29,540 INFO [server] (APIServer pid=6907) INFO 04-13 21:12:29 [model.py:1678] Using max model len 1000
2026-04-13 21:12:29,541 INFO [server] (APIServer pid=6907) Traceback (most recent call last):
2026-04-13 21:12:29,541 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/bin/vllm", line 6, in <module>
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     sys.exit(main())
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)              ^^^^^^
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 75, in main
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     args.dispatch_function(args)
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     uvloop.run(run_server(args))
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     return runner.run(wrapper())
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/asyncio/runners.py", line 118, in run
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     return self._loop.run_until_complete(task)
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)     return await main
2026-04-13 21:12:29,542 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server_worker
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     async with build_async_engine_client(
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/contextlib.py", line 210, in __aenter__
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     return await anext(self.gen)
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     async with build_async_engine_client_from_engine_args(
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/contextlib.py", line 210, in __aenter__
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     return await anext(self.gen)
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)            ^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 124, in build_async_engine_client_from_engine_args
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)     vllm_config = engine_args.create_engine_config(usage_context=usage_context)
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-04-13 21:12:29,543 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1554, in create_engine_config
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)     self._check_feature_supported()
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 2004, in _check_feature_supported
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)     _raise_unsupported_error(feature_name="Concurrent Partial Prefill")
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)   File "/root/miniconda3/envs/test/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 2283, in _raise_unsupported_error
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907)     raise NotImplementedError(msg)
2026-04-13 21:12:29,544 INFO [server] (APIServer pid=6907) NotImplementedError: Concurrent Partial Prefill is not supported. We recommend to remove Concurrent Partial Prefill from your config.

</details>

🐛 Describe the bug

The V1 engine fails at startup when the --max-num-partial-prefills parameter is set.


extent analysis

TL;DR

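The startup failure is caused by --max-num-partial-prefills 4. In vLLM 0.19.0, engine_args.create_engine_config() calls _check_feature_supported() in vllm/engine/arg_utils.py, which rejects Concurrent Partial Prefill with a NotImplementedError before the engine is built. Removing the flag from the launch command should let the server start.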
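The traceback shows the failure comes from a configuration check, not from the scheduler itself. The sketch below is a simplified, hypothetical mirror of that check, not vLLM's code: it assumes the check fires whenever either partial-prefill option differs from its default, and that both defaults are 1. The log only confirms the default for max_long_partial_prefills (passing 1 did not make it "non-default"); the default of 1 for max_num_partial_prefills is an assumption. PartialPrefillArgs and check_partial_prefill_supported are illustrative names.

```python
from dataclasses import dataclass

# Assumed defaults (see lead-in): 1 for both options.
DEFAULT_MAX_NUM_PARTIAL_PREFILLS = 1
DEFAULT_MAX_LONG_PARTIAL_PREFILLS = 1


@dataclass
class PartialPrefillArgs:
    """Hypothetical subset of engine args relevant to this check."""
    max_num_partial_prefills: int = DEFAULT_MAX_NUM_PARTIAL_PREFILLS
    max_long_partial_prefills: int = DEFAULT_MAX_LONG_PARTIAL_PREFILLS


def check_partial_prefill_supported(args: PartialPrefillArgs) -> None:
    """Reject any non-default partial-prefill setting, as the startup check appears to."""
    if (args.max_num_partial_prefills != DEFAULT_MAX_NUM_PARTIAL_PREFILLS
            or args.max_long_partial_prefills != DEFAULT_MAX_LONG_PARTIAL_PREFILLS):
        raise NotImplementedError(
            "Concurrent Partial Prefill is not supported. "
            "We recommend to remove Concurrent Partial Prefill from your config.")


# The reporter's values: --max-num-partial-prefills 4 --max-long-partial-prefills 1
try:
    check_partial_prefill_supported(
        PartialPrefillArgs(max_num_partial_prefills=4, max_long_partial_prefills=1))
except NotImplementedError as exc:
    print(f"rejected: {exc}")

# Leaving both options at their defaults passes the check.
check_partial_prefill_supported(PartialPrefillArgs())
print("defaults accepted")
```

Under these assumptions the check is about the value, not the presence of the flag, which is why dropping --max-long-partial-prefills 1 alone would not have helped.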

Guidance

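The NotImplementedError ("Concurrent Partial Prefill is not supported") is raised while the engine configuration is validated (create_engine_config → _check_feature_supported in vllm/engine/arg_utils.py), so the option is rejected outright at startup rather than failing later at runtime.
To fix the issue, remove --max-num-partial-prefills 4 from the vllm serve command. --max-long-partial-prefills 1 already equals the default (it does not appear in the logged non-default args), so it can be removed as well without changing behavior.
Check that max_num_partial_prefills is not also set in a configuration file or launcher script that passes arguments to vllm serve.
If you need concurrent partial prefill specifically, check the documentation and release notes of newer vLLM versions to see whether the V1 engine supports it.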

Example

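The workaround only changes the launch command. The reporter's command with --max-num-partial-prefills 4 and --max-long-partial-prefills 1 removed:

vllm serve /tmp/model-poc --host 127.0.0.1 --port 8000 --gpu-memory-utilization 0.90 --max-model-len 1000 --dtype float16 --kv-cache-dtype auto --tensor-parallel-size 1 --pipeline-parallel-size 1 --max-num-batched-tokens 4096 --max-num-seqs 128 --long-prefill-token-threshold 256 --performance-mode interactivity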
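For launch scripts that build the argument list programmatically, the same change can be applied before invoking vllm serve. A minimal sketch; strip_flags and UNSUPPORTED_FLAGS are hypothetical names, and it assumes each listed flag takes exactly one value, passed either as a separate token or as --flag=value.

```python
import shlex

# The reporter's original launch command.
ORIGINAL = (
    "vllm serve /tmp/model-poc --host 127.0.0.1 --port 8000 "
    "--gpu-memory-utilization 0.90 --max-model-len 1000 --dtype float16 "
    "--kv-cache-dtype auto --tensor-parallel-size 1 --pipeline-parallel-size 1 "
    "--max-num-batched-tokens 4096 --max-num-seqs 128 "
    "--max-num-partial-prefills 4 --max-long-partial-prefills 1 "
    "--long-prefill-token-threshold 256 --performance-mode interactivity"
)

# Options tied to the "Concurrent Partial Prefill" feature; each takes one value.
UNSUPPORTED_FLAGS = {"--max-num-partial-prefills", "--max-long-partial-prefills"}


def strip_flags(argv: list[str], flags: set[str]) -> list[str]:
    """Return argv without the given flags, handling '--flag value' and '--flag=value'."""
    result: list[str] = []
    skip_next = False
    for token in argv:
        if skip_next:
            # This token is the value of a flag we just dropped.
            skip_next = False
            continue
        name = token.split("=", 1)[0]
        if name in flags:
            # In the '--flag value' form the value is the next token, so drop it too.
            skip_next = "=" not in token
            continue
        result.append(token)
    return result


fixed = strip_flags(shlex.split(ORIGINAL), UNSUPPORTED_FLAGS)
print(shlex.join(fixed))
```

Filtering the list (rather than editing the command string with a regex) keeps every other flag and value untouched, including --long-prefill-token-threshold 256.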

Notes

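Removing the flag means the server starts without concurrent partial prefill, the feature the flag was meant to enable. The error names only Concurrent Partial Prefill; the other scheduling flags (--max-num-batched-tokens, --max-num-seqs, --long-prefill-token-threshold) are not mentioned in it. Because the server never started with the original settings, any effect on latency or throughput for the intended workload should be measured after the relaunch rather than assumed.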

Recommendation

Apply the workaround: remove --max-num-partial-prefills from the command-line arguments and restart the server.
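To confirm the workaround, wait for the relaunched server to report healthy. vLLM's OpenAI-compatible server serves GET /health; the base URL below matches the reporter's --host 127.0.0.1 --port 8000. wait_for_vllm is a hypothetical helper and the timeout values are illustrative.

```python
import time
import urllib.request


def wait_for_vllm(base_url: str = "http://127.0.0.1:8000",
                  timeout_s: float = 120.0, poll_s: float = 2.0) -> bool:
    """Poll {base_url}/health until it returns HTTP 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, refused connections and timeouts are all OSError subclasses:
            # the server is not up yet (or has exited), so keep polling.
            pass
        time.sleep(poll_s)
    return False
```

Run it from a separate shell or script after starting vllm serve; if it returns False, check the server log for another startup error.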

