vllm - 💡(How to fix) Fix [Bug]: vllm-openai nightly Docker image still fails due to missing pytest during EngineCore startup [1 pull request]

Q: Expected behavior

`vllm/vllm-openai:nightly` should start successfully without requiring the test dependency `pytest`. If some runtime dependency indirectly imports `cupy.testing`, the Docker image should either include the required dependency or avoid importing test-only modules during normal server startup.

vllm | 2026-05-24 14:00:39


Error Message

EngineCore startup -> _initialize_kv_caches -> determine_available_memory -> gpu_worker.profile_run -> gpu_model_runner._dummy_run -> torch._dynamo AOT compile -> torch.distributed.tensor.experimental._context_parallel._cp_custom_ops -> torch.library.custom_op / _register_fake -> inspect.getframeinfo / inspect.getmodule -> cupy.testing -> import pytest -> ModuleNotFoundError: No module named 'pytest'

Root Cause

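The nightly runtime image does not install pytest, but cupy.testing imports it at module load. During EngineCore startup, inspect.getmodule (called while torch.library.custom_op registers an op) touches cupy.testing, which triggers that import and fails. This is the same missing-dependency problem reported in #43480, where pytest was imported indirectly via humming / cupy.testing, now reached from a different code path.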

Fix Action

Fixed

Fixed by PR: Fix CuPy runtime deps and restore humming (https://github.com/vllm-project/vllm/pull/43530)



Your current environment

Docker image: vllm/vllm-openai:nightly https://hub.docker.com/layers/vllm/vllm-openai/nightly/images/sha256-2b5f940431016b25c461761cb813cebd1f02a9e4ba1069226a5c1c9ffb6834c6

vLLM version: 0.21.1rc1.dev262+g33d7cbe02

Model: RedHatAI/gemma-4-31B-it-NVFP4

Related issue: #43480

🐛 Describe the bug

I previously reported a similar startup failure in #43480, where the nightly Docker image failed because pytest was not installed and was imported indirectly via humming / cupy.testing.

After pulling a newer nightly image, the original failure path seems to have changed, but the server still fails to start because pytest is missing.

In this newer build, the model is loaded successfully, but EngineCore fails during startup while vLLM is initializing KV caches and running the profiling dummy run.

The failure path is now roughly:

EngineCore startup
  -> _initialize_kv_caches
  -> determine_available_memory
  -> gpu_worker.profile_run
  -> gpu_model_runner._dummy_run
  -> torch._dynamo AOT compile
  -> torch.distributed.tensor.experimental._context_parallel._cp_custom_ops
  -> torch.library.custom_op / _register_fake
  -> inspect.getframeinfo / inspect.getmodule
  -> cupy.testing
  -> import pytest
  -> ModuleNotFoundError: No module named 'pytest'

So this appears to be the same underlying runtime dependency / import side-effect issue as #43480, but it is now triggered from a different code path during EngineCore initialization rather than during the earlier quantization config verification path.
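The import side effect described above can be illustrated in isolation. The sketch below is one plausible reading of the traceback, not CuPy's or PyTorch's actual code: it assumes cupy.testing is loaded lazily, so that probing its `__file__` attribute triggers the real import. `LazyTestingModule` and the module name `fake_pkg_testing` are hypothetical stand-ins; only `inspect.getmodule` and its scan over `sys.modules` come from the standard library.

```python
import inspect
import sys
import types


class LazyTestingModule(types.ModuleType):
    """Hypothetical stand-in for a lazily loaded test-helper package."""

    touched = []  # attribute names that triggered the deferred import

    def __getattr__(self, name):
        # Only reached when normal lookup fails, e.g. for __file__.
        LazyTestingModule.touched.append(name)
        # Simulate the deferred import failing on a missing test dependency.
        raise ModuleNotFoundError("No module named 'pytest'")


def reproduce():
    name = "fake_pkg_testing"  # hypothetical module name
    sys.modules[name] = LazyTestingModule(name)
    try:
        # A code object has no __module__, so inspect.getmodule falls back
        # to scanning sys.modules and probing each module's __file__.
        code = compile("x = 1", "/nonexistent/demo_module.py", "exec")
        inspect.getmodule(code)
    except ModuleNotFoundError as exc:
        return str(exc)
    finally:
        del sys.modules[name]
    return None


if __name__ == "__main__":
    print(reproduce())
    print(LazyTestingModule.touched)
```

Because `hasattr` only swallows `AttributeError`, the simulated `ModuleNotFoundError` escapes from `inspect.getmodule`, even though the caller never asked for the test module.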

Since pytest is normally a test dependency, the official runtime Docker image should not require it for normal vLLM server startup.

Startup arguments

The server was started with the following non-default arguments shown in the log:

{
  'model_tag': 'RedHatAI/gemma-4-31B-it-NVFP4',
  'default_chat_template_kwargs': {'enable_thinking': True},
  'enable_auto_tool_choice': True,
  'tool_call_parser': 'gemma4',
  'host': '0.0.0.0',
  'model': 'RedHatAI/gemma-4-31B-it-NVFP4',
  'trust_remote_code': True,
  'max_model_len': 256000,
  'served_model_name': ['gemma4-31b'],
  'reasoning_parser': 'gemma4',
  'kv_cache_dtype': 'fp8',
  'mm_processor_kwargs': {'max_soft_tokens': 1120},
  'max_num_batched_tokens': 8192,
  'max_num_seqs': 32,
  'scheduler_reserve_full_isl': False,
  'async_scheduling': True,
  'optimization_level': '3'
}

Expected behavior

vllm/vllm-openai:nightly should start successfully without requiring the test dependency pytest.

If some runtime dependency indirectly imports cupy.testing, the Docker image should either include the required dependency or avoid importing test-only modules during normal server startup.
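To check whether a given image hits this condition before starting the server, a small preflight script can be run inside the container. This is a minimal sketch, assuming only that the missing module is the top-level `pytest` named in the traceback; `missing_modules` is a hypothetical helper, not part of vLLM.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    # "pytest" comes from the traceback; extend the list as needed.
    print(missing_modules(["pytest"]))
```

An empty list means the interpreter can locate every listed module. The check uses `importlib.util.find_spec` so it does not import the modules themselves.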

Actual behavior

The server fails during EngineCore initialization.

The important part of the traceback is:

File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 396, in determine_available_memory
    self.model_runner.profile_run()

File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 6164, in profile_run
    hidden_states, last_hidden_states = self._dummy_run(

File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 5824, in _dummy_run
    outputs = self.model(

File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gemma4_mm.py", line 1487, in forward
    hidden_states = self.language_model.model(

File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 663, in __call__
    self.aot_compiled_fn = self.aot_compile(*args, **kwargs)

File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 873, in aot_compile
    return aot_compile_fullgraph(

File "/usr/local/lib/python3.12/dist-packages/torch/distributed/tensor/experimental/_context_parallel/_cp_custom_ops.py", line 8, in <module>
    @torch.library.custom_op("cplib::flex_cp_allgather", mutates_args=())

File "/usr/local/lib/python3.12/dist-packages/torch/_library/utils.py", line 45, in get_source
    frame = inspect.getframeinfo(sys._getframe(stacklevel))

File "/usr/lib/python3.12/inspect.py", line 1007, in getmodule
    if ismodule(module) and hasattr(module, '__file__'):

File "/usr/local/lib/python3.12/dist-packages/cupy/testing/__init__.py", line 50, in <module>
    from cupy.testing._random import fix_random  # NOQA

File "/usr/local/lib/python3.12/dist-packages/cupy/testing/_random.py", line 11, in <module>
    import pytest

ModuleNotFoundError: No module named 'pytest'

Then the API server exits with:

RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Notes

This does not look like a model download or Hugging Face authentication issue. The model checkpoint is loaded successfully before the failure.

This also does not look specific to the earlier humming import path reported in #43480. The new failure path goes through torch._dynamo / torch.distributed.tensor.experimental / cupy.testing, but it reaches the same root cause:

ModuleNotFoundError: No module named 'pytest'

Could you please check whether the nightly runtime image should include pytest, or whether cupy.testing should be avoided during normal vLLM server startup?
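For the second option, a generic way to keep a test-only dependency out of normal startup is to import it at call time rather than at module load. The sketch below shows that pattern only; it is not how CuPy or the linked PR addresses the problem, and `require_optional` and `fix_seed_for_tests` are hypothetical names.

```python
import importlib


def require_optional(name):
    """Import an optional dependency only when a caller needs it."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as exc:
        raise RuntimeError(
            f"{name} is required for this helper; install it to use it"
        ) from exc


def fix_seed_for_tests():
    """Hypothetical test helper: pytest is imported here, not at load time."""
    pytest = require_optional("pytest")
    return pytest.__name__
```

With this layout, importing the module that defines the helpers succeeds without pytest installed, and the dependency is only needed when a test helper is called.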

