vllm - 💡(How to fix) Fix [Feature]: Allow passing `images` to CompletionRequest [3 comments, 2 participants]

GitHub stats
vllm-project/vllm#37423 · Fetched 2026-04-08 00:57:33
Comments: 3 · Participants: 2 · Timeline: 5 · Reactions: 0
Timeline (top): commented ×3, assigned ×1, labeled ×1


🚀 The feature, motivation and pitch

At the moment, pretrained multimodal models such as https://huggingface.co/Qwen/Qwen3.5-9B-Base cannot be evaluated on tasks that combine text and images, because the CompletionRequest class has no way to accept images:

class CompletionRequest(OpenAIBaseModel):
    # Ordered by official OpenAI API documentation
    # https://platform.openai.com/docs/api-reference/completions/create
    model: str | None = None
    prompt: (
        list[Annotated[int, Field(ge=0)]]
        | list[list[Annotated[int, Field(ge=0)]]]
        | str
        | list[str]
        | None
    ) = None

See here: https://github.com/vllm-project/vllm/blob/17c47fb8691f2efd7948659952c44ef167462534/vllm/entrypoints/openai/completion/protocol.py#L42-L52

This significantly limits the usage of pretrained models in vLLM.

Proposal:

Let's add images: np.ndarray | None = None to CompletionRequest. When this field is not None, we validate that prompt is a list[int], meaning the prompt has already been pre-processed by the tokenizer. This adds the feature with minimal changes: no extra pre-processing is needed, and the images can be passed directly down into the model definitions. Note that the code below types the field as a list of serializable arrays (list[SerializableNDArray]) rather than a single optional array.

class _NDArrayPydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:
        from_serialized_schema = core_schema.no_info_plain_validator_function(
            _deserialize_ndarray
        )

        return core_schema.json_or_python_schema(
            json_schema=from_serialized_schema,
            python_schema=from_serialized_schema,
            serialization=core_schema.plain_serializer_function_ser_schema(
                _serialize_ndarray
            ),
        )

SerializableNDArray = Annotated[np.ndarray, _NDArrayPydanticAnnotation]

class CompletionRequest(OpenAIBaseModel):
    # Ordered by official OpenAI API documentation
    # https://platform.openai.com/docs/api-reference/completions/create
    model: str | None = None
    images: list[SerializableNDArray] = Field(default_factory=list)
    prompt: (
        list[Annotated[int, Field(ge=0)]]
        | list[list[Annotated[int, Field(ge=0)]]]
        | str
        | list[str]
        | None
    ) = None

We assume (and enforce) that when images are passed, both images and prompt are already fully pre-processed, so the request can be forwarded directly to generate.

Alternatives

We could also pre-process images and prompt in serving.py, but that could be done in a follow-up PR.


extent analysis

Fix Plan

To add support for passing images to the CompletionRequest class, we need to make the following changes:

  • Add an images field to the CompletionRequest class that accepts a list of SerializableNDArray objects.
  • Update the validation logic so that when images is non-empty, the prompt field must be a list of integers (i.e., already pre-processed by the tokenizer). Because images defaults to an empty list, the check is for emptiness rather than None.

Here's the updated code:

from typing import Annotated, Any

import numpy as np
from pydantic import Field, GetCoreSchemaHandler, model_validator
from pydantic_core import core_schema

# OpenAIBaseModel, _serialize_ndarray, and _deserialize_ndarray are not defined
# in the issue; they are assumed to be in scope in protocol.py.

class _NDArrayPydanticAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:
        # Use the same validator for JSON input and for Python-side construction.
        from_serialized_schema = core_schema.no_info_plain_validator_function(
            _deserialize_ndarray
        )

        return core_schema.json_or_python_schema(
            json_schema=from_serialized_schema,
            python_schema=from_serialized_schema,
            serialization=core_schema.plain_serializer_function_ser_schema(
                _serialize_ndarray
            ),
        )

SerializableNDArray = Annotated[np.ndarray, _NDArrayPydanticAnnotation]

class CompletionRequest(OpenAIBaseModel):
    # Ordered by official OpenAI API documentation
    # https://platform.openai.com/docs/api-reference/completions/create
    model: str | None = None
    images: list[SerializableNDArray] = Field(default_factory=list)
    prompt: (
        list[Annotated[int, Field(ge=0)]]
        | list[list[Annotated[int, Field(ge=0)]]]
        | str
        | list[str]
        | None
    ) = None

    # Pydantic v2 style; a bare @root_validator is rejected by Pydantic v2.
    @model_validator(mode="after")
    def validate_images_and_prompt(self):
        # Only constrain the prompt when images are present. The parentheses
        # matter: "a and not b or not c" would also reject string prompts
        # on requests that carry no images.
        if self.images and not (
            isinstance(self.prompt, list)
            and all(isinstance(x, int) for x in self.prompt)
        ):
            raise ValueError(
                "When images are provided, prompt must be a list of integers"
            )
        return self
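The rule the validator is meant to enforce can also be written as a standalone function, which makes it easy to test without Pydantic or vLLM installed. This is a minimal sketch; the name check_images_and_prompt is hypothetical and does not appear in the issue or in vLLM.

```python
def check_images_and_prompt(images, prompt):
    """Raise ValueError unless an image-bearing request has a token-ID prompt."""
    if not images:
        # No images: every prompt form the schema accepts stays valid.
        return
    # Group the prompt checks explicitly; relying on and/or precedence here
    # would apply the integer check to requests without images as well.
    is_token_list = isinstance(prompt, list) and all(
        isinstance(x, int) for x in prompt
    )
    if not is_token_list:
        raise ValueError(
            "When images are provided, prompt must be a list of integers"
        )
```

With this grouping, a plain string prompt without images passes, while a string prompt, a missing prompt, or a batch of token lists combined with images is rejected.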

Verification

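To verify the fix, construct a CompletionRequest with an images field and a prompt that is a list of integers. Pydantic runs validation when the object is constructed, so no separate validate() call is needed; a successful construction means the request is valid:

import numpy as np

try:
    # Assumes _deserialize_ndarray accepts an in-memory ndarray.
    request = CompletionRequest(
        model="Qwen3.5-9B-Base",
        images=[np.array([1, 2, 3])],
        prompt=[1, 2, 3],
    )
    print("Request is valid")
except ValueError as e:  # pydantic.ValidationError is a ValueError subclass
    print(f"Request is invalid: {e}")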

Extra Tips

  • Make sure to update the documentation to reflect the new images field and the updated validation logic.
  • Consider adding additional validation or error handling to ensure that the images field is properly formatted and can
