vllm - ✅(Solved) Fix [Bug]: DeepSeek-OCR v1 crashes with TensorSchema mismatch when images_crop is empty (small images ≤640px) [1 pull requests, 1 participants]

ketyi · 2026-03-10T15:27:40Z



Error Message

ValueError: images_crop dim[2] expected 1024, got 640. Expected shape: ('bnp', 3, 1024, 1024), but got torch.Size([0, 3, 640, 640])

Root Cause

In _parse_and_validate_image_input (deepseek_ocr.py#L455, https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/deepseek_ocr.py#L455), when images_crop.numel() == 0 (no crops needed for small images), the code sets image_size = base_size = 1024. But the empty tensor's shape is still (0, 3, 640, 640): the image_size dimension carries 640 from the Gundam processor preset. TensorSchema.validate() then sees the mismatch: expected 1024, got 640.

Fix Action

Fixed

Fixed by PR: [Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop (https://github.com/vllm-project/vllm/pull/36670)

PR fix notes

PR #36670: [Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop

Repository: vllm-project/vllm
Author: ketyi
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/36670

Description (problem / solution / changelog)

Summary

Fixes a crash in DeepseekOCRForCausalLM (deepseek-ai/DeepSeek-OCR) when processing images that do not require cropping (≤ 640×640 pixels).

Fixes #36669

Root Cause

In _parse_and_validate_image_input, when images_crop is an empty tensor (no crops needed for small images), the code checked images_crop.numel() > 0 and fell back to image_size = base_size = 1024. But the empty tensor's shape is (0, 3, 640, 640) — dimension 640 comes from the Gundam processor preset. TensorSchema validation then fails: expected 1024, got 640.

This kills the V1 engine on the first small image, making all subsequent requests fail with EngineDeadError.
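
The mismatch can be reproduced without vLLM by using a simplified stand-in for the shape check. This is a minimal sketch: `check_shape` is a hypothetical helper, not vLLM's `TensorSchema`, and the expected layout `("bnp", 3, "image_size", "image_size")` is an assumption inferred from the reported error message.

```python
def check_shape(name, actual, expected, bindings):
    """Compare `actual` against `expected`, resolving bound symbolic dims.

    Hypothetical stand-in for a schema check; not vLLM's TensorSchema.
    Unbound symbolic names (such as the batch-like "bnp") match any size.
    """
    if len(actual) != len(expected):
        raise ValueError(f"{name} has rank {len(actual)}, expected {len(expected)}")
    for i, (got, want) in enumerate(zip(actual, expected)):
        if isinstance(want, str):
            if want not in bindings:
                continue  # free dimension, any size is accepted
            want = bindings[want]
        if got != want:
            raise ValueError(f"{name} dim[{i}] expected {want}, got {got}")


# Assumed layout, inferred from the error text in the report.
EXPECTED = ("bnp", 3, "image_size", "image_size")
empty_crop_shape = (0, 3, 640, 640)  # no crops, Gundam preset

# Buggy binding: image_size falls back to base_size (1024).
try:
    check_shape("images_crop", empty_crop_shape, EXPECTED, {"image_size": 1024})
    buggy_error = None
except ValueError as exc:
    buggy_error = str(exc)

# Fixed binding: image_size is read from the tensor's own last dimension.
check_shape("images_crop", empty_crop_shape, EXPECTED,
            {"image_size": empty_crop_shape[-1]})

print(buggy_error)
```

With the fallback binding the check fails at dimension 2, mirroring the reported error; with the binding taken from the shape itself the same empty tensor passes.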

Fix

Remove the numel() > 0 guard. tensor.shape[-1] works correctly on zero-element tensors and returns the actual dimension size (640).

-        if images_crop is not None and images_crop.numel() > 0:
+        if images_crop is not None:
             image_size = images_crop.shape[-1]
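
The effect of removing the guard can be illustrated with plain tuples standing in for `torch.Size`. This is a sketch under stated assumptions: `image_size_buggy` and `image_size_fixed` are hypothetical names that mirror the before/after branches, `math.prod` over the shape plays the role of `numel()`, and the crop count of 6 is arbitrary.

```python
from math import prod

BASE_SIZE = 1024  # fallback value named in the report


def image_size_buggy(crop_shape, base_size=BASE_SIZE):
    # Mirrors the removed guard: an empty tensor falls through to base_size.
    if crop_shape is not None and prod(crop_shape) > 0:
        return crop_shape[-1]
    return base_size


def image_size_fixed(crop_shape, base_size=BASE_SIZE):
    # Mirrors the merged change: the last dimension is read even when empty.
    if crop_shape is not None:
        return crop_shape[-1]
    return base_size


empty = (0, 3, 640, 640)      # small image, no crops
populated = (6, 3, 640, 640)  # large image; 6 crops is an arbitrary example

for shape in (empty, populated, None):
    print(shape, image_size_buggy(shape), image_size_fixed(shape))
```

Only the empty case differs between the two versions: the guarded version reports 1024 while the shape still carries 640, which is the value the schema check then compares against.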

Testing

Added 4 regression tests in tests/multimodal/test_deepseek_ocr_empty_crop_unit.py:
- Empty images_crop with Gundam preset (the crashing case)
- Populated images_crop with Gundam preset (existing happy path)
- Empty images_crop with Base preset (image_size == base_size)
- Deliberately mismatched binding still raises ValueError
Verified with real PDFs containing mixed small/large page images
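
The four cases listed above can be sketched as a small table-driven check. This is not the test file from the PR: `resolve_image_size` and `crop_matches` are hypothetical helpers, the Base preset is assumed to use 1024 for both sizes, the crop count of 4 is arbitrary, and a boolean result stands in for the `ValueError` that the real validation raises.

```python
def resolve_image_size(crop_shape, base_size):
    # Post-fix behavior: read the last dimension whenever a tensor is present.
    return crop_shape[-1] if crop_shape is not None else base_size


def crop_matches(crop_shape, image_size):
    # Simplified check: the two trailing dims must equal the bound image_size.
    return crop_shape[-2:] == (image_size, image_size)


CASES = [
    # (label, crop shape, base_size, forced image_size or None, expected result)
    ("empty crop, Gundam", (0, 3, 640, 640), 1024, None, True),
    ("populated crop, Gundam", (4, 3, 640, 640), 1024, None, True),
    ("empty crop, Base", (0, 3, 1024, 1024), 1024, None, True),
    ("forced mismatch", (0, 3, 640, 640), 1024, 1024, False),
]

results = {}
for label, shape, base_size, forced, expected_ok in CASES:
    image_size = forced if forced is not None else resolve_image_size(shape, base_size)
    results[label] = crop_matches(shape, image_size)
    assert results[label] == expected_ok, label

print(results)
```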

Related

- deepseek_ocr2.py avoids this by not binding image_size in resolve_bindings
- TODO(Isotr0py) at processors/deepseek_ocr.py:24 about exposing presets via mm_kwargs (broader fix, separate PR)

cc @Isotr0py @DarkLight1337

Changed files

tests/models/multimodal/processing/test_deepseek_ocr.py (added, +134/-0)
vllm/model_executor/models/deepseek_ocr.py (modified, +1/-4)


Your current environment

vLLM version: 0.17.0 (also current main)
GPU: NVIDIA A100
Python: 3.12
CUDA: 13.1

Model

deepseek-ai/DeepSeek-OCR (DeepseekOCRForCausalLM)

🐛 Describe the bug

DeepseekOCRForCausalLM crashes with a fatal EngineDeadError when processing images that do not require cropping (images ≤ 640×640 pixels). The V1 engine dies on the first such request and all subsequent requests fail.

Error:

ValueError: images_crop dim[2] expected 1024, got 640.
Expected shape: ('bnp', 3, 1024, 1024), but got torch.Size([0, 3, 640, 640])

The fix is trivial — remove the numel() > 0 guard since shape[-1] is valid on zero-element tensors:

# Before (buggy):
if images_crop is not None and images_crop.numel() > 0:
    image_size = images_crop.shape[-1]
else:
    image_size = base_size

# After (fixed):
if images_crop is not None:
    image_size = images_crop.shape[-1]
else:
    image_size = base_size

Note: deepseek_ocr2.py avoids this entirely by not binding image_size in resolve_bindings (only binds base_size).

Why only some images crash

Large images (> 640×640): crops are created → numel() > 0 → image_size = 640 → schema validates OK ✓
Small images (≤ 640×640): no crops → numel() == 0 → image_size = 1024 → mismatch with (0, 3, 640, 640) → crash ✗

Once the V1 engine encounters a single crashing request, EngineCore raises EngineDeadError and all subsequent requests fail.

How to reproduce

from vllm import LLM, SamplingParams
from PIL import Image

small_image = Image.new("RGB", (400, 300), color="white")

llm = LLM(
    model="deepseek-ai/DeepSeek-OCR",
    hf_overrides={"architectures": ["DeepseekOCRForCausalLM"]},
    dtype="bfloat16",
    max_model_len=4096,
)

# This will crash with EngineDeadError
output = llm.generate(
    [{"prompt": "<image>\nDescribe this image.",
      "multi_modal_data": {"image": small_image}}],
    SamplingParams(temperature=0.0, max_tokens=100),
)

Before submitting a new issue...

I have searched for similar issues
I have verified the bug on the latest vLLM main branch

extent analysis

Fix Plan

To fix the EngineDeadError crash on small images, update _parse_and_validate_image_input in deepseek_ocr.py:

- Remove the numel() > 0 guard when checking the images_crop tensor.
- Read the image size from images_crop.shape[-1], which is valid even for zero-element tensors.

# Updated code
if images_crop is not None:
    image_size = images_crop.shape[-1]
else:
    image_size = base_size

Verification

To verify the fix, run the provided reproduction code with the updated deepseek_ocr.py file:

from vllm import LLM, SamplingParams
from PIL import Image

small_image = Image.new("RGB", (400, 300), color="white")

llm = LLM(
    model="deepseek-ai/DeepSeek-OCR",
    hf_overrides={"architectures": ["DeepseekOCRForCausalLM"]},
    dtype="bfloat16",
    max_model_len=4096,
)

output = llm.generate(
    [{"prompt": "<image>\nDescribe this image.",
      "multi_modal_data": {"image": small_image}}],
    SamplingParams(temperature=0.0, max_tokens=100),
)

If the fix is successful, the code should run without crashing and produce a valid output.

Extra Tips

- The fix is merged upstream in PR #36670; either use a vLLM build that includes it or apply the change to deepseek_ocr.py locally.
- After an EngineDeadError the V1 engine does not recover and all subsequent requests fail, so restart the vLLM process once the fix is in place.
- Consider logging the shape of images_crop before validation to help diagnose similar mismatches in the future.
