vllm - ✅(Solved) Fix [Feature]: Support direct binary/multipart file upload for video and image in OpenAI-compatible API [1 pull requests, 2 comments, 3 participants]

harshvb20 · 2026-03-30T09:15:50Z

[vllm] Allow clients to upload video/image files directly via the API multipart or binary , similar to how SGLang and LMDeploy handle local media, instead of r… Allow clients to upload video/image files directly via the API (multipart or binary), similar to how SGLang and LMDeploy handle local media, instead of requiring base64 encoding or a reachable URL. # PR #39003: [Frontend] Add /v1/files upload endpoint for multimodal inputs (#38531) - Repository: vllm-project/vllm - Author: Alberto-Codes - State: open | merged: False - Link: https://github.com/vllm-project/vllm/pull/39003 ## Description (problem / solution / changelog) ## Purpose Implements the `/v1/files` upload endpoint requested in **RFC issue #38531**. Lets clients upload multimodal files once (via multipart form) and reference them in subsequent chat completions through a new `vllm-file:// ` URL scheme — an alternative to base64 data URLs (which inflate payloads and can exceed shell `ARG_MAX` for videos) and to `file://` URLs (which require the file to already exist on the server machine). The endpoint is **off by default**; operators opt in with `--enable-file-uploads`. Closes #38531. ### Answers to @DarkLight1337's questions > **Lifecycle:** `--file-upload-ttl-seconds` (default 1h, atime-based > expiry) + `--file-upload-max-total-gb` quota with LRU eviction > (default 5 GB). Upload dir is cleared on server startup, so no state > persists across restarts. Explicit `DELETE /v1/files/{id}` also > works. Opportunistic sweeper runs inside `create_file` so no > background task is needed. > **Security:** Off-by-default + 128-bit capability handles > (`file- `) + MIME magic-byte allowlist (video/image/audio > only) + path confinement (sha256 on-disk names, client filename is > metadata only, 255-byte cap, control chars stripped) + streaming > size enforcement (no memory spikes) + structured JSON audit log + > optional `--file-upload-scope-header` for gateway-fronted > deployments + optional `--file-upload-disable-listing` to remove > the enumeration surface. ### Design decisions (consensus during implementation) | # | Question | Decision | |---|---|---| | 1 | Scoping model | Server-global by default. Opt-in `--file-upload-scope-header` for gateway-fronted deployments. Scope mismatches return 404, not 403 (capability non-disclosure). | | 2 | `purpose` enum | `{"vision", "user_data"}`. Rejects OpenAI-specific values (`assistants`, `batch`, `fine-tune`) with 400. | | 3 | `expires_at` | Default TTL 3600s. `--file-upload-ttl-seconds=-1` disables time-based expiry; `expires_at` is omitted from responses in that mode. Quota LRU still applies. | | 4 | Streaming download | Always `StreamingResponse` + 64 KiB chunks. No size-based branching. | | 5 | MIME validation | Inline magic-byte sniffer (no new required dependency). Optionally uses `python-magic` if installed for broader detection. | | 6 | Config pattern | New `FileUploadConfig` `@config` dataclass, matching vLLM's `CacheConfig`/`LoRAConfig` subsystem pattern. | ### Trust model for `vllm-file://` resolution `vllm-file:// ` URLs resolving through `MediaConnector` are **capability-based**: possession of the 128-bit file ID is the access control. Scope headers are enforced on the `/v1/files` CRUD endpoints (which could leak IDs via LIST) but **not** on multimodal resolution — the chat-completion layer does not receive request headers in the current architecture. Every resolution emits a `file.resolve` audit log line, so access remains traceable. If stricter semantics are desired for a specific deployment, a follow-up `--file-upload-strict-scope` flag can thread the header through `MediaConnector` (requires ~15 lines of changes to `chat_utils.py`). ### Deployment patterns Works out of the box with the existing OpenAI SDK (`project=` maps to `OpenAI-Project` header) and standard gateways: | Deployment | `--file-upload-scope-header` | Notes | | --- | --- | --- | | Direct OpenAI SDK client | `OpenAI-Project` | SDK auto-sends from `OPENAI_PROJECT_ID` env or `project=...` client param | | Apigee (AssignMessage) | `OpenAI-Project` or `X-Consumer-ID` | One-line policy | | Kong | `X-Consumer-ID` | Native on authenticated routes | | Envoy / oauth2-proxy | `X-Auth-Request-User` | JWT `sub` claim | Docs added in this PR: - `docs/features/multimodal_inputs.md` — new **"Uploading Local Media Files"** subsection under Online Serving Video Inputs, with OpenAI SDK + curl examples. - `docs/serving/openai_compatible_server.md` — new **"Files API"** section with the endpoint table, all `--file-upload-*` flags with defaults, the four gateway deployment patterns, and the full security posture. - `examples/online_serving/openai_file_upload_client.py` — minimal runnable example (upload → reference via `vllm-file:// ` → chat completion → delete). --- ## Test Plan **118 unit + integration tests** across 6 files under `tests/entrypoints/openai/files/`: ```bash pytest tests/entrypoints/openai/files/ -v ``` Covers: - `Fi

vllm2026-03-30 09:15:50

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

vllm-project/vllm#38531•Fetched 2026-04-08 01:53:29

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

referenced ×11mentioned ×5subscribed ×5commented ×2

Allow clients to upload video/image files directly via the API (multipart or binary), similar to how SGLang and LMDeploy handle local media, instead of requiring base64 encoding or a reachable URL.

Root Cause

SGLang and LMDeploy already support local file paths natively
Makes local development and testing much easier
Avoids memory/shell limits from base64 encoding large video files
Especially painful for multimodal workflows with large video files

PR fix notes

PR #39003: [Frontend] Add /v1/files upload endpoint for multimodal inputs (#38531)

Repository: vllm-project/vllm
Author: Alberto-Codes
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/39003

Description (problem / solution / changelog)

Purpose

Implements the /v1/files upload endpoint requested in RFC issue #38531. Lets clients upload multimodal files once (via multipart form) and reference them in subsequent chat completions through a new vllm-file://<id> URL scheme — an alternative to base64 data URLs (which inflate payloads and can exceed shell ARG_MAX for videos) and to file:// URLs (which require the file to already exist on the server machine). The endpoint is off by default; operators opt in with --enable-file-uploads.

Closes #38531.

Answers to @DarkLight1337's questions

Lifecycle: --file-upload-ttl-seconds (default 1h, atime-based expiry) + --file-upload-max-total-gb quota with LRU eviction (default 5 GB). Upload dir is cleared on server startup, so no state persists across restarts. Explicit DELETE /v1/files/{id} also works. Opportunistic sweeper runs inside create_file so no background task is needed.

Security: Off-by-default + 128-bit capability handles (file-<32 hex>) + MIME magic-byte allowlist (video/image/audio only) + path confinement (sha256 on-disk names, client filename is metadata only, 255-byte cap, control chars stripped) + streaming size enforcement (no memory spikes) + structured JSON audit log + optional --file-upload-scope-header for gateway-fronted deployments + optional --file-upload-disable-listing to remove the enumeration surface.

Design decisions (consensus during implementation)

#	Question	Decision
1	Scoping model	Server-global by default. Opt-in `--file-upload-scope-header` for gateway-fronted deployments. Scope mismatches return 404, not 403 (capability non-disclosure).
2	`purpose` enum	`{"vision", "user_data"}`. Rejects OpenAI-specific values (`assistants`, `batch`, `fine-tune`) with 400.
3	`expires_at`	Default TTL 3600s. `--file-upload-ttl-seconds=-1` disables time-based expiry; `expires_at` is omitted from responses in that mode. Quota LRU still applies.
4	Streaming download	Always `StreamingResponse` + 64 KiB chunks. No size-based branching.
5	MIME validation	Inline magic-byte sniffer (no new required dependency). Optionally uses `python-magic` if installed for broader detection.
6	Config pattern	New `FileUploadConfig` `@config` dataclass, matching vLLM's `CacheConfig`/`LoRAConfig` subsystem pattern.

Trust model for `vllm-file://` resolution

vllm-file://<id> URLs resolving through MediaConnector are capability-based: possession of the 128-bit file ID is the access control. Scope headers are enforced on the /v1/files CRUD endpoints (which could leak IDs via LIST) but not on multimodal resolution — the chat-completion layer does not receive request headers in the current architecture. Every resolution emits a file.resolve audit log line, so access remains traceable.

If stricter semantics are desired for a specific deployment, a follow-up --file-upload-strict-scope flag can thread the header through MediaConnector (requires ~15 lines of changes to chat_utils.py).

Deployment patterns

Works out of the box with the existing OpenAI SDK (project= maps to OpenAI-Project header) and standard gateways:

Deployment	`--file-upload-scope-header`	Notes
Direct OpenAI SDK client	`OpenAI-Project`	SDK auto-sends from `OPENAI_PROJECT_ID` env or `project=...` client param
Apigee (AssignMessage)	`OpenAI-Project` or `X-Consumer-ID`	One-line policy
Kong	`X-Consumer-ID`	Native on authenticated routes
Envoy / oauth2-proxy	`X-Auth-Request-User`	JWT `sub` claim

Docs added in this PR:

docs/features/multimodal_inputs.md — new "Uploading Local Media Files" subsection under Online Serving Video Inputs, with OpenAI SDK + curl examples.
docs/serving/openai_compatible_server.md — new "Files API" section with the endpoint table, all --file-upload-* flags with defaults, the four gateway deployment patterns, and the full security posture.
examples/online_serving/openai_file_upload_client.py — minimal runnable example (upload → reference via vllm-file://<id> → chat completion → delete).

Test Plan

118 unit + integration tests across 6 files under tests/entrypoints/openai/files/:

pytest tests/entrypoints/openai/files/ -v

Covers:

FileUploadConfig validation (defaults, size constraints, extra-field rejection)
MIME magic-byte sniffer (8 supported formats + ELF/PE/HTML/script rejection, pymagic fallback path)
Store behaviour: streaming upload, 128-bit IDs, path confinement, filename sanitisation, LRU quota eviction, TTL sweep, scope non-disclosure, audit-log schema
Pydantic protocol models (purpose allowlist, expires_at omission)
Router via FastAPI TestClient: full round-trip, scope header enforcement (missing/present/mismatch), disable_listing, purpose validation, X-Request-Id propagation, feature-off 404 contract, concurrency-limit → 503
MediaConnector vllm-file:// scheme dispatch (sync + async, scope bypass, unknown-id/unregistered-store errors, atime touching)
Concurrency + TOCTOU races: fail-fast semaphore rejection, concurrent-eviction handling in read_bytes_by_id/read_bytes_by_id_async, eager open in stream_content, eviction-vs-reads invariant, startup refusal to wipe unmarked user directory

GPU end-to-end (operator-runnable via the example client added in this PR):

# Launch server with the feature enabled
vllm serve allenai/Molmo2-8B --trust-remote-code --max-model-len 6144 \
    --enable-file-uploads --file-upload-max-size-mb 128

# Upload + chat-completion round-trip against a real video
python examples/online_serving/openai_file_upload_client.py path/to/clip.mp4

Test Result

All 118 tests pass (CPU-only, ~25s on a developer laptop):

118 passed, 2 warnings in 28.40s

GPU end-to-end on RTX 4090 (24 GB) with Molmo2-8B:

POST /v1/files (24.5 MiB Seinfeld clip, multipart form)  → 200  149 ms
POST /v1/chat/completions  (video_url=vllm-file://...)   → 200  7421 ms
    2784 prompt tokens, 87 completion tokens
    Model output: "This scene depicts a group of friends walking
    together on a busy city street. The setting is a typical urban
    environment with storefronts, parked cars, and pedestrians going
    about their day. The characters are engaged in conversation as
    they stroll along the sidewalk, with one woman gesturing
    animatedly while the others listen and respond..."
DELETE /v1/files/{id}  → 200  deleted:true

This is the exact failure case from #38531 (24 MB video → base64 → ARG_MAX exceeded), now a 149 ms multipart upload.

Self-Review Hardening Pass (post-AI-reviewer feedback)

After addressing the gemini-code-assist and copilot-pull-request-reviewer comments, a cross-ecosystem sister-review surfaced 20+ additional items beyond what the AI reviewers caught. Fixed in 7 commits on top of the original feature:

Commit	Category
`9c7e3c2f9`	Dead code removal (unwired sweeper matching PR body's "no background task" design)
`b25ce81b0`	Security: 0o700 dir / 0o600 files / `tempfile.mkdtemp` / PID lockfile / multi-API-server startup guard
`c484a6723`	Async correctness: `media_io.load_bytes` → thread pool, executor-dispatched unlinks + close, async `init_files_state`
`ec3c32ccd`	Protocol/audit: stable `expires_at`, scope trim+cap, MP3 bit validation, HEIC/AVIF brands, URL regex guard
`529d9c582`	+8 tests covering the new security and MIME behaviours
`3ba03d6eb` + `aea74571b`	Example expanded to demo full CRUD + multi-turn reuse

Per-commit details in the commit messages.

GPU re-validation on RTX 4090 with Molmo2-8B (patched vllm-openai:latest container carrying all 7 commits):

POST /v1/files  (24.5 MiB Seinfeld clip)         → 200   66 ms
POST /v1/chat/completions  (vllm-file://...)     → 200 3974 ms
DELETE /v1/files/{id}                            → 200

expires_at stable across 3 accesses: 1775411599 → 1775411599 → 1775411599
malformed vllm-file://../etc/passwd rejected with clear error

AI-Assisted Contribution Disclosure

This PR was developed with assistance from Claude (Anthropic) per vLLM's AI Assisted Contributions guide. The human author reviewed all code changes, ran all 118 tests locally, and validated the GPU integration end-to-end against Molmo2-8B on a 24 MiB video clip. Commits carry Co-authored-by: Claude trailers as documented in the guide.

<details> <summary>Essential Elements of an Effective PR Description Checklist</summary>

The purpose of the PR, and link to existing issue (#38531)
The test plan, including the pytest command
The test results (all 118 tests passing + GPU end-to-end output)
Documentation update (user guide + API reference + example client)
Release notes update (will add once this PR is close to merge)

</details>

Changed files

docs/features/multimodal_inputs.md (modified, +92/-0)
docs/serving/openai_compatible_server.md (modified, +84/-0)
examples/online_serving/openai_file_upload_client.py (added, +176/-0)
tests/entrypoints/openai/files/__init__.py (added, +2/-0)
tests/entrypoints/openai/files/test_api_router.py (added, +304/-0)
tests/entrypoints/openai/files/test_config.py (added, +69/-0)
tests/entrypoints/openai/files/test_media_connector_integration.py (added, +123/-0)
tests/entrypoints/openai/files/test_mime.py (added, +182/-0)
tests/entrypoints/openai/files/test_protocol.py (added, +104/-0)
tests/entrypoints/openai/files/test_store.py (added, +621/-0)
vllm/config/__init__.py (modified, +3/-0)
vllm/config/file_upload.py (added, +71/-0)
vllm/entrypoints/openai/api_server.py (modified, +12/-0)
vllm/entrypoints/openai/cli_args.py (modified, +39/-1)
vllm/entrypoints/openai/files/__init__.py (added, +2/-0)
vllm/entrypoints/openai/files/api_router.py (added, +243/-0)
vllm/entrypoints/openai/files/mime.py (added, +242/-0)
vllm/entrypoints/openai/files/protocol.py (added, +62/-0)
vllm/entrypoints/openai/files/serving.py (added, +295/-0)
vllm/entrypoints/openai/files/store.py (added, +831/-0)
vllm/multimodal/media/connector.py (modified, +94/-2)

Code Example

curl http://localhost:8000/v1/chat/completions \
  -F "file=@/local/video.mp4" \
  -F 'payload={"model":"...","messages":[...]}'

---

# Upload file first, get a reference ID
curl http://localhost:8000/v1/files -F "[email protected]"
# {"id": "file-abc123"}

# Use reference ID in request
{"type": "video_url", "video_url": {"url": "file-abc123"}}

RAW_BUFFERClick to expand / collapse

🚀 The feature, motivation and pitch

Feature Request

Summary

Allow clients to upload video/image files directly via the API (multipart or binary), similar to how SGLang and LMDeploy handle local media, instead of requiring base64 encoding or a reachable URL.

Current Behavior

The /v1/chat/completions endpoint only accepts media via:

Public/accessible URL (video_url.url = "https://...")
Base64 data URI (video_url.url = "data:video/mp4;base64,...")
file:// URI (requires --allowed-local-media-path server flag + file must be on server)

Problem

Base64 encoding 18MB video → ~24MB payload → exceeds shell ARG_MAX (Argument list too long)
file:// requires server restart with --allowed-local-media-path and file must exist on the server machine, not the client
Forces users to run a separate HTTP server just to pass local files

Proposed Solution

Support a /v1/files upload endpoint or multipart form upload in /v1/chat/completions so clients can send raw binary files directly:

curl http://localhost:8000/v1/chat/completions \
  -F "file=@/local/video.mp4" \
  -F 'payload={"model":"...","messages":[...]}'

Or alternatively, a pre-upload endpoint:

# Upload file first, get a reference ID
curl http://localhost:8000/v1/files -F "[email protected]"
# {"id": "file-abc123"}

# Use reference ID in request
{"type": "video_url", "video_url": {"url": "file-abc123"}}

Why This Matters

SGLang and LMDeploy already support local file paths natively
Makes local development and testing much easier
Avoids memory/shell limits from base64 encoding large video files
Especially painful for multimodal workflows with large video files

Environment

vLLM version: latest
Model: Qwen3.5-35B-A3B-FP8 (multimodal MoE)
Use case: local video inference via OpenAI-compatible API

Alternatives

No response

Additional context

No response

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To support direct video/image file uploads via the API, we will implement a multipart form upload in the /v1/chat/completions endpoint.

Step-by-Step Solution

Update the /v1/chat/completions endpoint to accept multipart/form-data requests.
Use a library like multer to handle multipart requests in Node.js.
Store the uploaded file temporarily and generate a reference ID.
Allow clients to use the reference ID in the request payload.

Example Code

const express = require('express');
const multer = require('multer');
const app = express();

const upload = multer({ dest: './uploads/' });

app.post('/v1/chat/completions', upload.single('file'), (req, res) => {
  const file = req.file;
  const payload = req.body;
  const referenceId = generateReferenceId(file);

  // Process the request with the uploaded file and reference ID
  // ...

  res.json({ referenceId });
});

// Generate a unique reference ID for the uploaded file
function generateReferenceId(file) {
  return `file-${crypto.randomUUID()}`;
}

Verification

To verify that the fix worked, test the updated endpoint using a tool like curl:

curl http://localhost:8000/v1/chat/completions \
  -F "file=@/local/video.mp4" \
  -F 'payload={"model":"...","messages":[...]}'

Check that the response contains a valid reference ID and that the uploaded file is processed correctly.

Extra Tips

Make sure to handle errors and edge cases, such as large file uploads and invalid request payloads.
Consider implementing a mechanism to clean up temporary uploaded files to prevent storage issues.
Update the API documentation to reflect the changes and provide examples for clients to use the new endpoint.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #docker error #permission error #memory optimization #batch processing

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.