vllm - ✅(Solved) Fix [Feature]: Support direct binary/multipart file upload for video and image in OpenAI-compatible API [1 pull requests, 2 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
vllm-project/vllm#38531Fetched 2026-04-08 01:53:29
View on GitHub
Comments
2
Participants
3
Timeline
26
Reactions
0
Author
Timeline (top)
referenced ×11mentioned ×5subscribed ×5commented ×2

Allow clients to upload video/image files directly via the API (multipart or binary), similar to how SGLang and LMDeploy handle local media, instead of requiring base64 encoding or a reachable URL.

Root Cause

  • SGLang and LMDeploy already support local file paths natively
  • Makes local development and testing much easier
  • Avoids memory/shell limits from base64 encoding large video files
  • Especially painful for multimodal workflows with large video files

PR fix notes

PR #39003: [Frontend] Add /v1/files upload endpoint for multimodal inputs (#38531)

Description (problem / solution / changelog)

Purpose

Implements the /v1/files upload endpoint requested in RFC issue #38531. Lets clients upload multimodal files once (via multipart form) and reference them in subsequent chat completions through a new vllm-file://<id> URL scheme — an alternative to base64 data URLs (which inflate payloads and can exceed shell ARG_MAX for videos) and to file:// URLs (which require the file to already exist on the server machine). The endpoint is off by default; operators opt in with --enable-file-uploads.

Closes #38531.

Answers to @DarkLight1337's questions

Lifecycle: --file-upload-ttl-seconds (default 1h, atime-based expiry) + --file-upload-max-total-gb quota with LRU eviction (default 5 GB). Upload dir is cleared on server startup, so no state persists across restarts. Explicit DELETE /v1/files/{id} also works. Opportunistic sweeper runs inside create_file so no background task is needed.

Security: Off-by-default + 128-bit capability handles (file-<32 hex>) + MIME magic-byte allowlist (video/image/audio only) + path confinement (sha256 on-disk names, client filename is metadata only, 255-byte cap, control chars stripped) + streaming size enforcement (no memory spikes) + structured JSON audit log + optional --file-upload-scope-header for gateway-fronted deployments + optional --file-upload-disable-listing to remove the enumeration surface.

Design decisions (consensus during implementation)

#QuestionDecision
1Scoping modelServer-global by default. Opt-in --file-upload-scope-header for gateway-fronted deployments. Scope mismatches return 404, not 403 (capability non-disclosure).
2purpose enum{"vision", "user_data"}. Rejects OpenAI-specific values (assistants, batch, fine-tune) with 400.
3expires_atDefault TTL 3600s. --file-upload-ttl-seconds=-1 disables time-based expiry; expires_at is omitted from responses in that mode. Quota LRU still applies.
4Streaming downloadAlways StreamingResponse + 64 KiB chunks. No size-based branching.
5MIME validationInline magic-byte sniffer (no new required dependency). Optionally uses python-magic if installed for broader detection.
6Config patternNew FileUploadConfig @config dataclass, matching vLLM's CacheConfig/LoRAConfig subsystem pattern.

Trust model for vllm-file:// resolution

vllm-file://<id> URLs resolving through MediaConnector are capability-based: possession of the 128-bit file ID is the access control. Scope headers are enforced on the /v1/files CRUD endpoints (which could leak IDs via LIST) but not on multimodal resolution — the chat-completion layer does not receive request headers in the current architecture. Every resolution emits a file.resolve audit log line, so access remains traceable.

If stricter semantics are desired for a specific deployment, a follow-up --file-upload-strict-scope flag can thread the header through MediaConnector (requires ~15 lines of changes to chat_utils.py).

Deployment patterns

Works out of the box with the existing OpenAI SDK (project= maps to OpenAI-Project header) and standard gateways:

Deployment--file-upload-scope-headerNotes
Direct OpenAI SDK clientOpenAI-ProjectSDK auto-sends from OPENAI_PROJECT_ID env or project=... client param
Apigee (AssignMessage)OpenAI-Project or X-Consumer-IDOne-line policy
KongX-Consumer-IDNative on authenticated routes
Envoy / oauth2-proxyX-Auth-Request-UserJWT sub claim

Docs added in this PR:

  • docs/features/multimodal_inputs.md — new "Uploading Local Media Files" subsection under Online Serving Video Inputs, with OpenAI SDK + curl examples.
  • docs/serving/openai_compatible_server.md — new "Files API" section with the endpoint table, all --file-upload-* flags with defaults, the four gateway deployment patterns, and the full security posture.
  • examples/online_serving/openai_file_upload_client.py — minimal runnable example (upload → reference via vllm-file://<id> → chat completion → delete).

Test Plan

118 unit + integration tests across 6 files under tests/entrypoints/openai/files/:

pytest tests/entrypoints/openai/files/ -v

Covers:

  • FileUploadConfig validation (defaults, size constraints, extra-field rejection)
  • MIME magic-byte sniffer (8 supported formats + ELF/PE/HTML/script rejection, pymagic fallback path)
  • Store behaviour: streaming upload, 128-bit IDs, path confinement, filename sanitisation, LRU quota eviction, TTL sweep, scope non-disclosure, audit-log schema
  • Pydantic protocol models (purpose allowlist, expires_at omission)
  • Router via FastAPI TestClient: full round-trip, scope header enforcement (missing/present/mismatch), disable_listing, purpose validation, X-Request-Id propagation, feature-off 404 contract, concurrency-limit → 503
  • MediaConnector vllm-file:// scheme dispatch (sync + async, scope bypass, unknown-id/unregistered-store errors, atime touching)
  • Concurrency + TOCTOU races: fail-fast semaphore rejection, concurrent-eviction handling in read_bytes_by_id/read_bytes_by_id_async, eager open in stream_content, eviction-vs-reads invariant, startup refusal to wipe unmarked user directory

GPU end-to-end (operator-runnable via the example client added in this PR):

# Launch server with the feature enabled
vllm serve allenai/Molmo2-8B --trust-remote-code --max-model-len 6144 \
    --enable-file-uploads --file-upload-max-size-mb 128

# Upload + chat-completion round-trip against a real video
python examples/online_serving/openai_file_upload_client.py path/to/clip.mp4

Test Result

All 118 tests pass (CPU-only, ~25s on a developer laptop):

118 passed, 2 warnings in 28.40s

GPU end-to-end on RTX 4090 (24 GB) with Molmo2-8B:

POST /v1/files (24.5 MiB Seinfeld clip, multipart form)  → 200  149 ms
POST /v1/chat/completions  (video_url=vllm-file://...)   → 200  7421 ms
    2784 prompt tokens, 87 completion tokens
    Model output: "This scene depicts a group of friends walking
    together on a busy city street. The setting is a typical urban
    environment with storefronts, parked cars, and pedestrians going
    about their day. The characters are engaged in conversation as
    they stroll along the sidewalk, with one woman gesturing
    animatedly while the others listen and respond..."
DELETE /v1/files/{id}  → 200  deleted:true

This is the exact failure case from #38531 (24 MB video → base64 → ARG_MAX exceeded), now a 149 ms multipart upload.


Self-Review Hardening Pass (post-AI-reviewer feedback)

After addressing the gemini-code-assist and copilot-pull-request-reviewer comments, a cross-ecosystem sister-review surfaced 20+ additional items beyond what the AI reviewers caught. Fixed in 7 commits on top of the original feature:

CommitCategory
9c7e3c2f9Dead code removal (unwired sweeper matching PR body's "no background task" design)
b25ce81b0Security: 0o700 dir / 0o600 files / tempfile.mkdtemp / PID lockfile / multi-API-server startup guard
c484a6723Async correctness: media_io.load_bytes → thread pool, executor-dispatched unlinks + close, async init_files_state
ec3c32ccdProtocol/audit: stable expires_at, scope trim+cap, MP3 bit validation, HEIC/AVIF brands, URL regex guard
529d9c582+8 tests covering the new security and MIME behaviours
3ba03d6eb + aea74571bExample expanded to demo full CRUD + multi-turn reuse

Per-commit details in the commit messages.

GPU re-validation on RTX 4090 with Molmo2-8B (patched vllm-openai:latest container carrying all 7 commits):

POST /v1/files  (24.5 MiB Seinfeld clip)         → 200   66 ms
POST /v1/chat/completions  (vllm-file://...)     → 200 3974 ms
DELETE /v1/files/{id}                            → 200

expires_at stable across 3 accesses: 1775411599 → 1775411599 → 1775411599
malformed vllm-file://../etc/passwd rejected with clear error

AI-Assisted Contribution Disclosure

This PR was developed with assistance from Claude (Anthropic) per vLLM's AI Assisted Contributions guide. The human author reviewed all code changes, ran all 118 tests locally, and validated the GPU integration end-to-end against Molmo2-8B on a 24 MiB video clip. Commits carry Co-authored-by: Claude trailers as documented in the guide.


<details> <summary>Essential Elements of an Effective PR Description Checklist</summary>
  • The purpose of the PR, and link to existing issue (#38531)
  • The test plan, including the pytest command
  • The test results (all 118 tests passing + GPU end-to-end output)
  • Documentation update (user guide + API reference + example client)
  • Release notes update (will add once this PR is close to merge)
</details>

Changed files

  • docs/features/multimodal_inputs.md (modified, +92/-0)
  • docs/serving/openai_compatible_server.md (modified, +84/-0)
  • examples/online_serving/openai_file_upload_client.py (added, +176/-0)
  • tests/entrypoints/openai/files/__init__.py (added, +2/-0)
  • tests/entrypoints/openai/files/test_api_router.py (added, +304/-0)
  • tests/entrypoints/openai/files/test_config.py (added, +69/-0)
  • tests/entrypoints/openai/files/test_media_connector_integration.py (added, +123/-0)
  • tests/entrypoints/openai/files/test_mime.py (added, +182/-0)
  • tests/entrypoints/openai/files/test_protocol.py (added, +104/-0)
  • tests/entrypoints/openai/files/test_store.py (added, +621/-0)
  • vllm/config/__init__.py (modified, +3/-0)
  • vllm/config/file_upload.py (added, +71/-0)
  • vllm/entrypoints/openai/api_server.py (modified, +12/-0)
  • vllm/entrypoints/openai/cli_args.py (modified, +39/-1)
  • vllm/entrypoints/openai/files/__init__.py (added, +2/-0)
  • vllm/entrypoints/openai/files/api_router.py (added, +243/-0)
  • vllm/entrypoints/openai/files/mime.py (added, +242/-0)
  • vllm/entrypoints/openai/files/protocol.py (added, +62/-0)
  • vllm/entrypoints/openai/files/serving.py (added, +295/-0)
  • vllm/entrypoints/openai/files/store.py (added, +831/-0)
  • vllm/multimodal/media/connector.py (modified, +94/-2)

Code Example

curl http://localhost:8000/v1/chat/completions \
  -F "file=@/local/video.mp4" \
  -F 'payload={"model":"...","messages":[...]}'

---

# Upload file first, get a reference ID
curl http://localhost:8000/v1/files -F "[email protected]"
# {"id": "file-abc123"}

# Use reference ID in request
{"type": "video_url", "video_url": {"url": "file-abc123"}}
RAW_BUFFERClick to expand / collapse

🚀 The feature, motivation and pitch

Feature Request

Summary

Allow clients to upload video/image files directly via the API (multipart or binary), similar to how SGLang and LMDeploy handle local media, instead of requiring base64 encoding or a reachable URL.

Current Behavior

The /v1/chat/completions endpoint only accepts media via:

  • Public/accessible URL (video_url.url = "https://...")
  • Base64 data URI (video_url.url = "data:video/mp4;base64,...")
  • file:// URI (requires --allowed-local-media-path server flag + file must be on server)

Problem

  • Base64 encoding 18MB video → ~24MB payload → exceeds shell ARG_MAX (Argument list too long)
  • file:// requires server restart with --allowed-local-media-path and file must exist on the server machine, not the client
  • Forces users to run a separate HTTP server just to pass local files

Proposed Solution

Support a /v1/files upload endpoint or multipart form upload in /v1/chat/completions so clients can send raw binary files directly:

curl http://localhost:8000/v1/chat/completions \
  -F "file=@/local/video.mp4" \
  -F 'payload={"model":"...","messages":[...]}'

Or alternatively, a pre-upload endpoint:

# Upload file first, get a reference ID
curl http://localhost:8000/v1/files -F "[email protected]"
# {"id": "file-abc123"}

# Use reference ID in request
{"type": "video_url", "video_url": {"url": "file-abc123"}}

Why This Matters

  • SGLang and LMDeploy already support local file paths natively
  • Makes local development and testing much easier
  • Avoids memory/shell limits from base64 encoding large video files
  • Especially painful for multimodal workflows with large video files

Environment

  • vLLM version: latest
  • Model: Qwen3.5-35B-A3B-FP8 (multimodal MoE)
  • Use case: local video inference via OpenAI-compatible API

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To support direct video/image file uploads via the API, we will implement a multipart form upload in the /v1/chat/completions endpoint.

Step-by-Step Solution

  • Update the /v1/chat/completions endpoint to accept multipart/form-data requests.
  • Use a library like multer to handle multipart requests in Node.js.
  • Store the uploaded file temporarily and generate a reference ID.
  • Allow clients to use the reference ID in the request payload.

Example Code

const express = require('express');
const multer = require('multer');
const app = express();

const upload = multer({ dest: './uploads/' });

app.post('/v1/chat/completions', upload.single('file'), (req, res) => {
  const file = req.file;
  const payload = req.body;
  const referenceId = generateReferenceId(file);

  // Process the request with the uploaded file and reference ID
  // ...

  res.json({ referenceId });
});

// Generate a unique reference ID for the uploaded file
function generateReferenceId(file) {
  return `file-${crypto.randomUUID()}`;
}

Verification

To verify that the fix worked, test the updated endpoint using a tool like curl:

curl http://localhost:8000/v1/chat/completions \
  -F "file=@/local/video.mp4" \
  -F 'payload={"model":"...","messages":[...]}'

Check that the response contains a valid reference ID and that the uploaded file is processed correctly.

Extra Tips

  • Make sure to handle errors and edge cases, such as large file uploads and invalid request payloads.
  • Consider implementing a mechanism to clean up temporary uploaded files to prevent storage issues.
  • Update the API documentation to reflect the changes and provide examples for clients to use the new endpoint.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING