vllm - 💡(How to fix) Fix [Performance]: Request vLLM input: FlashInfer JIT ops not registered as proper torch.ops custom ops, breaking torch.compile(fullgraph=True) — upstream fix in progress at flashinfer#2734 [1 participants]

vllm · 2026-03-23 04:36:09


Issue: vllm-project/vllm#37850 (fetched 2026-04-08 01:17:38)


Author: Johnsonms

Participants: Johnsonms

Timeline (top): mentioned ×2, subscribed ×2, closed ×1, labeled ×1

FlashInfer PR #2734 aims to fix torch.compile(fullgraph=True) compatibility for FlashInfer's JIT-backed ops. The root cause: register_custom_op / register_fake_op in FlashInfer were stub no-ops, so Dynamo traces into each op's Python body and reaches the Path.exists() / os.stat() calls in the JIT loading path. Dynamo cannot capture these calls, so it graph-breaks there, and fullgraph=True turns any graph break into a compilation error.


Proposal to improve performance

Context

Current state in SGLang

SGLang uses an is_dynamo_compiling() guard as an explicit temporary workaround:

Eager → calls FlashInfer JIT directly
Under torch.compile → falls back to torch.ops.sgl_kernel.* AOT kernels (proper PyTorch custom ops, opaque to Dynamo)

Why we need vLLM input

The original performance concern about enabling torch.library.custom_op in FlashInfer was raised by the vLLM team (referenced in the PR). Before the FlashInfer maintainers merge the fix, they want vLLM to weigh in on:

Does vLLM currently hit this same issue with FlashInfer JIT kernels under torch.compile?
Is there any regression risk from enabling proper custom op registration in FlashInfer (the original perf concern)?
Can a vLLM contributor review/approve the direction in flashinfer#2734?

cc @nvpohanh @yzh119


Extended analysis

Fix Plan

To make FlashInfer's JIT ops compatible with torch.compile(fullgraph=True), we need to:

Enable proper custom op registration in FlashInfer
Update SGLang to remove the temporary workaround

Code Changes

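# In FlashInfer: make the registration helpers real instead of no-ops.
# (Helper signatures here are illustrative, not FlashInfer's exact API.)
import torch

def register_custom_op(name, fn, mutates_args=()):
    # Register `fn` as a torch.ops custom op ("namespace::op_name"), so Dynamo
    # records one opaque call instead of tracing into the JIT loading code.
    return torch.library.custom_op(name, fn, mutates_args=mutates_args)

def register_fake_op(name, fake_fn):
    # Fake (meta) implementation: lets torch.compile infer output shapes,
    # dtypes, and devices without running the kernel or the JIT loader.
    return torch.library.register_fake(name, fake_fn)

# In SGLang: once FlashInfer ops are proper custom ops, remove the
# is_dynamo_compiling() branch and the torch.ops.sgl_kernel.* fallback.
#
# Before (temporary workaround):
#     if is_dynamo_compiling():
#         return torch.ops.sgl_kernel.<op>(...)  # AOT kernel
#     return <flashinfer_jit_op>(...)            # FlashInfer JIT
#
# After: one code path for both eager mode and torch.compile.
def call_flashinfer_op(op, *args, **kwargs):
    # `op` is a registered FlashInfer custom op.
    return op(*args, **kwargs)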

Verification

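To verify the fix, compile a model that calls FlashInfer ops with torch.compile(fullgraph=True) and confirm that compilation succeeds, i.e. Dynamo no longer traces into the Path.exists() / os.stat() calls in the JIT loading path. Separately, benchmark eager and compiled runs to check for performance regressions.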

Extra Tips

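Check that each registered op has a correct fake implementation (matching output shapes, dtypes, and devices) and that registration adds no measurable overhead in eager mode.
Benchmark SGLang with the updated code against the current is_dynamo_compiling() workaround to confirm there is no performance regression.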
