vllm - 💡(How to fix) Fix [Performance]: Request vLLM input: FlashInfer JIT ops not registered as proper torch.ops custom ops, breaking torch.compile(fullgraph=True) — upstream fix in progress at flashinfer#2734 [1 participants]

GitHub stats
vllm-project/vllm#37850 · Fetched 2026-04-08 01:17:38
Comments: 0
Participants: 1
Timeline: 7
Reactions: 0
Timeline (top): mentioned ×2, subscribed ×2, closed ×1, labeled ×1

FlashInfer PR #2734 is fixing torch.compile(fullgraph=True) compatibility for FlashInfer JIT-backed ops. The root cause: register_custom_op / register_fake_op in FlashInfer were stub no-ops, so Dynamo traces into the Python body and hits Path.exists() / os.stat() in the JIT loading path — which Dynamo cannot capture, breaking fullgraph compilation.


Proposal to improve performance

Context

FlashInfer PR #2734 is fixing torch.compile(fullgraph=True) compatibility for FlashInfer JIT-backed ops. The root cause: register_custom_op / register_fake_op in FlashInfer were stub no-ops, so Dynamo traces into the Python body and hits Path.exists() / os.stat() in the JIT loading path — which Dynamo cannot capture, breaking fullgraph compilation.

Current state in SGLang

SGLang uses an is_dynamo_compiling() guard as an explicit temporary workaround:

  • Eager → calls FlashInfer JIT directly
  • Under torch.compile → falls back to torch.ops.sgl_kernel.* AOT kernels (proper PyTorch custom ops, opaque to Dynamo)

Why we need vLLM input

The original performance concern about enabling torch.library.custom_op in FlashInfer was raised by the vLLM team (referenced in the PR). Before the FlashInfer maintainers merge the fix, they want vLLM to weigh in on:

  1. Does vLLM currently hit this same issue with FlashInfer JIT kernels under torch.compile?
  2. Is there any regression risk from enabling proper custom op registration in FlashInfer (the original perf concern)?
  3. Can a vLLM contributor review/approve the direction in flashinfer#2734?

cc @nvpohanh @yzh119

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

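To address the torch.compile(fullgraph=True) incompatibility with FlashInfer JIT-backed ops, two changes are needed:

  • Enable proper custom op registration in FlashInfer (the change proposed in flashinfer#2734)
  • Update SGLang to remove the temporary workaround once that change is available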

Code Changes

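# In FlashInfer (sketch only; the actual change is in flashinfer#2734)
import torch

def register_custom_op(name, fn=None, /, *, mutates_args, device_types=None, schema=None):
    # Forward to torch.library.custom_op instead of returning fn unchanged,
    # so the op becomes opaque to Dynamo.
    return torch.library.custom_op(
        name, fn, mutates_args=mutates_args, device_types=device_types, schema=schema
    )

def register_fake_op(name, fn=None):
    # Register the shape/dtype-only (fake) implementation used during tracing.
    return torch.library.register_fake(name, fn)

# In SGLang (sketch; assumes the FlashInfer ops are now registered custom ops)
# Once the ops are opaque to Dynamo, the is_dynamo_compiling() branch is no
# longer needed: eager and compiled code can take the same path.
def call_op(flashinfer_op, *args, **kwargs):
    # flashinfer_op is a hypothetical placeholder for any registered FlashInfer op.
    return flashinfer_op(*args, **kwargs)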

Verification

To verify the fix, compile code that calls FlashInfer JIT-backed ops with torch.compile(fullgraph=True) and confirm that Dynamo no longer traces into Path.exists() / os.stat() in the JIT loading path.
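
One way to run such a check is a small helper that reports whether a function compiles as a single graph. The sketch below is a generic illustration: compiles_fullgraph, clean, and broken are hypothetical names, and torch._dynamo.graph_break() is used here only to simulate a call that Dynamo cannot capture.

```python
import torch
import torch._dynamo


def compiles_fullgraph(fn, *args) -> bool:
    """Return True if fn compiles and runs under fullgraph=True without error."""
    torch._dynamo.reset()  # drop cached compilations from earlier checks
    try:
        torch.compile(fn, fullgraph=True, backend="eager")(*args)
        return True
    except Exception:
        # Broad on purpose for a sketch: any failure counts as "does not compile".
        return False


def clean(x):
    return x.sin() + 1


def broken(x):
    y = x.sin()
    torch._dynamo.graph_break()  # forces a graph break, as an untraceable call would
    return y + 1


x = torch.ones(4)
clean_ok = compiles_fullgraph(clean, x)
broken_ok = compiles_fullgraph(broken, x)
```

In a real check, clean and broken would be replaced by a function that calls the FlashInfer op under test.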

Extra Tips

  • Benchmark FlashInfer with and without proper custom op registration to check for the performance regression the vLLM team was originally concerned about.
  • After removing the workaround, test SGLang both in eager mode and under torch.compile to confirm that it behaves as expected.
