vllm - [RFC]: Refactor PassManager infrastructure [1 comment, 1 participant]

vllm-project/vllm#40953 (fetched 2026-04-28 06:26:15)

Motivation.

This RFC aims to solve https://github.com/vllm-project/vllm/issues/39356. Currently, all custom torch.compile passes in vLLM are managed by a single PostGradPassManager. Passes are registered by mounting them onto the Inductor hook identified by current_platform.pass_key, and each pass is individually enabled or disabled through boolean fields in PassConfig.

This pattern has several problems:

1. Inflexible Configuration Interface

Every new pass requires a new boolean field in PassConfig. To enable only one pass while debugging, you must explicitly disable all the others on the CLI. Some utility passes have no independent flag at all, so users cannot temporarily disable or adjust them without modifying source code.

2. Difficult OOT Platform Extension

For vLLM-Spyre, vLLM forcibly absorbs external passes into PostGradPassManager via an absorb hack in configure_post_pass. OOT backends that define their own PassManager must re-implement shared infrastructure such as pass execution ordering and cache hash computation from scratch.

3. Non-switchable Hook Points

current_platform.pass_key is hardcoded to "post_grad_custom_post_pass", which limits extensibility. The torch.compile compilation pipeline exposes multiple hook points (e.g. pre_grad_custom_pass), but the current implementation allows a platform to attach to only one. (Multi-hook support is out of scope for this RFC.)

4. Unnecessary Coupling Between IR Lowering and VllmConfig

VllmIRLoweringPass replaces IR nodes with concrete operators and has no inherent need for VllmConfig. However, it inherits from VllmInductorPass, which has a hard dependency on VllmConfig. This unnecessary coupling prevents the IR lowering logic from being moved into a standalone vllm/ir/ sub-package, blocking IR modularization.

Proposed Change.

  • Flexible configuration: provide an explicitly configurable pass pipeline that supports enabling, disabling, reordering, and overriding default passes.
  • Extensible base class: provide a platform-agnostic PassManager that encapsulates pass composition, execution, and cache hash computation.
  • Easy OOT extension: OOT platforms can reuse vLLM passes as needed and freely insert or replace custom passes.
  • Decouple IR Lowering Pass: decouple the IR lowering pass from VllmConfig, enabling it to move into a standalone vllm/ir/ module.

1. Support Explicit Pass Pipeline Definition

Users can specify a complete pass list via Python or the CLI. When an explicit pass_pipeline is present, it fully determines the manager's pass structure; otherwise, the manager's own default_pass_pipeline(config) generates the default pipeline.

pass_pipeline supports two forms:

  • A Python InductorPass class
  • A fully-qualified InductorPass class name string

# Form 1: Python class
from vllm.config import CompilationConfig
from vllm.compilation.passes.fusion.qk_norm_rope_fusion import QKNormRoPEFusionPass

compilation_config = CompilationConfig(
    pass_pipeline=[
        QKNormRoPEFusionPass,
    ],
)

# Form 2: fully-qualified class name string
from vllm.config import CompilationConfig

compilation_config = CompilationConfig(
    pass_pipeline=[
        "vllm.compilation.passes.fusion.qk_norm_rope_fusion.QKNormRoPEFusionPass",
    ],
)

PassConfig is retained and continues to drive the default PostGradPassManager pipeline.
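
The PassManager code later in this RFC calls _resolve_pass(item) without showing its body. The following is a minimal sketch of how such a resolver could accept both forms, not the RFC's actual implementation. The names resolve_pass, DemoPass, and the demo_passes module are hypothetical stand-ins so that the example runs without vLLM installed, and using the class's __qualname__ as the pass id is an assumption.

```python
import importlib
import sys
import types


class DemoPass:
    """Stand-in for an InductorPass subclass; not a real vLLM class."""

    def __call__(self, graph):
        return graph


# Register a throwaway module so the dotted-path form can be demonstrated
# without importing vLLM. "demo_passes" is a hypothetical module name.
_demo_module = types.ModuleType("demo_passes")
_demo_module.DemoPass = DemoPass
sys.modules["demo_passes"] = _demo_module


def resolve_pass(item):
    """Hypothetical helper: turn a class or dotted path into (pass_id, instance)."""
    if isinstance(item, str):
        module_name, _, class_name = item.rpartition(".")
        if not module_name:
            raise ValueError(f"expected a fully-qualified class name, got {item!r}")
        pass_cls = getattr(importlib.import_module(module_name), class_name)
    elif isinstance(item, type):
        pass_cls = item
    else:
        raise TypeError(f"unsupported pass_pipeline entry: {item!r}")
    # Zero-argument construction, matching the convention proposed in this RFC.
    return pass_cls.__qualname__, pass_cls()


from_class = resolve_pass(DemoPass)
from_string = resolve_pass("demo_passes.DemoPass")
```

Both calls yield an instance of the same class, so the rest of the manager does not need to know which form the user supplied.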


2. Provide a Generic PassManager Base Class

PassManager encapsulates common infrastructure: hook binding, pass pipeline construction, sequential pass execution, and cache uuid() computation. OOT platforms can subclass PassManager directly, define only their own default pipeline, and avoid re-implementing pass-management infrastructure.

class PassManager(CustomGraphPass):
    def __init__(
        self, passes: list[tuple[str, InductorPass]] | None = None
    ) -> None:
        self.passes: list[tuple[str, InductorPass]] = list(passes or [])
        self.hook: str = ""
        self._config_hash: str = ""

    def bind_to_hook(self, hook: str) -> "PassManager":
        self.hook = hook
        return self

    @classmethod
    def from_config(
        cls, config: "VllmConfig", hook: str | None = None
    ) -> "PassManager":
        """Build a configured pass manager from ``VllmConfig`` for a hook."""
        pass_pipeline = config.compilation_config.pass_pipeline
        manager = cls()
        if hook is not None:
            manager.bind_to_hook(hook)

        with set_current_vllm_config(config, check_compile=False):
            if pass_pipeline is None:
                pass_pipeline = cls.default_pass_pipeline(config)
            manager._config_hash = cls.compute_config_hash(config, pass_pipeline)
            for item in pass_pipeline:
                manager.passes.append(_resolve_pass(item))

        return manager

    @classmethod
    def compute_config_hash(
        cls, config: "VllmConfig", pass_pipeline: list[InductorPass] | None
    ) -> str:
        if config.compilation_config.pass_pipeline is None:
            return config.compilation_config.pass_config.compute_hash()
        return ""

    @classmethod
    def default_pass_pipeline(cls, config: "VllmConfig") -> list[InductorPass]:
        return []

    def describe_pipeline(self) -> list[str]:
        return [pass_id for pass_id, _ in self.passes]

    def __call__(self, graph: fx.Graph) -> None:
        compile_range = get_pass_context().compile_range
        for _, pass_ in self.passes:
            if pass_.is_applicable_for_range(compile_range):
                pass_(graph)

    def uuid(self) -> str:
        state: dict[str, Any] = {
            "manager": f"{type(self).__module__}.{type(self).__qualname__}",
            "hook": self.hook,
            "config_hash": self._config_hash,
            "compile_range": str(get_pass_context().compile_range),
            "passes": [
                {
                    "pass_id": pass_id,
                    "uuid": pass_.uuid(),
                }
                for pass_id, pass_ in self.passes
            ],
        }
        return InductorPass.hash_dict(state)
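
To illustrate why uuid() folds in the ordered pass list, here is a self-contained toy version of the same idea. It is only a sketch: ToyPass and ToyPassManager are hypothetical stand-ins, a plain list plays the role of the FX graph, and SHA-256 over JSON stands in for InductorPass.hash_dict, whose exact behavior is not shown in this RFC.

```python
import hashlib
import json


class ToyPass:
    """Stand-in for InductorPass; appends its name to a list-based 'graph'."""

    def __init__(self, name):
        self.name = name

    def __call__(self, graph):
        graph.append(self.name)

    def uuid(self):
        return hashlib.sha256(self.name.encode()).hexdigest()


class ToyPassManager:
    def __init__(self, passes, hook=""):
        self.passes = list(passes)  # list of (pass_id, pass) tuples
        self.hook = hook

    def __call__(self, graph):
        # Passes run strictly in list order.
        for _, pass_ in self.passes:
            pass_(graph)

    def uuid(self):
        state = {
            "hook": self.hook,
            "passes": [
                {"pass_id": pass_id, "uuid": pass_.uuid()}
                for pass_id, pass_ in self.passes
            ],
        }
        # sort_keys fixes dict key order; list order is preserved on purpose.
        encoded = json.dumps(state, sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()


fuse, cleanup = ToyPass("fuse"), ToyPass("cleanup")
hook = "post_grad_custom_post_pass"
forward = ToyPassManager([("fuse", fuse), ("cleanup", cleanup)], hook=hook)
reverse = ToyPassManager([("cleanup", cleanup), ("fuse", fuse)], hook=hook)

trace = []
forward(trace)
```

In this sketch, forward and reverse contain the same passes but produce different uuid() values, so a reordered pipeline would not reuse a compile-cache entry built for the original order.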

3. Redefine PostGradPassManager

class PostGradPassManager(PassManager):

    @classmethod
    def compute_config_hash(
        cls, config: VllmConfig, pass_pipeline: list[InductorPass] | None
    ) -> str:
        return config.compilation_config.pass_config.compute_hash()

    @classmethod
    def default_pass_pipeline(cls, config: VllmConfig) -> list[InductorPass]:
        pass_config = config.compilation_config.pass_config
        pass_pipeline: list[InductorPass] = []

        # ... platform-specific fusion passes ...

        if pass_config.fuse_allreduce_rms and current_platform.is_cuda():
            pass_pipeline.append(AllReduceFusionPass)

        if pass_config.fuse_minimax_qk_norm and current_platform.is_cuda():
            pass_pipeline.append(MiniMaxQKNormPass)

        # ... other passes ...

        pass_pipeline.append(PostCleanupPass)
        pass_pipeline.append(VllmIRLoweringPass)
        pass_pipeline.append(PostCleanupPass)
        pass_pipeline.append(FixFunctionalizationPass)
        return pass_pipeline


4. Provide a Generic PassConfig Base Class

@config
class PassConfig:
    def compute_hash(self) -> str:
        return hash_factors(get_hash_factors(self, set()))

@config
class PostGradPassConfig(PassConfig):
    fuse_norm_quant: bool = None
    fuse_act_quant: bool = None
    fuse_allreduce_rms: bool = None
    # ...

Note: the Platform interface should define get_pass_config_cls.
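
As a rough illustration of what compute_hash() on a config base class is meant to achieve, the sketch below hashes every dataclass field so that any flag change yields a new hash. It is an assumption-laden stand-in: ToyPassConfig and ToyPostGradPassConfig are hypothetical, and standard-library dataclasses, json, and hashlib replace vLLM's @config, get_hash_factors, and hash_factors.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyPassConfig:
    """Stand-in base class: hashes whatever fields a subclass declares."""

    def compute_hash(self) -> str:
        # Include the class name so two config types with equal fields differ.
        factors = {"cls": type(self).__qualname__, **asdict(self)}
        encoded = json.dumps(factors, sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()


@dataclass
class ToyPostGradPassConfig(ToyPassConfig):
    fuse_norm_quant: bool = False
    fuse_allreduce_rms: bool = False


baseline = ToyPostGradPassConfig()
with_fusion = ToyPostGradPassConfig(fuse_allreduce_rms=True)
```

Because the base class derives the hash from the fields, a subclass only has to declare its flags; it does not need to override compute_hash().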


5. OOT Platforms Implement MyPlatformPassManager

@config
class MyOOTPassConfig(PassConfig):
    enable_my_pass: bool = True

class MyPlatformPassManager(PassManager):

    @classmethod
    def compute_config_hash(
        cls, config: VllmConfig, pass_pipeline: list[InductorPass] | None
    ) -> str:
        pass_config = config.compilation_config.pass_config
        assert isinstance(pass_config, MyOOTPassConfig)
        return pass_config.compute_hash()

    @classmethod
    def default_pass_pipeline(cls, config: VllmConfig) -> list[InductorPass]:
        pass_config = config.compilation_config.pass_config
        assert isinstance(pass_config, MyOOTPassConfig)
        pipeline = []
        if pass_config.enable_my_pass:
            pipeline.append(MyPass)
        return pipeline

6. Zero-Argument Pass Construction

Passes should support zero-argument construction by default. Passes that previously required VllmConfig should instead retrieve it from context via get_current_vllm_config().
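
The pattern can be sketched with the standard-library contextvars module. This is not vLLM's implementation: set_current_config and get_current_config are hypothetical stand-ins for set_current_vllm_config and get_current_vllm_config, and a plain dict with a made-up "eps" key plays the role of VllmConfig.

```python
import contextvars
from contextlib import contextmanager

# Holds the active config for the duration of a `with` block.
_current_config = contextvars.ContextVar("current_config")


@contextmanager
def set_current_config(config):
    token = _current_config.set(config)
    try:
        yield
    finally:
        # Restore whatever was active before, even if construction fails.
        _current_config.reset(token)


def get_current_config():
    # Raises LookupError when called outside set_current_config().
    return _current_config.get()


class ContextAwarePass:
    """Zero-argument pass that reads what it needs from the ambient config."""

    def __init__(self):
        self.eps = get_current_config()["eps"]


with set_current_config({"eps": 1e-6}):
    pass_ = ContextAwarePass()
```

With this shape, a manager can construct every pass as pass_cls() inside a single context block, which is what PassManager.from_config does above when it wraps pass resolution in set_current_vllm_config.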

Feedback Period.

One week.

CC List.

@ProExpertProg @wangxiyuan

Any Other Things.

No response


extent analysis

TL;DR

To address the issues with the current pass management system, implement a flexible pass pipeline configuration, provide a generic PassManager base class, and decouple the IR lowering pass from VllmConfig.

Guidance

  • Define an explicitly configurable pass pipeline that supports enabling, disabling, reordering, and overriding default passes.
  • Create a platform-agnostic PassManager that encapsulates pass composition, execution, and cache hash computation.
  • Decouple the IR lowering pass from VllmConfig to enable it to move into a standalone vllm/ir/ module.
  • Ensure passes support zero-argument construction by default and retrieve VllmConfig from context via get_current_vllm_config() when needed.

Example

class MyPlatformPassManager(PassManager):
    @classmethod
    def default_pass_pipeline(cls, config: VllmConfig) -> list[InductorPass]:
        pass_config = config.compilation_config.pass_config
        pipeline = []
        if pass_config.enable_my_pass:
            pipeline.append(MyPass)
        return pipeline

Notes

The proposed changes aim to improve the flexibility and extensibility of the pass management system. However, the implementation details may vary depending on the specific requirements and constraints of the project.

Recommendation

Apply the proposed changes to implement a flexible pass pipeline configuration and a generic PassManager base class, as they address the identified issues and improve the overall design of the system.
