vllm - ✅(Solved) Fix [RFC]: Support Registry Mechanism for KVCacheSpec [3 pull requests, 1 comments, 1 participants]

GitHub stats
vllm-project/vllm#36668 · Fetched 2026-04-08 00:35:29
Comments: 1 · Participants: 1 · Timeline: 14 · Reactions: 0
Timeline (top): mentioned ×5, subscribed ×5, cross-referenced ×2, commented ×1

Fix Action

Fix / Workaround

The tight coupling makes it impossible for out-of-tree platforms to extend KV cache behavior without patching vLLM, which is not a long-term solution for evolution.

PR fix notes

PR #31635: Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility.

Description (problem / solution / changelog)

Purpose

RFC: https://github.com/vllm-project/vllm/issues/31634

This change refactors AttentionSpec in vLLM V1 to allow explicit setting of page_size_bytes. This is required for backends like TPU that use Ragged Paged Attention (RPA), where physical memory padding is necessary for alignment. The current reliance on num_gpu_blocks_override is not only a design "hack" but also breaks multi-host inference using the Ray executor.



Changed files

  • tests/v1/core/test_kv_sharing.py (modified, +3/-1)
  • tests/v1/core/test_prefix_caching.py (modified, +32/-9)
  • tests/v1/core/test_scheduler.py (modified, +7/-1)
  • tests/v1/core/utils.py (modified, +7/-1)
  • tests/v1/kv_connector/unit/utils.py (modified, +7/-1)
  • vllm/v1/kv_cache_interface.py (modified, +19/-7)

PR #36359: [KV Cache] Use a contiguous buffer for all layers

Description (problem / solution / changelog)


Purpose

Ongoing efforts like https://github.com/vllm-project/vllm/pull/35219 need to apply extra operations to all layers. Make all layers derive from the same tensor so that this can be done with one CUDA operation.

CC @tdoublep

Test Plan

Run CI

Test Result



Changed files

  • vllm/v1/worker/gpu_model_runner.py (modified, +11/-3)


Motivation.

  1. KVCacheSpec is currently tightly coupled to the rest of the KV cache code. Adding a new type of KVCacheSpec requires all of the following:
  • implement a subclass of KVCacheSpec.
  • implement a SingleTypeKVCacheManager for the KVCacheSpec, or specify an existing SingleTypeKVCacheManager for it.
  • map the KVCacheSpec to its SingleTypeKVCacheManager in spec_manager_map.
  • add a new branch for it in UniformTypeKVCacheSpecs.is_uniform_type.
  2. Different hardware backends require different alignment sizes or hardware-specific padding.

The tight coupling makes it impossible for out-of-tree platforms to extend KV cache behavior without patching vLLM, which is not a long-term solution for evolution.
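
The coupling described above can be pictured with a simplified stand-in. This is an illustrative sketch only, not vLLM's actual code: the classes below are minimal placeholders that reuse names from the issue, and PaddedFullAttentionSpec is a hypothetical out-of-tree spec. It shows why a hard-coded map and hard-coded isinstance branches leave an out-of-tree spec with no manager and no uniform-type rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVCacheSpec:  # placeholder, not vLLM's class
    block_size: int


@dataclass(frozen=True)
class FullAttentionSpec(KVCacheSpec):
    pass


@dataclass(frozen=True)
class SlidingWindowSpec(KVCacheSpec):
    pass


class FullAttentionManager:  # placeholder manager
    pass


class SlidingWindowManager:  # placeholder manager
    pass


# Hard-coded map: every new spec needs a new in-tree entry here.
spec_manager_map = {
    FullAttentionSpec: FullAttentionManager,
    SlidingWindowSpec: SlidingWindowManager,
}


def is_uniform_type(specs: list[KVCacheSpec]) -> bool:
    """Hard-coded branches: every new spec family needs a new branch."""
    if all(isinstance(s, FullAttentionSpec) for s in specs):
        return True
    if all(isinstance(s, SlidingWindowSpec) for s in specs):
        return True
    return False


@dataclass(frozen=True)
class PaddedFullAttentionSpec(KVCacheSpec):  # hypothetical out-of-tree spec
    pass


out_of_tree = PaddedFullAttentionSpec(block_size=16)
# Without patching the map and the branches, the new spec is unknown to both.
has_manager = type(out_of_tree) in spec_manager_map
is_uniform = is_uniform_type([out_of_tree])
```

In this sketch both lookups fail for the out-of-tree spec unless the in-tree map and function are edited, which is the situation the RFC wants to avoid.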

Proposed Change.

Please read the details in https://docs.google.com/document/d/1E6dKVeNa-NsmaLo7-0jHJ-So9xgdQfUOOHJ1xW971bQ/edit?usp=sharing

To address the above issues, we propose a pluggable mechanism for KVCacheSpec, currently named KVCacheSpecRegistry.

KVCacheSpecRegistry

KVCacheSpecRegistry offers an API that lets both in-tree and out-of-tree platforms register a new KVCacheSpec or override an existing one. Its interface is as follows:

class KVCacheSpecRegistry:
    """Global registry mapping each KVCacheSpec class to its SingleTypeKVCacheManager
    and to the base spec that defines its uniform type."""

    @classmethod
    def register(cls, spec_class, manager_class, uniform_type_base_spec):
        """
        Lets any platform, in-tree or out-of-tree, register a new KVCacheSpec
        with its own implementation.
        """

    @classmethod
    def override(cls, spec_class, manager_class, uniform_type_base_spec):
        """
        Lets any platform customize the implementation of an existing KVCacheSpec.
        """

    @classmethod
    def get_manager_class(cls, spec: KVCacheSpec) -> Type[SingleTypeKVCacheManager]:
        """
        Returns the SingleTypeKVCacheManager that manages the blocks of the given KVCacheSpec.
        """

    @classmethod
    def get_uniform_type_base_spec(cls, spec: KVCacheSpec) -> Type[KVCacheSpec]:
        """
        Returns the uniform type of the KVCacheSpec. UniformTypeKVCacheSpecs uses it to
        decide which specs can be treated as the same type; normally it is the
        KVCacheSpec class itself.
        """

Registry Decorator

A registration decorator is also needed. It can be implemented as shown below; its usage is explained in the docstring:

def register_kv_cache_spec(
    manager_class: Type["SingleTypeKVCacheManager"] | None = None,
    uniform_type_base_spec: Type["KVCacheSpec"] | None = None,
    override: bool = False,
):
    """
    Decorator to register a custom KVCacheSpec class.

    Usage for new specs:
        @register_kv_cache_spec(
            manager_class=FullAttentionManager,
            uniform_type_base_spec=FullAttentionSpec
        )
        @dataclass(frozen=True, kw_only=True)
        class MyCustomFullAttentionSpec(FullAttentionSpec):
            custom_alignment: int = 64

            @property
            def page_size_bytes(self) -> int:
                base_size = super().page_size_bytes
                return ((base_size + self.custom_alignment - 1)
                        // self.custom_alignment * self.custom_alignment)

    Usage for overriding existing specs:
        @register_kv_cache_spec(
            manager_class=IntelOptimizedFullAttentionManager,
            override=True
        )
        class FullAttentionSpec:
            pass  # Just to trigger the decorator

    Args:
        manager_class: The SingleTypeKVCacheManager to use for this spec.
            Required when override=False, optional when override=True.
        uniform_type_base_spec: The base spec class for uniform type kv cache specs compatibility.
            If None, the spec is treated as a new base type (when override=False)
            or keeps existing value (when override=True).
        override: If True, calls override() instead of register(). Use this when
            you want to change the manager for an existing spec without creating
            a new subclass.
    """

    def decorator(spec_class: Type["KVCacheSpec"]) -> Type["KVCacheSpec"]:
        if override:
            # Use override() method for existing specs
            KVCacheSpecRegistry.override(
                spec_class=spec_class,
                manager_class=manager_class,
                uniform_type_base_spec=uniform_type_base_spec,
            )
        else:
            # Use register() method for new specs
            if manager_class is None:
                raise ValueError(
                    "manager_class is required when override=False"
                )
            KVCacheSpecRegistry.register(
                spec_class=spec_class,
                manager_class=manager_class,
                uniform_type_base_spec=uniform_type_base_spec,
            )
        return spec_class

    return decorator
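
The page_size_bytes override in the docstring above rounds the base page size up to the next multiple of custom_alignment. The arithmetic can be checked in isolation with a small helper; align_up is a hypothetical name used only for this illustration.

```python
def align_up(size_bytes: int, alignment: int) -> int:
    """Round size_bytes up to the nearest multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (size_bytes + alignment - 1) // alignment * alignment


print(align_up(100, 64))  # 128: 100 bytes need two 64-byte units
print(align_up(128, 64))  # 128: already aligned, unchanged
print(align_up(129, 64))  # 192: one byte over spills into a third unit
```

Sizes that are already aligned are left unchanged, so the override only adds padding when the base page size is not a multiple of the alignment.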

Timing for Registry

Because each KVCacheSpec must be defined before Executor.get_kv_cache_specs is called, all KVCacheSpecs need to be registered at the beginning of EngineCore._initialize_kv_caches. The whole KV cache initialization pipeline then looks like this:

<img width="210" height="322" alt="Image" src="https://github.com/user-attachments/assets/b2c71f87-1676-4d91-8880-a42d29966011" />

How to use the Custom KVCacheSpec?

Currently, the KVCacheSpec is defined in the layer classes Attention, MLAAttention, MambaBase, etc. KVCacheSpecRegistry is useful when one type of KVCacheSpec should be globally replaced by a custom one. This is the most common scenario for registering a KVCacheSpec, because the spec type is usually determined by the KV cache design of the attention or Mamba layer. Platforms are therefore most likely to change only the padding or the page_size calculation, for instance when enabling a different quantization method such as fp8_ds_mla in MLAAttentionSpec. In this case, the KVCacheSpec definition in the layers does not need to change; the platform only overrides the KVCacheSpec with a custom one, for example:

@register_kv_cache_spec(
    spec_class=FullAttentionSpec,  # the existing spec to replace; not a parameter of the decorator signature shown earlier
    manager_class=FullAttentionManager,
    override=True,
)
class CustomFullAttentionSpec:
    ...

After this PR, adding a new type of KVCacheSpec requires only the following:

  • implement a subclass of KVCacheSpec.
  • implement a SingleTypeKVCacheManager for the KVCacheSpec, or specify an existing SingleTypeKVCacheManager for it.
  • register the custom KVCacheSpec in the platform-specific code.

Feedback Period.

No response

CC List.

@heheda12345 @ivanium @vadiklyutiy

Any Other Things.

TODO:

  • Implement the KVCacheSpecRegistry mechanism, validate it by registering a KVCacheSpec in vllm-ascend, and submit a PR
  • Provide an example and test cases for KVCacheSpecRegistry
  • Make CacheDtype extensible


extent analysis

Fix Plan

To implement the KVCacheSpecRegistry mechanism, follow these steps:

  • Implement the KVCacheSpecRegistry class with the required methods: register, override, get_manager_class, and get_uniform_type_base_spec.
  • Create a registry decorator register_kv_cache_spec to simplify the registration process.
  • Register custom KVCacheSpec classes using the decorator.
  • Integrate the KVCacheSpecRegistry into the EngineCore._initialize_kv_caches method to register all KVCacheSpecs at the beginning.

Example Code

from typing import Type

# Implement the KVCacheSpecRegistry class
class KVCacheSpecRegistry:
    ...  # register, override, get_manager_class, get_uniform_type_base_spec

# Create a registry decorator
def register_kv_cache_spec(
    manager_class: Type["SingleTypeKVCacheManager"] | None = None,
    uniform_type_base_spec: Type["KVCacheSpec"] | None = None,
    override: bool = False,
):
    ...  # body as given in the Registry Decorator section

# Register a custom KVCacheSpec class
@register_kv_cache_spec(
    manager_class=FullAttentionManager,
    uniform_type_base_spec=FullAttentionSpec,
)
class CustomFullAttentionSpec(FullAttentionSpec):
    ...  # Custom implementation

# Integrate the KVCacheSpecRegistry into EngineCore
class EngineCore:
    def _initialize_kv_caches(self):
        # Register all KVCacheSpecs at the beginning
        KVCacheSpecRegistry.register(
            spec_class=CustomFullAttentionSpec,
            manager_class=FullAttentionManager,
            uniform_type_base_spec=FullAttentionSpec,
        )
        # ...

Verification

To verify that the fix worked, check the following:

  • The custom KVCacheSpec class is correctly registered using the decorator.
  • The KVCacheSpecRegistry is properly integrated into the EngineCore._initialize_kv_caches method.
  • The custom KVCacheSpec class is used correctly in the application.

Extra Tips

  • Make sure to test the KVCacheSpecRegistry mechanism thoroughly to ensure it works as expected.
  • Provide example use cases and test cases for the KVCacheSpecRegistry to demonstrate its usage and functionality.
  • Consider adding documentation for the KVCacheSpecRegistry and its usage to make it easier for others to understand and use the mechanism.
