vllm - ✅(Solved) Fix [RFC]: Support Registry Mechanism for KVCacheSpec [3 pull requests, 1 comments, 1 participants]

vllm 2026-03-10 15:24:22

GitHub stats

vllm-project/vllm#36668•Fetched 2026-04-08 00:35:29

Author

MengqingCao

Participants

MengqingCao

Timeline (top)

mentioned ×5, subscribed ×5, cross-referenced ×2, commented ×1

Motivation.

Currently, KVCacheSpec is tightly coupled to the rest of the KV cache code. Adding a new type of KVCacheSpec requires the following steps:

Implement a subclass of KVCacheSpec.
Implement a SingleTypeKVCacheManager for the KVCacheSpec, or specify an existing SingleTypeKVCacheManager for it.
Map the KVCacheSpec to the SingleTypeKVCacheManager in spec_manager_map.
Add a new branch for it in UniformTypeKVCacheSpecs.is_uniform_type.
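The coupling in the four steps above can be pictured with a toy model. This is a simplified sketch, not vLLM's actual code: every class is an empty stand-in, and only the names spec_manager_map and is_uniform_type are taken from the issue text. It illustrates why an out-of-tree spec stays unknown until two central places are patched.

```python
from dataclasses import dataclass


# Stand-in classes; the real vLLM classes carry much more state.
@dataclass(frozen=True)
class KVCacheSpec:
    block_size: int


@dataclass(frozen=True)
class FullAttentionSpec(KVCacheSpec):
    pass


@dataclass(frozen=True)
class MambaSpec(KVCacheSpec):
    pass


class SingleTypeKVCacheManager:
    pass


class FullAttentionManager(SingleTypeKVCacheManager):
    pass


class MambaManager(SingleTypeKVCacheManager):
    pass


# Step 3: a central, hard-coded map that every new spec must be added to.
spec_manager_map = {
    FullAttentionSpec: FullAttentionManager,
    MambaSpec: MambaManager,
}


# Step 4: a central check with one hard-coded branch per spec family.
def is_uniform_type(specs: list) -> bool:
    if all(isinstance(s, FullAttentionSpec) for s in specs):
        return True
    if all(isinstance(s, MambaSpec) for s in specs):
        return True
    return False


# A spec defined outside the core is unknown to both places.
@dataclass(frozen=True)
class OutOfTreeSpec(KVCacheSpec):
    pass


print(OutOfTreeSpec in spec_manager_map)
print(is_uniform_type([OutOfTreeSpec(block_size=16)]))
```

Both checks fail for OutOfTreeSpec, which is the situation the registry is meant to remove.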

Different hardware backends require different alignment sizes or hardware-specific padding. For example:

page_size_padded was introduced to ensure TPU/RPA compatibility in https://github.com/vllm-project/vllm/pull/31635.
GPUs need to align to 256 for performance: https://github.com/vllm-project/vllm/pull/36359
vLLM-Ascend requires a page size larger than that of MLAAttentionSpec but smaller than that of FullAttentionSpec for DeepSeek V3.2, yet it can only choose the latter to work functionally.
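The alignment requirement in these examples comes down to rounding a page size up to a multiple of some backend-specific value. A minimal sketch of that arithmetic follows; the helper name align_up and the sample sizes are illustrative and do not come from vLLM.

```python
def align_up(size_bytes: int, alignment: int) -> int:
    """Round size_bytes up to the nearest multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (size_bytes + alignment - 1) // alignment * alignment


# Hypothetical page sizes, chosen only to illustrate the arithmetic.
for raw in (1000, 1024, 1025):
    print(raw, "->", align_up(raw, 256))
```

A size that is already a multiple of the alignment is returned unchanged, so applying the helper twice is harmless.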

This tight coupling makes it impossible for out-of-tree platforms to extend KV cache behavior without patching vLLM, and patching is not a sustainable long-term solution.

Proposed Change.

Please read the details in https://docs.google.com/document/d/1E6dKVeNa-NsmaLo7-0jHJ-So9xgdQfUOOHJ1xW971bQ/edit?usp=sharing

To address the above issues, we propose a pluggable mechanism for KVCacheSpec, currently named KVCacheSpecRegistry.

KVCacheSpecRegistry

KVCacheSpecRegistry offers an API for both in-tree and out-of-tree platforms to register a new KVCacheSpec and to override an existing KVCacheSpec. Its interface is as follows:

from typing import Type


class KVCacheSpecRegistry:
    """Global registry mapping KVCacheSpec classes to their SingleTypeKVCacheManager and to the rule that defines their uniform type."""

    @classmethod
    def register(cls, spec_class, manager_class, uniform_type_base_spec):
        """
        Allows any platform, in-tree or out-of-tree, to register a new KVCacheSpec with its own implementation.
        """

    @classmethod
    def override(cls, spec_class, manager_class, uniform_type_base_spec):
        """
        Allows any platform to customize the implementation of an existing KVCacheSpec.
        """

    @classmethod
    def get_manager_class(cls, spec: "KVCacheSpec") -> Type["SingleTypeKVCacheManager"]:
        """
        Returns the SingleTypeKVCacheManager that manages the blocks of the given KVCacheSpec.
        """

    @classmethod
    def get_uniform_type_base_spec(cls, spec: "KVCacheSpec") -> Type["KVCacheSpec"]:
        """
        Returns the uniform type of the KVCacheSpec. UniformTypeKVCacheSpecs uses it to determine which specs can be treated as the same type; normally it is the KVCacheSpec class itself.
        """

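The interface above leaves the storage and lookup rules open. The following is one possible minimal implementation, offered only as a sketch: the dict storage, the errors raised on duplicate or missing entries, and the MRO-based lookup are assumptions rather than part of the RFC, and all spec and manager classes are empty stand-ins (TunedManager and PaddedFullAttentionSpec are hypothetical names).

```python
from typing import Dict, Optional, Tuple


class KVCacheSpec:
    """Stand-in for vLLM's KVCacheSpec."""


class SingleTypeKVCacheManager:
    """Stand-in for vLLM's SingleTypeKVCacheManager."""


class KVCacheSpecRegistry:
    # spec class -> (manager class, uniform-type base spec class)
    _entries: Dict[type, Tuple[type, type]] = {}

    @classmethod
    def register(cls, spec_class: type, manager_class: type,
                 uniform_type_base_spec: Optional[type] = None) -> None:
        if spec_class in cls._entries:
            raise ValueError(f"{spec_class.__name__} is already registered")
        # With no explicit base, the spec forms its own uniform type.
        cls._entries[spec_class] = (manager_class,
                                    uniform_type_base_spec or spec_class)

    @classmethod
    def override(cls, spec_class: type,
                 manager_class: Optional[type] = None,
                 uniform_type_base_spec: Optional[type] = None) -> None:
        if spec_class not in cls._entries:
            raise KeyError(f"{spec_class.__name__} is not registered")
        old_manager, old_base = cls._entries[spec_class]
        # Arguments left as None keep the existing values.
        cls._entries[spec_class] = (manager_class or old_manager,
                                    uniform_type_base_spec or old_base)

    @classmethod
    def _lookup(cls, spec: KVCacheSpec) -> Tuple[type, type]:
        # Assumption: walk the MRO so an unregistered subclass reuses
        # the entry of its nearest registered ancestor.
        for klass in type(spec).__mro__:
            if klass in cls._entries:
                return cls._entries[klass]
        raise KeyError(f"no entry for {type(spec).__name__}")

    @classmethod
    def get_manager_class(cls, spec: KVCacheSpec) -> type:
        return cls._lookup(spec)[0]

    @classmethod
    def get_uniform_type_base_spec(cls, spec: KVCacheSpec) -> type:
        return cls._lookup(spec)[1]


class FullAttentionSpec(KVCacheSpec): ...
class PaddedFullAttentionSpec(FullAttentionSpec): ...
class FullAttentionManager(SingleTypeKVCacheManager): ...
class TunedManager(SingleTypeKVCacheManager): ...


KVCacheSpecRegistry.register(FullAttentionSpec, FullAttentionManager)
KVCacheSpecRegistry.register(PaddedFullAttentionSpec, FullAttentionManager,
                             FullAttentionSpec)
KVCacheSpecRegistry.override(FullAttentionSpec, manager_class=TunedManager)

print(KVCacheSpecRegistry.get_manager_class(FullAttentionSpec()).__name__)
print(KVCacheSpecRegistry.get_uniform_type_base_spec(
    PaddedFullAttentionSpec()).__name__)
```

Keeping register and override separate makes an accidental double registration fail loudly, while a deliberate replacement stays explicit.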
Registry Decorator

A registration decorator is also needed. It could be implemented as follows; its usage is explained in the docstring:

def register_kv_cache_spec(
    manager_class: Type["SingleTypeKVCacheManager"] | None = None,
    uniform_type_base_spec: Type["KVCacheSpec"] | None = None,
    override: bool = False,
):
    """
    Decorator to register a custom KVCacheSpec class.

    Usage for new specs:
        @register_kv_cache_spec(
            manager_class=FullAttentionManager,
            uniform_type_base_spec=FullAttentionSpec
        )
        @dataclass(frozen=True, kw_only=True)
        class MyCustomFullAttentionSpec(FullAttentionSpec):
            custom_alignment: int = 64

            @property
            def page_size_bytes(self) -> int:
                base_size = super().page_size_bytes
                return ((base_size + self.custom_alignment - 1)
                        // self.custom_alignment * self.custom_alignment)

    Usage for overriding existing specs:
        @register_kv_cache_spec(
            manager_class=IntelOptimizedFullAttentionManager,
            override=True
        )
        class FullAttentionSpec:
            pass  # Just to trigger the decorator

    Args:
        manager_class: The SingleTypeKVCacheManager to use for this spec.
            Required when override=False, optional when override=True.
        uniform_type_base_spec: The base spec class for uniform type kv cache specs compatibility.
            If None, the spec is treated as a new base type (when override=False)
            or keeps existing value (when override=True).
        override: If True, calls override() instead of register(). Use this when
            you want to change the manager for an existing spec without creating
            a new subclass.
    """

    def decorator(spec_class: Type["KVCacheSpec"]) -> Type["KVCacheSpec"]:
        if override:
            # Use override() method for existing specs
            KVCacheSpecRegistry.override(
                spec_class=spec_class,
                manager_class=manager_class,
                uniform_type_base_spec=uniform_type_base_spec,
            )
        else:
            # Use register() method for new specs
            if manager_class is None:
                raise ValueError(
                    "manager_class is required when override=False"
                )
            KVCacheSpecRegistry.register(
                spec_class=spec_class,
                manager_class=manager_class,
                uniform_type_base_spec=uniform_type_base_spec,
            )
        return spec_class

    return decorator

Timing for Registry

Because every KVCacheSpec must be defined before Executor.get_kv_cache_specs is called, all KVCacheSpecs need to be registered at the beginning of EngineCore._initialize_kv_caches. (The diagram of the KV cache initialization pipeline from the original issue is not included here.)
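The ordering constraint can be sketched with stand-in classes. Only EngineCore._initialize_kv_caches and Executor.get_kv_cache_specs are names from the issue; the Platform class and its register_kv_cache_specs hook are hypothetical and serve only to make the order of calls visible.

```python
from typing import List

call_order: List[str] = []


class Platform:
    """Hypothetical platform plugin; the hook name below is made up."""

    def register_kv_cache_specs(self) -> None:
        call_order.append("register specs")


class Executor:
    """Stand-in for the executor that collects per-layer specs."""

    def get_kv_cache_specs(self) -> list:
        call_order.append("get_kv_cache_specs")
        return []


class EngineCore:
    def __init__(self, platform: Platform, executor: Executor) -> None:
        self.platform = platform
        self.executor = executor

    def _initialize_kv_caches(self) -> list:
        # 1. Registration runs first, so every spec class is known ...
        self.platform.register_kv_cache_specs()
        # 2. ... before the executor reports the specs in use.
        return self.executor.get_kv_cache_specs()


EngineCore(Platform(), Executor())._initialize_kv_caches()
print(call_order)
```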

How to use the Custom `KVCacheSpec`?

Currently, the KVCacheSpec is defined in classes such as Attention, MLAAttention and MambaBase. KVCacheSpecRegistry is useful when one type of KVCacheSpec should be globally replaced by a custom one. This is the most common scenario for registering a KVCacheSpec, because the type of KVCacheSpec is usually determined by the KV cache design of the attention or Mamba layer. Platforms are therefore most likely to change the padding or the page_size calculation when enabling different quantization methods, e.g., fp8_ds_mla in MLAAttentionSpec. In this case, the definition of KVCacheSpec in the layers does not need to change; the KVCacheSpec only needs to be overridden by a custom one, for example:

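# Note: spec_class names the existing spec to replace; the decorator
# sketched earlier does not take this parameter.
@register_kv_cache_spec(
    spec_class=FullAttentionSpec,
    manager_class=FullAttentionManager,
    override=True,
)
class CustomFullAttentionSpec:
    ...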

After this PR, adding a new type of KVCacheSpec requires only the following steps:

Implement a subclass of KVCacheSpec.
Implement a SingleTypeKVCacheManager for the KVCacheSpec, or specify an existing SingleTypeKVCacheManager for it.
Register the custom KVCacheSpec in the platform-specific code.
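The step that disappears from this list is the per-spec branch in UniformTypeKVCacheSpecs.is_uniform_type. One way that check could become generic once each spec has a registered uniform-type base is sketched below. The dict UNIFORM_BASE stands in for the registry, and all classes are empty stand-ins with hypothetical names; this is not vLLM's implementation.

```python
class KVCacheSpec: ...
class FullAttentionSpec(KVCacheSpec): ...
class PaddedFullAttentionSpec(FullAttentionSpec): ...
class MambaSpec(KVCacheSpec): ...


# Hypothetical registry contents: spec class -> uniform-type base spec class.
UNIFORM_BASE = {
    FullAttentionSpec: FullAttentionSpec,
    PaddedFullAttentionSpec: FullAttentionSpec,
    MambaSpec: MambaSpec,
}


def get_uniform_type_base_spec(spec: KVCacheSpec) -> type:
    return UNIFORM_BASE[type(spec)]


def is_uniform_type(specs: list) -> bool:
    # Uniform means every spec maps to the same registered base class.
    # No branch per spec family is needed; an empty list is not uniform.
    return len({get_uniform_type_base_spec(s) for s in specs}) == 1


print(is_uniform_type([FullAttentionSpec(), PaddedFullAttentionSpec()]))
print(is_uniform_type([FullAttentionSpec(), MambaSpec()]))
```

A platform that registers a padded variant with FullAttentionSpec as its base is grouped with full attention without touching this function.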

Feedback Period.

No response

CC List.

@heheda12345 @ivanium @vadiklyutiy

Any Other Things.

TODO:

Implement the KVCacheSpecRegistry mechanism, verify it by registering a KVCacheSpec in vllm-ascend, and submit a PR.
Provide an example and test cases for KVCacheSpecRegistry.
Make CacheDtype extensible.

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

To implement the KVCacheSpecRegistry mechanism, follow these steps:

Implement the KVCacheSpecRegistry class with the required methods: register, override, get_manager_class, and get_uniform_type_base_spec.
Create a registry decorator register_kv_cache_spec to simplify the registration process.
Register custom KVCacheSpec classes using the decorator.
Integrate the KVCacheSpecRegistry into the EngineCore._initialize_kv_caches method to register all KVCacheSpecs at the beginning.

Example Code

# Implement the KVCacheSpecRegistry class
class KVCacheSpecRegistry:
    ...

# Create a registry decorator
def register_kv_cache_spec(
    manager_class: Type["SingleTypeKVCacheManager"] | None = None,
    uniform_type_base_spec: Type["KVCacheSpec"] | None = None,
    override: bool = False,
):
    ...

# Register a custom KVCacheSpec class
@register_kv_cache_spec(
    manager_class=FullAttentionManager,
    uniform_type_base_spec=FullAttentionSpec,
)
class CustomFullAttentionSpec(FullAttentionSpec):
    ...  # Custom implementation

# Integrate the KVCacheSpecRegistry into EngineCore
class EngineCore:
    def _initialize_kv_caches(self):
        # Register all KVCacheSpecs at the beginning
        # (keyword name matches the decorator's uniform_type_base_spec)
        KVCacheSpecRegistry.register(
            spec_class=CustomFullAttentionSpec,
            manager_class=FullAttentionManager,
            uniform_type_base_spec=FullAttentionSpec,
        )
        ...

Verification

To verify that the fix worked, check the following:

The custom KVCacheSpec class is correctly registered using the decorator.
The KVCacheSpecRegistry is properly integrated into the EngineCore._initialize_kv_caches method.
The custom KVCacheSpec class is used correctly in the application.

Extra Tips

Make sure to test the KVCacheSpecRegistry mechanism thoroughly to ensure it works as expected.
Provide example use cases and test cases for the KVCacheSpecRegistry to demonstrate its usage and functionality.
Consider adding documentation for the KVCacheSpecRegistry and its usage to make it easier for others to understand and use the mechanism.

PR fix notes

PR #31635: Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility.

PR #36359: [KV Cache] Use a contiguous buffer for all layers