vllm - ✅(Solved) Fix [Performance]: Flashinfer TRTLLM MoE for Qwen3.5 [2 pull requests, 1 comment, 2 participants]

Linda-Stadter · 2026-03-12T19:48:48Z



GitHub stats

vllm-project/vllm#36922 • Fetched 2026-04-08 00:43:35

Author: Linda-Stadter
Participants: Linda-Stadter, robertgshaw2-redhat
Timeline (top): subscribed ×5, added_to_project_v2 ×1, commented ×1, labeled ×1

PR fix notes

PR #2640: fix: autotuner cache key mismatch for trtllm-gen FP8 block scale MoE and FP8 routed MoE

Repository: flashinfer-ai/flashinfer
Author: Linda-Stadter
State: open | merged: False
Link: https://github.com/flashinfer-ai/flashinfer/pull/2640

Description (problem / solution / changelog)

📌 Description

The PR:

- fixes input shape mismatches so that they match the autotuner cache key for FP8 MoE
- enables the autotuner for FP8 block scale routed MoE

Issue 1: Could not find a tuned tactic for trtllm_fp8_block_scale_moe:

2026-02-26 09:26:35,204 - INFO - autotuner.py:444 - flashinfer.jit: [AutoTunner]: Using fallback tactic for flashinfer::trtllm_fp8_block_scale_moe with input shapes (torch.Size([1024, 4096]), torch.Size([1024, 512]), torch.Size([0]), torch.Size([0]), torch.Size([1024, 4096]), torch.Size([32, 1024]))

Tuned with incorrect input: op=flashinfer::trtllm_fp8_block_scale_moe, profile=((1024, 4096), (1024, 512), (1024,), (1024,), (1024, 4096), (1024, 16384)) -> runner_id=0, tactic=[64, 5]

The third, fourth, and sixth shapes in the tuned profile ((1024,), (1024,), (1024, 16384)) differ from the runtime shapes ([0], [0], [32, 1024]), so the cached tactic is never found and the fallback tactic is used.
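A minimal sketch of why the lookup misses, assuming (as the log suggests) that a tuned tactic is cached under the op name plus the tuple of input shapes. The dict and the helpers cache_key and choose_tactic are hypothetical stand-ins, not FlashInfer's autotuner; the real autotuner may also bucket dynamic dimensions such as the token count, which this sketch ignores.

```python
# Hypothetical stand-in for an autotuner cache keyed by input shapes
# (not FlashInfer's real data structure).
Shape = tuple[int, ...]

OP = "flashinfer::trtllm_fp8_block_scale_moe"


def cache_key(op: str, shapes: list[Shape]) -> tuple[str, tuple[Shape, ...]]:
    """Build a lookup key from the op name and the exact input shapes."""
    return (op, tuple(shapes))


# Shapes recorded while tuning (from the "Tuned with incorrect input" line).
tuned_shapes = [(1024, 4096), (1024, 512), (1024,), (1024,), (1024, 4096), (1024, 16384)]
# Shapes seen at inference (from the "Using fallback tactic" log line).
runtime_shapes = [(1024, 4096), (1024, 512), (0,), (0,), (1024, 4096), (32, 1024)]

# runner_id=0, tactic=[64, 5] stored under the tuning-time key.
cache = {cache_key(OP, tuned_shapes): (0, [64, 5])}


def choose_tactic(shapes: list[Shape]):
    # Returns None (i.e. fall back) when no tuned entry matches exactly.
    return cache.get(cache_key(OP, shapes))


print(choose_tactic(runtime_shapes))  # None -> fallback tactic
print(choose_tactic(tuned_shapes))    # (0, [64, 5])

# Positions where tuning-time and runtime shapes disagree.
mismatched = [i for i, (a, b) in enumerate(zip(tuned_shapes, runtime_shapes)) if a != b]
print(mismatched)  # [2, 3, 5]
```

Making the tuning-time shapes match the runtime shapes, which is what the PR describes, makes the two keys identical so the lookup succeeds.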

Issue 2: Crash when autotuning trtllm_fp8_block_scale_routed_moe (traceback excerpt):

  File "/flashinfer/flashinfer/fused_moe/core.py", line 2568, in trtllm_fp8_block_scale_routed_moe
    result = get_trtllm_moe_sm100_module().trtllm_fp8_block_scale_moe(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/flashinfer/flashinfer/fused_moe/core.py", line 1711, in trtllm_fp8_block_scale_moe_op
    _, tactic = tuner.choose_one(
                ^^^^^^^^^^^^^^^^^
  File "/flashinfer/flashinfer/autotuner.py", line 470, in choose_one
    tensors = self._prepare_input_tensors(p, inputs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/flashinfer/flashinfer/autotuner.py", line 792, in _prepare_input_tensors
    tensor = self._create_tensor_like(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/flashinfer/flashinfer/autotuner.py", line 771, in _create_tensor_like
    dtype = origin_tensor.dtype
            ^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'dtype'
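The traceback ends in _create_tensor_like, which reads .dtype from an original input that is None. One plausible reading, which is an assumption and not stated in the PR, is that the routed variant passes an optional input (for example routing logits, since routing is precomputed) as None. The sketch below shows a guard for that case using a hypothetical FakeTensor stand-in instead of torch; the function names only mirror the traceback, and PR #2640's actual change is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FakeTensor:
    """Minimal stand-in for a tensor: only shape and dtype matter here."""
    shape: tuple[int, ...]
    dtype: str


def create_tensor_like(origin: FakeTensor, shape: tuple[int, ...]) -> FakeTensor:
    # Mirrors the failing line: reads .dtype from the original input,
    # which raises AttributeError if origin is None.
    return FakeTensor(shape=shape, dtype=origin.dtype)


def prepare_input_tensors(
    shapes: Sequence[Optional[tuple[int, ...]]],
    inputs: Sequence[Optional[FakeTensor]],
) -> list[Optional[FakeTensor]]:
    """Create profiling inputs for each original input, skipping missing ones."""
    prepared: list[Optional[FakeTensor]] = []
    for shape, tensor in zip(shapes, inputs):
        if tensor is None:
            # Guard: optional inputs that were not passed stay None instead of
            # triggering "'NoneType' object has no attribute 'dtype'".
            prepared.append(None)
        else:
            prepared.append(create_tensor_like(tensor, shape))
    return prepared


# Hypothetical call: the first input is absent (e.g. no routing logits).
inputs = [None, FakeTensor((8, 4096), "float8_e4m3fn")]
print(prepare_input_tensors([None, (16, 4096)], inputs))
```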

Benchmark:

Tokens	BF16 (ms)	BF16 TFLOPS	FP8 Untuned (ms)	FP8 Untuned TFLOPS	FP8 Tuned (ms)	FP8 Tuned TFLOPS	FP8 routed Untuned (ms)	FP8 routed Untuned TFLOPS	FP8 routed Tuned (ms)	FP8 routed Tuned TFLOPS
1024	1.877	137.32	1.455	177.07	1.187	217.07	1.337	192.80	1.514	170.27
2048	1.952	263.99	1.692	304.65	1.425	361.77	1.548	333.04	1.662	310.09
4096	2.194	469.85	2.232	461.79	2.561	402.43	2.087	493.88	1.887	546.16
8192	3.594	573.57	3.458	596.15	3.439	599.49	3.355	614.50	3.582	575.53
16384	5.423	760.37	6.329	651.47	5.852	704.53	6.026	684.17	5.670	727.18
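The tuned and untuned columns can be compared directly as latency ratios. This short sketch uses only the latencies copied from the table; a ratio above 1 means tuning helped at that token count.

```python
# Latencies (ms) copied from the PR #2640 benchmark table.
tokens = [1024, 2048, 4096, 8192, 16384]
fp8_untuned = [1.455, 1.692, 2.232, 3.458, 6.329]
fp8_tuned = [1.187, 1.425, 2.561, 3.439, 5.852]
routed_untuned = [1.337, 1.548, 2.087, 3.355, 6.026]
routed_tuned = [1.514, 1.662, 1.887, 3.582, 5.670]


def speedups(untuned: list[float], tuned: list[float]) -> dict[int, float]:
    """Untuned/tuned latency ratio per token count; > 1 means tuning helped."""
    return {t: round(u / v, 3) for t, u, v in zip(tokens, untuned, tuned)}


print("FP8:       ", speedups(fp8_untuned, fp8_tuned))
print("FP8 routed:", speedups(routed_untuned, routed_tuned))
```

On these numbers tuning does not help uniformly: tuned FP8 is slower than untuned at 4096 tokens (2.561 vs 2.232 ms), and tuned FP8 routed is slower at 1024, 2048, and 8192 tokens.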


🚀 Pull Request Checklist


✅ Pre-commit Checks

- [x] I have installed pre-commit by running pip install pre-commit (or used your preferred method).
- [x] I have installed the hooks with pre-commit install.
- [x] I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.


🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (unittest, etc.).


Summary by CodeRabbit

Bug Fixes
- Corrected a typo in autotuner debug log messages.
Refactor
- Consolidated MoE tuning configuration and input preparation into a centralized setup, simplifying FP8/FP4 paths, reducing duplication, and improving runtime/shape validation and configurability.
Tests
- Added tests verifying autotuner cache-key behavior across quantization modes and multiple token-count scenarios.

Changed files

- flashinfer/autotuner.py (modified, +2/-2)
- flashinfer/fused_moe/core.py (modified, +216/-104)
- tests/moe/test_moe_autotuner_cache_keys.py (added, +149/-0)

PR #2594: Bf16 routed moe

Repository: flashinfer-ai/flashinfer
Author: IwakuraRein
State: closed | merged: True
Link: https://github.com/flashinfer-ai/flashinfer/pull/2594

Description (problem / solution / changelog)

📌 Description

Adds the trtllm_bf16_routed_moe API.


🚀 Pull Request Checklist


✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.


🧪 Tests

pytest tests/moe/test_trtllm_gen_routed_fused_moe.py::test_trtllm_gen_bf16_routed_fused_moe

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).


Summary by CodeRabbit

New Features
- Added support for pre-computed routing in MoE operations, enabling flexible routing input strategies.
- New routed MoE APIs now available: BF16 and FP8 variants support pre-packed top-k routing information.
- Introduced dual-path mechanism allowing MoE operations to accept either routing logits or pre-computed routing data.
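A routed MoE API takes routing that was already computed outside the kernel instead of raw routing logits. As an illustration of what "pre-computed routing" means, here is the RenormalizeNaive routing that the issue's Qwen3.5 benchmark uses (Softmax -> TopK -> Renormalize), computed for one token in plain Python. renormalize_naive_topk is a hypothetical helper; how FlashInfer expects the resulting expert ids and weights to be packed is not described here and is not shown.

```python
import math


def renormalize_naive_topk(logits: list[float], k: int) -> tuple[list[int], list[float]]:
    """Softmax -> TopK -> Renormalize for a single token's router logits."""
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Select the k experts with the highest probability.
    ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected probabilities so they sum to 1.
    picked = sum(probs[i] for i in ids)
    weights = [probs[i] / picked for i in ids]
    return ids, weights


ids, weights = renormalize_naive_topk([0.1, 2.0, -1.0, 1.5], k=2)
print(ids, weights)
```

With k=2 the selected experts are 1 and 3, and their renormalized weights depend only on the difference of their logits (here a softmax over 2.0 and 1.5).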

Changed files

- csrc/trtllm_fused_moe_kernel_launcher.cu (modified, +50/-17)
- flashinfer/__init__.py (modified, +3/-0)
- flashinfer/fused_moe/__init__.py (modified, +2/-0)
- flashinfer/fused_moe/core.py (modified, +125/-8)
- tests/moe/test_trtllm_gen_routed_fused_moe.py (modified, +147/-3)


---


Proposal to improve performance

I noticed the following performance issues with Qwen3.5 MoE configurations:

- FlashInfer TRTLLM routed MoE FP4 has autotuning disabled: https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/fused_moe/trtllm_moe.py#L178
- FlashInfer TRTLLM routed MoE FP8 has autotuning disabled (it could be enabled after https://github.com/flashinfer-ai/flashinfer/pull/2640).
- Autotuning FlashInfer TRTLLM block scale FP4 with Qwen3.5 sizes produces zero cached tuning results, so the fallback strategy is selected.
- FlashInfer TRTLLM BF16 MoE always selects the fallback strategy instead of the cached tuning result, due to a recent change in main (https://github.com/flashinfer-ai/flashinfer/pull/2594).
- FlashInfer TRTLLM FP8 MoE always selects the fallback strategy instead of the cached tuning result (PR open: https://github.com/flashinfer-ai/flashinfer/pull/2640).
- FlashInfer TRTLLM BF16 MoE is faster than block scale FP8 for larger token counts or higher EP. MXFP8 MoE could be worth trying for Qwen3.5.

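Benchmark on 1x NVIDIA Blackwell B200 with the following Qwen3.5 configuration (in the results table, the Best column gives the fastest format and its speedup over BF16):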

num_experts = 512
topk = 10
intermediate_size = 1024
hidden_size = 4096
routing_method_type = RenormalizeNaive (Softmax -> TopK -> Renormalize)

================================================================================================================================================================
BF16 vs FP8 vs NVFP4 Comparison
================================================================================================================================================================
  EP |     Tokens |  BF16 (ms) |  BF16 TFLOPS |   FP8 (ms) |   FP8 TFLOPS |   NVFP4 (ms) |   NVFP4 TFLOPS |       Best
----------------------------------------------------------------------------------------------------------------------------------------------------------------
   1 |       1024 |      1.961 |       131.41 |      1.349 |       191.06 |        0.631 |         408.45 | NVFP4 3.11x
   1 |       2048 |      2.152 |       239.53 |      1.596 |       323.02 |        0.737 |         698.90 | NVFP4 2.92x
   1 |       4096 |      2.375 |       434.05 |      2.083 |       494.87 |        0.907 |        1135.99 | NVFP4 2.62x
   1 |       8192 |      3.353 |       614.89 |      3.496 |       589.67 |        1.220 |        1690.18 | NVFP4 2.75x
   1 |      16384 |      5.470 |       753.85 |      6.223 |       662.56 |        1.857 |        2220.73 | NVFP4 2.95x
   2 |       1024 |      0.963 |       133.83 |      0.600 |       214.68 |        0.326 |         395.48 | NVFP4 2.95x
   2 |       2048 |      0.995 |       258.89 |      0.692 |       372.17 |        0.363 |         710.77 | NVFP4 2.75x
   2 |       4096 |      1.220 |       422.52 |      1.133 |       455.07 |        0.411 |        1255.06 | NVFP4 2.97x
   2 |       8192 |      1.711 |       602.35 |      1.807 |       570.29 |        0.662 |        1557.88 | NVFP4 2.59x
   2 |      16384 |      2.858 |       721.30 |      3.267 |       631.09 |        0.998 |        2066.61 | NVFP4 2.87x
   4 |       1024 |      0.510 |       126.32 |      0.354 |       181.88 |        0.221 |         291.26 | NVFP4 2.31x
   4 |       2048 |      0.535 |       240.97 |      0.412 |       312.72 |        0.230 |         561.35 | NVFP4 2.33x
   4 |       4096 |      0.639 |       403.22 |      0.653 |       394.37 |        0.243 |        1062.55 | NVFP4 2.64x
   4 |       8192 |      0.903 |       570.57 |      1.057 |       487.62 |        0.375 |        1374.77 | NVFP4 2.41x
   4 |      16384 |      1.509 |       683.32 |      1.950 |       528.67 |        0.576 |        1790.76 | NVFP4 2.62x
   8 |       1024 |      0.289 |       111.52 |      0.227 |       141.73 |        0.216 |         149.08 | NVFP4 1.34x
   8 |       2048 |      0.304 |       211.76 |      0.270 |       238.23 |        0.216 |         297.66 | NVFP4 1.41x
   8 |       4096 |      0.357 |       360.49 |      0.425 |       302.95 |        0.212 |         607.81 | NVFP4 1.69x
   8 |       8192 |      0.506 |       509.30 |      0.707 |       364.62 |        0.248 |        1037.70 | NVFP4 2.04x
   8 |      16384 |      0.846 |       609.23 |      1.310 |       393.48 |        0.393 |        1310.56 | NVFP4 2.15x
----------------------------------------------------------------------------------------------------------------------------------------------------------------
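The TFLOPS columns appear to follow from a simple FLOP count. For each (token, selected expert) pair, a gate+up projection from hidden_size to 2 × intermediate_size plus a down projection back to hidden_size costs 6 × hidden_size × intermediate_size FLOPs, and the total is divided by EP. This is an inference that reproduces the reported numbers, not a formula given in the issue; moe_tflops is a hypothetical helper.

```python
def moe_tflops(tokens: int, latency_ms: float, ep: int = 1,
               topk: int = 10, hidden: int = 4096, intermediate: int = 1024) -> float:
    """TFLOPS for one MoE layer under an assumed FLOP count.

    Assumption: per (token, expert) pair, the gate+up GEMM costs
    2 * hidden * (2 * intermediate) FLOPs and the down GEMM costs
    2 * intermediate * hidden FLOPs, i.e. 6 * hidden * intermediate in total,
    with the work split evenly across `ep` ranks.
    """
    flops = 6 * tokens * topk * hidden * intermediate / ep
    return flops / (latency_ms * 1e-3) / 1e12


print(round(moe_tflops(1024, 1.961), 2))        # 131.41 (BF16, EP 1)
print(round(moe_tflops(1024, 0.510, ep=4), 2))  # 126.32 (BF16, EP 4)
```

Small differences elsewhere in the table (for example at 16384 tokens) are consistent with the latencies being rounded to three decimals.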

CC @vadiklyutiy

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

The output of `python collect_env.py`


extent analysis

Fix Plan

To address the performance issues with Qwen3.5 MoE configurations, we will focus on enabling autotuning for FlashInfer TRTLLM routed MoE FP4 and FP8, and on making sure tuned tactics are actually found in the autotuner cache at runtime instead of falling back.

Step 1: Enable Autotuning for FlashInfer TRTLLM Routed MoE FP4

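Enable autotuning where it is currently disabled in trtllm_moe.py (see the line linked in the proposal). The class and attribute names in this snippet are illustrative placeholders, not the actual vLLM code:

# vllm/model_executor/layers/fused_moe/trtllm_moe.py
# Illustrative only: TRTLLMMoE and autotuning_enabled are hypothetical names.
import torch.nn as nn


class TRTLLMMoE(nn.Module):
    def __init__(self, *args, **kwargs):
        super().__init__()
        # ... existing layer setup ...
        self.autotuning_enabled = True  # Enable autotuning instead of hard-disabling it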

Step 2: Enable Autotuning for FlashInfer TRTLLM Routed MoE FP8

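Enable autotuning for the routed FP8 path once https://github.com/flashinfer-ai/flashinfer/pull/2640 is merged in FlashInfer; that PR fixes the crash when autotuning trtllm_fp8_block_scale_routed_moe and enables its autotuner.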

Step 3: Improve Caching of Tuning Results

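The root cause reported in PR #2640 is that the input shapes used during tuning do not match the shapes seen at runtime, so the autotuner cache lookup misses and the fallback tactic is used. Tuning results must therefore be stored under a key built from exactly the shapes used at lookup. The snippet illustrates that idea; TRTLLMMoE and its methods are hypothetical, not vLLM's or FlashInfer's code:

# vllm/model_executor/layers/fused_moe/trtllm_moe.py (illustrative; names are hypothetical)
import torch
import torch.nn as nn


class TRTLLMMoE(nn.Module):
    def __init__(self, num_experts: int):
        super().__init__()
        self.num_experts = num_experts
        self.cache = {}  # (input shapes, num_experts) -> tuning result

    def _cache_key(self, *inputs: torch.Tensor):
        # Key on the exact runtime shapes so tuning and lookup agree
        return (tuple(tuple(t.shape) for t in inputs), self.num_experts)

    def store_tuning_result(self, tuning_result, *inputs: torch.Tensor):
        self.cache[self._cache_key(*inputs)] = tuning_result

    def get_tuning_result(self, *inputs: torch.Tensor):
        # None signals a cache miss, i.e. the fallback tactic would be used
        return self.cache.get(self._cache_key(*inputs))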

Step 4: Use MXFP8 MoE for Qwen 3.5

In the benchmark, block scale FP8 is slower than BF16 at larger token counts and higher EP, so consider evaluating MXFP8 MoE for Qwen3.5 configurations.
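For context on MXFP8: in the OCP Microscaling (MX) formats, each block of 32 elements shares one power-of-two scale (stored as E8M0), and MXFP8 stores the elements as FP8 (E4M3 here). The sketch computes the shared scale with the MX rule shared_exp = floor(log2(amax)) - emax, where emax = 8 for E4M3, then scales and clamps the block. It omits rounding to actual FP8 values, the helper names are hypothetical, and it is not FlashInfer's MXFP8 MoE implementation.

```python
import math

E4M3_EMAX = 8      # exponent of the largest normal FP8 E4M3 value (448 = 1.75 * 2**8)
E4M3_MAX = 448.0   # largest finite FP8 E4M3 magnitude
BLOCK = 32         # MX block size: 32 elements share one scale


def mx_block_scale(block: list[float]) -> float:
    """Shared power-of-two (E8M0-style) scale for one MX block."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0  # all-zero block: any scale works
    shared_exp = math.floor(math.log2(amax)) - E4M3_EMAX
    return 2.0 ** shared_exp


def mx_scale_block(block: list[float]) -> tuple[float, list[float]]:
    """Divide a block by its shared scale and clamp to the E4M3 range.

    Rounding the scaled values to representable FP8 numbers is omitted.
    """
    assert len(block) <= BLOCK
    scale = mx_block_scale(block)
    scaled = [max(-E4M3_MAX, min(E4M3_MAX, x / scale)) for x in block]
    return scale, scaled


scale, scaled = mx_scale_block([0.5, -1.0, 0.25] + [0.0] * 29)
print(scale, max(abs(v) for v in scaled))  # 0.00390625 256.0
```

Because the scale is a power of two chosen from the block maximum, values whose mantissa exceeds 1.75 at the top exponent (for example 1.9) are clamped to 448.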

Verification

Verify the fixes by running benchmarks and checking the performance improvements. For example (the script name and flags below are illustrative and may not match an existing benchmark script):

python benchmark.py --config qwen3.5 --num_experts 512 --topk 10 --intermediate_size 1024 --hidden_size 4096 --routing_method_type RenormalizeNaive

Check the output for improved performance metrics, such as reduced latency and increased TFLOPS.
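Whatever harness is used, latency comparisons like the tables above are usually taken as a median over repeated runs after warmup. A minimal sketch of such a helper; median_latency_ms is a hypothetical name, not part of vLLM or FlashInfer. For GPU work, pass a synchronize function such as torch.cuda.synchronize as sync, because kernel launches return before the work finishes.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 5, iters: int = 20,
                      sync: Callable[[], None] = lambda: None) -> float:
    """Median wall-clock latency of fn() in milliseconds."""
    # Warmup runs absorb one-time costs (JIT compilation, autotuning, caches).
    for _ in range(warmup):
        fn()
    sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for asynchronous work before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


print(median_latency_ms(lambda: sum(range(10_000))))
```

The median is less sensitive than the mean to occasional slow iterations, which matters when comparing tuned and untuned tactics whose latencies differ by only a few percent.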

Extra Tips

- Monitor the caching mechanism to ensure it is working correctly and not causing performance regressions.
- Make sure the fallback path (the default tactic used when autotuning is disabled or the cache lookup misses) performs acceptably.
- Keep an eye on upcoming PRs and updates to FlashInfer and vLLM to ensure compatibility and optimal performance.
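One concrete way to monitor the cache is to count the autotuner's fallback messages, like the one quoted under Issue 1. This sketch assumes the "Using fallback tactic for <op> with input shapes" wording seen in that log; because PR #2640 also touches autotuner log messages, the regex avoids matching the "[AutoTunner]" prefix. count_fallbacks is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the fallback message text seen in the Issue 1 log line.
FALLBACK_RE = re.compile(r"Using fallback tactic for (\S+) with input shapes")


def count_fallbacks(lines: list[str]) -> Counter:
    """Count fallback-tactic messages per op; nonzero counts mean cache misses."""
    counts: Counter = Counter()
    for line in lines:
        m = FALLBACK_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


log = [
    "2026-02-26 09:26:35,204 - INFO - autotuner.py:444 - flashinfer.jit: [AutoTunner]: "
    "Using fallback tactic for flashinfer::trtllm_fp8_block_scale_moe with input shapes "
    "(torch.Size([1024, 4096]), torch.Size([1024, 512]), torch.Size([0]), torch.Size([0]), "
    "torch.Size([1024, 4096]), torch.Size([32, 1024]))",
]
print(count_fallbacks(log))
```

After a tuning run, a nonzero count for an op that was supposed to be tuned points to a cache-key mismatch like the one in Issue 1.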
