transformers - ✅(Solved) Fix [Bug] Flash Attention crashes with illegal memory access on Qwen3.5 due to 3D position_ids being misinterpreted as packed sequence [2 pull requests, 6 comments, 3 participants]

GitHub stats
huggingface/transformers#44910 · Fetched 2026-04-08 01:12:36
Comments: 6 · Participants: 3 · Timeline: 25 · Reactions: 1
Timeline (top): subscribed ×7, commented ×6, mentioned ×6, cross-referenced ×3

When using attn_implementation="flash_attention_2" with Qwen3.5 models, all forward passes crash with CUDA error: an illegal memory access was encountered. This affects both training and inference.

Root cause: Qwen3.5 uses a hybrid architecture (GatedDeltaNet linear attention + standard attention) and passes 3D position_ids with shape [3, batch_size, seq_len] (for multi-dimensional rotary embedding). The function _is_packed_sequence() in modeling_flash_attention_utils.py misinterprets this 3D tensor as a packed sequence indicator, causing cu_seqlens to be constructed with 3× the actual token count. Flash Attention then reads beyond the q/k/v tensor boundaries, resulting in an illegal memory access.

Error Message

torch.AcceleratorError: CUDA error: an illegal memory access was encountered

Fix Action

Fix

Add a dimensionality check in _is_packed_sequence() to reject tensors with more than 2 dimensions, since packed sequences always use 2D position_ids [batch, seq_len]:

def _is_packed_sequence(position_ids, batch_size):
    if position_ids is None:
        return False
    if position_ids.dim() > 2:
        return False
    increasing_position_sequences = (
        torch.arange(position_ids.shape[1], device=position_ids.device) + position_ids.min()
    )
    return batch_size == 1 and (increasing_position_sequences - position_ids).abs().sum().bool()

This fix has been validated: all 8 standard attention layers in Qwen3.5-9B pass flash attention forward successfully after applying the patch.

PR fix notes

PR #44911: Fix flash attention crash with 3D position_ids (Qwen3.5)

Description (problem / solution / changelog)

Qwen3.5 uses 3D position_ids [3, batch, seq_len] for multi-dimensional rotary embedding. _is_packed_sequence() misinterprets this as a packed sequence, causing cu_seqlens to be constructed with 3x the actual token count. Flash attention then reads beyond tensor boundaries, resulting in CUDA illegal memory access.

Add a dimensionality check to reject >2D position_ids, since packed sequences always use 2D [batch, seq_len] format.

What does this PR do?

Qwen3.5 uses a hybrid architecture (GatedDeltaNet + standard attention) with 3D position_ids of shape [3, batch_size, seq_len] for multi-dimensional rotary embedding. The function _is_packed_sequence() in modeling_flash_attention_utils.py does not handle >2D tensors, causing it to misidentify the input as a packed sequence. This leads to cu_seqlens being constructed with 3× the actual token count, and flash_attn_varlen_func reads beyond tensor boundaries, resulting in CUDA error: illegal memory access.

The fix: Add if position_ids.dim() > 2: return False at the top of _is_packed_sequence(), since packed sequences always use 2D [batch, seq_len] position_ids.

Intercepted evidence before crash:

q: torch.Size([256, 16, 256])           ← 256 tokens
cu_seqlens_q: tensor([0, 256, 512, 768]) ← claims 768 tokens (3×256)
q total=256 vs cu_seqlens_q[-1]=768      ← MISMATCH → illegal memory access

Fixes #44910

Who can review?

@vasqu @ArthurZucker @CyrilVallez (attention)

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

Changed files

  • src/transformers/modeling_flash_attention_utils.py (modified, +8/-4)

PR #1487 (novasky-ai/skyrl): [multimodal] add language_model_only flag for models like qwen3.5

Description (problem / solution / changelog)

Add language_model_only flag for multimodal models (Qwen3.5)

Summary

  • Add language_model_only config flag across policy, ref, and inference engine configs to skip vision encoder initialization for multimodal models like Qwen3.5, reducing GPU memory usage
  • Fix FSDP weight sync: remap CausalLM param names (model.layers.*) to vLLM's expected namespace (language_model.model.layers.*) via new weight_prefix in FSDPWeightExtractor
  • Make FSDP wrap policy resilient to missing vision-only layer classes (warn + skip instead of crash)
  • Add flash-linear-attention and causal-conv1d dependencies; unblock causal-conv1d install override -- required for performant GDN layer execution
  • Add run_qwen3.5_0.8b.sh example with use_sample_packing=false (GDN layers are incompatible with packing)
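The weight-sync item in the summary above amounts to prefixing parameter names before they are handed to vLLM. A minimal sketch, assuming a plain string prefix is sufficient; `add_weight_prefix` is a hypothetical helper written for illustration and is not the `FSDPWeightExtractor` code from the PR:

```python
def add_weight_prefix(name: str, weight_prefix: str = "language_model.") -> str:
    """Map a CausalLM parameter name into the namespace vLLM expects.

    Hypothetical helper for illustration only.
    """
    # Leave already-prefixed names untouched so the remap is idempotent.
    if name.startswith(weight_prefix):
        return name
    return weight_prefix + name


print(add_weight_prefix("model.layers.0.mlp.up_proj.weight"))
# language_model.model.layers.0.mlp.up_proj.weight
```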

Runs

FSDP and Megatron reward matching (image omitted)

Test plan

  • Run run_qwen3.5_0.8b.sh on 4 GPUs -- verify weight sync, no GDN fallback warnings, avg_final_rewards trends up
  • Run existing non-multimodal FSDP test to confirm no regression
  • Verify config validation rejects mismatched language_model_only across policy/ref/generator

Changed files

  • examples/train/megatron/run_megatron_qwen3.5.sh (modified, +4/-4)
  • examples/train/models/run_qwen3.5_0.8b.sh (added, +68/-0)
  • pyproject.toml (modified, +4/-2)
  • skyrl/backends/skyrl_train/distributed/fsdp_utils.py (modified, +11/-1)
  • skyrl/backends/skyrl_train/inference_engines/ray_wrapped_inference_engine.py (modified, +2/-0)
  • skyrl/backends/skyrl_train/inference_servers/utils.py (modified, +1/-0)
  • skyrl/backends/skyrl_train/workers/fsdp/fsdp_worker.py (modified, +21/-2)
  • skyrl/backends/skyrl_train/workers/model_wrapper.py (modified, +20/-12)
  • skyrl/backends/skyrl_train_backend.py (modified, +1/-0)
  • skyrl/train/config/config.py (modified, +9/-0)
  • skyrl/train/entrypoints/main_base.py (modified, +1/-0)
  • skyrl/train/utils/utils.py (modified, +7/-1)
  • uv.lock (modified, +64/-26)

Code Example

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B",  # or any Qwen3.5 variant
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    attn_implementation="flash_attention_2",
)

# This crashes immediately
input_ids = torch.randint(100, 5000, (1, 256), device="cuda")
with torch.no_grad():
    out = model(input_ids=input_ids, use_cache=False)

---

torch.AcceleratorError: CUDA error: an illegal memory access was encountered

---

File "transformers/modeling_flash_attention_utils.py", line 692, in _flash_attention_forward
    out = flash_varlen_fn(
File "flash_attn/flash_attn_interface.py", line 1443, in flash_attn_varlen_func
    return FlashAttnVarlenFunc.apply(
File "flash_attn/flash_attn_interface.py", line 165, in _flash_attn_varlen_forward
    out, softmax_lse, S_dmask, rng_state = flash_attn_gpu.varlen_fwd(
torch.AcceleratorError: CUDA error: an illegal memory access was encountered

---

def _is_packed_sequence(position_ids, batch_size):
    if position_ids is None:
        return False
    increasing_position_sequences = (
        torch.arange(position_ids.shape[1], device=position_ids.device) + position_ids.min()
    )
    return batch_size == 1 and (increasing_position_sequences - position_ids).abs().sum().bool()

---

elif is_fa_with_varlen_kwargs or is_fa_with_position_ids:
    if cu_seq_lens_q is None or cu_seq_lens_k is None:
        q, k, v, (cu_seq_lens_q, cu_seq_lens_k), (max_length_q, max_length_k) = _prepare_from_posids(
            query_states, key_states, value_states, position_ids
        )

---

position_ids = position_ids.reshape(-1)  # [3, 1, 256] → [768]
indices_q = (position_ids == 0).nonzero().view(-1)  # Finds 3 zero positions

---

🔍 varlen_fwd parameters:
  q: torch.Size([256, 16, 256])           ← 256 tokens
  cu_seqlens_q: tensor([0, 256, 512, 768]) ← claims 768 tokens
  q total=256 vs cu_seqlens_q[-1]=768      ← MISMATCH → crash

---

def _is_packed_sequence(position_ids, batch_size):
    if position_ids is None:
        return False
    if position_ids.dim() > 2:
        return False
    increasing_position_sequences = (
        torch.arange(position_ids.shape[1], device=position_ids.device) + position_ids.min()
    )
    return batch_size == 1 and (increasing_position_sequences - position_ids).abs().sum().bool()


System Info

[Bug] Flash Attention crashes with illegal memory access on Qwen3.5 due to 3D position_ids being misinterpreted as packed sequence

We fixed it in https://github.com/ouroborosscr/transformers/tree/fix/qwen35-flash-attn-3d-position-ids

Description

When using attn_implementation="flash_attention_2" with Qwen3.5 models, all forward passes crash with CUDA error: an illegal memory access was encountered. This affects both training and inference.

Root cause: Qwen3.5 uses a hybrid architecture (GatedDeltaNet linear attention + standard attention) and passes 3D position_ids with shape [3, batch_size, seq_len] (for multi-dimensional rotary embedding). The function _is_packed_sequence() in modeling_flash_attention_utils.py misinterprets this 3D tensor as a packed sequence indicator, causing cu_seqlens to be constructed with 3× the actual token count. Flash Attention then reads beyond the q/k/v tensor boundaries, resulting in an illegal memory access.

Reproduction

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B",  # or any Qwen3.5 variant
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    attn_implementation="flash_attention_2",
)

# This crashes immediately
input_ids = torch.randint(100, 5000, (1, 256), device="cuda")
with torch.no_grad():
    out = model(input_ids=input_ids, use_cache=False)

Error:

torch.AcceleratorError: CUDA error: an illegal memory access was encountered

Traceback (abbreviated):

File "transformers/modeling_flash_attention_utils.py", line 692, in _flash_attention_forward
    out = flash_varlen_fn(
File "flash_attn/flash_attn_interface.py", line 1443, in flash_attn_varlen_func
    return FlashAttnVarlenFunc.apply(
File "flash_attn/flash_attn_interface.py", line 165, in _flash_attn_varlen_forward
    out, softmax_lse, S_dmask, rng_state = flash_attn_gpu.varlen_fwd(
torch.AcceleratorError: CUDA error: an illegal memory access was encountered

Root Cause Analysis

Qwen3.5 hybrid architecture

Qwen3.5 uses a mixed attention architecture: 24 layers of Qwen3_5GatedDeltaNet (linear attention) and 8 layers of Qwen3_5Attention (standard attention, at layers 3, 7, 11, 15, 19, 23, 27, 31). Only the standard attention layers use flash attention.

Qwen3.5 passes 3D position_ids with shape [3, batch_size, seq_len] for its multi-dimensional rotary embedding (3 sets of position indices).
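The layer indices listed above follow a regular pattern: every fourth layer is a standard attention layer. A small sketch that reproduces the list, assuming a 32-layer model; the interval of 4 is inferred from the indices quoted in this issue rather than read from the model config:

```python
NUM_LAYERS = 32
FULL_ATTENTION_INTERVAL = 4  # assumption inferred from the listed indices

# Standard (flash-attention-capable) layers sit at every 4th position.
full_attention_layers = [
    i for i in range(NUM_LAYERS) if (i + 1) % FULL_ATTENTION_INTERVAL == 0
]
# Everything else is a GatedDeltaNet (linear attention) layer.
linear_attention_layers = [
    i for i in range(NUM_LAYERS) if i not in full_attention_layers
]

print(full_attention_layers)         # [3, 7, 11, 15, 19, 23, 27, 31]
print(len(linear_attention_layers))  # 24
```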

The bug

In modeling_flash_attention_utils.py, the function _is_packed_sequence() (line 444) does not handle tensors with more than 2 dimensions:

def _is_packed_sequence(position_ids, batch_size):
    if position_ids is None:
        return False
    increasing_position_sequences = (
        torch.arange(position_ids.shape[1], device=position_ids.device) + position_ids.min()
    )
    return batch_size == 1 and (increasing_position_sequences - position_ids).abs().sum().bool()

When position_ids has shape [3, 1, 256]:

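  • position_ids.shape[1] returns 1 (not 256 as expected for a 2D [batch, seq_len] tensor), so torch.arange(1) + position_ids.min() is a single-element tensor holding the minimum position
  • That single value is broadcast against all 768 entries, so the summed absolute difference is nonzero whenever any position differs from the minimum; with batch_size == 1 the function therefore returns True, misidentifying this as a packed sequence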
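The misdetection can be reproduced without a GPU. The sketch below is a pure-Python model of the check, with flat lists plus a shape tuple standing in for tensors; `is_packed_sequence_model` and its `reject_over_2d` switch are hypothetical names, and only the two broadcast cases relevant here are modelled:

```python
def is_packed_sequence_model(flat, shape, batch_size, reject_over_2d=False):
    """Pure-Python model of the check for a tensor given as (flat values, shape).

    Models only two broadcast cases: shape[1] == shape[-1] (2D input)
    and shape[1] == 1 (the 3D Qwen3.5 input).
    """
    if reject_over_2d and len(shape) > 2:
        return False  # the proposed dimensionality guard
    if batch_size != 1:
        return False
    n, last = shape[1], shape[-1]
    assert n in (1, last), "broadcast case not modelled"
    base = min(flat)
    total = 0
    for i, pos in enumerate(flat):
        # Value of arange(n) at this element after broadcasting.
        step = i % last if n == last else 0
        total += abs(base + step - pos)
    return total != 0


seq_len = 256
plain_2d = list(range(seq_len))      # shape [1, 256], one ordinary sequence
packed_2d = list(range(128)) * 2     # shape [1, 256], two packed sequences
mrope_3d = list(range(seq_len)) * 3  # shape [3, 1, 256], three position sets

print(is_packed_sequence_model(plain_2d, (1, seq_len), 1))     # False
print(is_packed_sequence_model(packed_2d, (1, seq_len), 1))    # True
print(is_packed_sequence_model(mrope_3d, (3, 1, seq_len), 1))  # True (the bug)
print(is_packed_sequence_model(mrope_3d, (3, 1, seq_len), 1, reject_over_2d=True))  # False
```

The third call shows the false positive; the fourth shows that rejecting tensors with more than two dimensions restores the expected result while leaving the 2D cases unchanged.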

This triggers the packed-sequence code path at line 677:

elif is_fa_with_varlen_kwargs or is_fa_with_position_ids:
    if cu_seq_lens_q is None or cu_seq_lens_k is None:
        q, k, v, (cu_seq_lens_q, cu_seq_lens_k), (max_length_q, max_length_k) = _prepare_from_posids(
            query_states, key_states, value_states, position_ids
        )

Inside prepare_fa_kwargs_from_position_ids() (line 362), which _prepare_from_posids() calls:

position_ids = position_ids.reshape(-1)  # [3, 1, 256] → [768]
indices_q = (position_ids == 0).nonzero().view(-1)  # Finds 3 zero positions

This constructs cu_seqlens = [0, 256, 512, 768], claiming 3 sequences with 768 total tokens. But the actual q/k/v tensors only contain 256 tokens. Flash Attention reads up to index 768, causing the illegal memory access.
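The boundary construction described above can be traced by hand. A minimal sketch, with a plain list standing in for the flattened tensor and assuming each of the three position sets runs 0..255 (as in the text-only reproduction); `cu_seqlens_from_position_ids` is a hypothetical name, not the library function:

```python
def cu_seqlens_from_position_ids(flat_position_ids):
    """Sequence boundaries: every index holding position 0, plus the total length."""
    starts = [i for i, pos in enumerate(flat_position_ids) if pos == 0]
    return starts + [len(flat_position_ids)]


seq_len = 256
flattened_3d = list(range(seq_len)) * 3  # what reshape(-1) yields for [3, 1, 256]

cu_seqlens = cu_seqlens_from_position_ids(flattened_3d)
print(cu_seqlens)               # [0, 256, 512, 768]
print(cu_seqlens[-1], seq_len)  # 768 256
```

The last boundary (768) exceeds the 256 tokens actually present in q/k/v, which is the mismatch that leads to the out-of-bounds read.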

Intercepted parameters confirming the mismatch

🔍 varlen_fwd parameters:
  q: torch.Size([256, 16, 256])           ← 256 tokens
  cu_seqlens_q: tensor([0, 256, 512, 768]) ← claims 768 tokens
  q total=256 vs cu_seqlens_q[-1]=768      ← MISMATCH → crash
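A mismatch like the one intercepted above can be turned into a readable error before the kernel is launched. The following is a debugging sketch only; `check_varlen_inputs` is a hypothetical helper that takes plain Python values and is not part of transformers or flash-attn:

```python
def check_varlen_inputs(total_tokens, cu_seqlens):
    """Raise early if cu_seqlens does not describe exactly `total_tokens` tokens."""
    if not cu_seqlens or cu_seqlens[0] != 0:
        raise ValueError("cu_seqlens must start at 0")
    if any(b < a for a, b in zip(cu_seqlens, cu_seqlens[1:])):
        raise ValueError("cu_seqlens must be non-decreasing")
    if cu_seqlens[-1] != total_tokens:
        raise ValueError(
            f"cu_seqlens[-1]={cu_seqlens[-1]} but q holds {total_tokens} tokens"
        )


try:
    check_varlen_inputs(256, [0, 256, 512, 768])
except ValueError as err:
    print(err)  # cu_seqlens[-1]=768 but q holds 256 tokens
```

With real tensors, the same comparison would use `q.shape[0]` and `cu_seqlens_q[-1].item()`; note that `.item()` forces a device synchronization, so a check like this is better suited to debugging than to production paths.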

Fix

Add a dimensionality check in _is_packed_sequence() to reject tensors with more than 2 dimensions, since packed sequences always use 2D position_ids [batch, seq_len]:

def _is_packed_sequence(position_ids, batch_size):
    if position_ids is None:
        return False
    if position_ids.dim() > 2:
        return False
    increasing_position_sequences = (
        torch.arange(position_ids.shape[1], device=position_ids.device) + position_ids.min()
    )
    return batch_size == 1 and (increasing_position_sequences - position_ids).abs().sum().bool()

This fix has been validated: all 8 standard attention layers in Qwen3.5-9B pass flash attention forward successfully after applying the patch.

Environment

  • Model: Qwen3.5-9B (hybrid GatedDeltaNet + standard attention)
  • GPU: NVIDIA A100-SXM4-80GB
  • PyTorch: 2.9.0 / 2.10.0 (both affected)
  • Transformers: 5.3.0
  • flash-attn: 2.8.3
  • CUDA: 12.8

Impact

  • Affects all Qwen3.5 variants (and potentially any future model using >2D position_ids)
  • Blocks both training and inference when using flash_attention_2
  • No workaround other than falling back to sdpa or eager attention implementations
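The fallback mentioned in the last item only changes the `attn_implementation` argument. A small sketch of how the choice could be isolated; `from_pretrained_kwargs` and its `flash_attention_patched` flag are hypothetical names, and the remaining loading arguments from the reproduction are omitted for brevity:

```python
def from_pretrained_kwargs(flash_attention_patched: bool) -> dict:
    """Build keyword arguments for AutoModelForCausalLM.from_pretrained."""
    return {
        "device_map": {"": 0},
        # Use flash attention only when the patched check is in place.
        "attn_implementation": "flash_attention_2" if flash_attention_patched else "sdpa",
    }


kwargs = from_pretrained_kwargs(flash_attention_patched=False)
print(kwargs["attn_implementation"])  # sdpa
# model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-9B", **kwargs)
```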

Who can help?

@vasqu @ArthurZucker (attention)

Reproduction

GRPO reinforcement learning training with Qwen3.5-9B using TRL GRPOTrainer. When using attn_implementation="flash_attention_2" with Qwen3.5, all forward passes crash with a CUDA illegal memory access. Minimal reproduction:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    attn_implementation="flash_attention_2",
)

input_ids = torch.randint(100, 5000, (1, 256), device="cuda")
with torch.no_grad():
    out = model(input_ids=input_ids, use_cache=False)  # crashes here

Error:

torch.AcceleratorError: CUDA error: an illegal memory access was encountered

Root cause:

Qwen3.5 is a hybrid architecture (24 GatedDeltaNet layers + 8 standard attention layers). It uses 3D position_ids with shape [3, batch_size, seq_len] for multi-dimensional rotary embedding.

_is_packed_sequence() in modeling_flash_attention_utils.py (line 444) does not handle >2D tensors:

def _is_packed_sequence(position_ids, batch_size):
    if position_ids is None:
        return False
    increasing_position_sequences = (
        torch.arange(position_ids.shape[1], device=position_ids.device) + position_ids.min()
    )
    return batch_size == 1 and (increasing_position_sequences - position_ids).abs().sum().bool()

When position_ids has shape [3, 1, 256], position_ids.shape[1] returns 1 instead of the sequence length, and the function returns True, misidentifying this as a packed sequence.

This triggers prepare_fa_kwargs_from_position_ids() which does:

position_ids = position_ids.reshape(-1)  # [3, 1, 256] → [768]
indices_q = (position_ids == 0).nonzero()  # finds 3 zero positions
# constructs cu_seqlens = [0, 256, 512, 768] — claims 768 tokens

But q/k/v only contain 256 tokens. Flash attention reads up to index 768, causing the illegal memory access.

Intercepted evidence:

q: torch.Size([256, 16, 256])           ← 256 tokens
cu_seqlens_q: tensor([0, 256, 512, 768]) ← claims 768 tokens
q total=256 vs cu_seqlens_q[-1]=768      ← MISMATCH → crash

Environment:

  • Model: Qwen3.5-9B
  • GPU: NVIDIA A100-SXM4-80GB
  • PyTorch: 2.9.0 and 2.10.0 (both affected)
  • transformers: 5.3.0
  • flash-attn: 2.8.3
  • CUDA: 12.8

Fix: Add if position_ids.dim() > 2: return False in _is_packed_sequence(). Packed sequences always use 2D [batch, seq_len] position_ids.

Expected behavior

Model forward pass with attn_implementation="flash_attention_2" should complete successfully without CUDA errors. After the fix (adding dimensionality check for >2D position_ids), all 8 standard attention layers in Qwen3.5-9B pass flash attention forward correctly.

extent analysis

Fix Plan

To resolve the issue, you need to modify the _is_packed_sequence() function in modeling_flash_attention_utils.py to correctly handle 3D position_ids tensors.

Here are the steps:

  • Open the modeling_flash_attention_utils.py file.
  • Locate the _is_packed_sequence() function.
  • Add a dimensionality check at the beginning of the function to return False for tensors with more than 2 dimensions.

Example code:

def _is_packed_sequence(position_ids, batch_size):
    if position_ids is None:
        return False
    if position_ids.dim() > 2:  # Add this line to check for >2D tensors
        return False
    increasing_position_sequences = (
        torch.arange(position_ids.shape[1], device=position_ids.device) + position_ids.min()
    )
    return batch_size == 1 and (increasing_position_sequences - position_ids).abs().sum().bool()
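If editing the installed file is not practical, the same guard could be applied at runtime by wrapping the existing function. This is a sketch under two assumptions: that the library resolves `_is_packed_sequence` through its module attribute at call time, and that this private function keeps its name and signature in your installed version. `with_ndim_guard` is a hypothetical helper:

```python
import functools


def with_ndim_guard(original):
    """Wrap a packed-sequence check so it returns False for >2D position_ids."""

    @functools.wraps(original)
    def guarded(position_ids, batch_size):
        # Multi-dimensional rotary position_ids are never a packed-sequence marker.
        if position_ids is not None and position_ids.dim() > 2:
            return False
        return original(position_ids, batch_size)

    return guarded


# Intended use (not executed here; requires transformers to be installed):
#   import transformers.modeling_flash_attention_utils as fa_utils
#   fa_utils._is_packed_sequence = with_ndim_guard(fa_utils._is_packed_sequence)
```

The wrapper must be installed before the first forward pass, and it should be removed once you upgrade to a release that contains the fix.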

Verification

To verify that the fix worked, run the reproduction code again:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
    attn_implementation="flash_attention_2",
)

input_ids = torch.randint(100, 5000, (1, 256), device="cuda")
with torch.no_grad():
    out = model(input_ids=input_ids, use_cache=False)

The model forward pass should now complete successfully without CUDA errors.

Extra Tips

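  • Edit the copy of modeling_flash_attention_utils.py that your interpreter actually imports: for a regular pip install this is the transformers package directory inside site-packages, and for an editable install it is src/transformers/ in your checkout.
  • If you are using a virtual environment, activate it first so that you patch the copy inside that environment rather than a system-wide install.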
  • This fix assumes that packed sequences always use 2D [batch, seq_len] position_ids. If this assumption is not valid, further modifications may be necessary.
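To find out which copy of the file the current interpreter would import, the standard library can report the module's location. A minimal sketch; `locate_module_file` is a hypothetical helper, and it returns None when the package is not installed in the active environment:

```python
import importlib.util


def locate_module_file(module_name: str):
    """Return the file path Python would import `module_name` from, or None."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None  # parent package missing or not importable here
    return spec.origin if spec is not None else None


print(locate_module_file("transformers.modeling_flash_attention_utils"))
```

Run this with the same interpreter (and virtual environment) that runs your training or inference script, then edit the path it prints.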
