transformers: [New Model] Add Microsoft Samba - Hybrid SSM + Sliding Window Attention


Model Description

  • Paper: Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
  • Authors: Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen
  • Affiliations: Microsoft Research; University of Illinois at Urbana-Champaign
  • Official weights: microsoft/Samba-421M, microsoft/Samba-1.3B, microsoft/Samba-3.8B on HF Hub

Architecture Summary

Samba is a hybrid sequence model that interleaves:

  • Mamba layers (selective SSM for long-range compression, O(n) inference)
  • Sliding Window Attention layers (precise local recall within a window)
  • MLP layers

This design targets unbounded context length at an inference cost linear in sequence length, O(n), while retaining exact recall within the local attention window. The paper reports that it outperforms pure Mamba and LLaMA baselines at matched compute.
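To make the interleaving and the local-recall property concrete, here is a minimal pure-Python sketch. It is illustrative only: the function names are hypothetical, and the per-block ordering (Mamba, MLP, SWA, MLP) is an assumption for the example, since the description above only states that the three layer types are interleaved.

```python
from __future__ import annotations


def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key j.

    A position sees itself and at most (window - 1) preceding positions,
    so each row has at most `window` True entries.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: list[list[bool]]) -> int:
    """Count (query, key) pairs the mask allows; grows as O(n * window)."""
    return sum(sum(row) for row in mask)


def layer_pattern(num_blocks: int) -> list[str]:
    """Hypothetical layer order: each block is Mamba -> MLP -> SWA -> MLP.

    This ordering is an assumption for illustration, not taken from the issue.
    """
    block = ["mamba", "mlp", "swa", "mlp"]
    return block * num_blocks


if __name__ == "__main__":
    mask = sliding_window_causal_mask(seq_len=5, window=2)
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))
    print(attended_pairs(mask), layer_pattern(1))
```

Because each query attends to a bounded number of keys, the attention work scales with sequence length times window size rather than quadratically, which is what keeps the hybrid linear in sequence length for a fixed window.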

Why add to Transformers

  • Microsoft Research paper with 3 officially released checkpoints (421M/1.3B/3.8B)
  • Users downloading microsoft/Samba-* currently have no Transformers integration for inference pipelines or fine-tuning
  • Complements the mamba, mamba2, jamba, zamba2, and bamba implementations already in the repo
  • Clean modular implementation possible by composing existing Mamba + SWA blocks

Implementation Plan

  • modular_samba.py composing Mamba SSM blocks + Sliding Window Attention
  • configuration_samba.py
  • Weight conversion script for microsoft/Samba-* checkpoints
  • Full test suite + documentation
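As a rough sketch of what the configuration and conversion pieces of this plan might involve, the standalone example below uses only the standard library. Everything in it is hypothetical: the class name `SambaConfigSketch`, its fields and placeholder defaults, the layer-type strings, and the key prefixes in the usage are assumptions for illustration. They do not reflect the real checkpoint layout or the final Transformers API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical layer-type names, chosen for this sketch only.
ALLOWED_LAYER_TYPES = {"mamba", "sliding_attention"}


@dataclass
class SambaConfigSketch:
    """Illustrative config; defaults are placeholders, not checkpoint values."""

    hidden_size: int = 1024
    num_hidden_layers: int = 8
    num_attention_heads: int = 8
    sliding_window: int = 2048
    layer_types: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.layer_types:
            # Assumed default: alternate Mamba and sliding-window attention.
            self.layer_types = [
                "mamba" if i % 2 == 0 else "sliding_attention"
                for i in range(self.num_hidden_layers)
            ]
        if len(self.layer_types) != self.num_hidden_layers:
            raise ValueError("layer_types must have one entry per layer")
        unknown = set(self.layer_types) - ALLOWED_LAYER_TYPES
        if unknown:
            raise ValueError(f"unknown layer types: {sorted(unknown)}")
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")


def rename_key(old_key: str, prefix_map: dict[str, str]) -> str:
    """Rewrite a checkpoint parameter name via the first matching prefix."""
    for old_prefix, new_prefix in prefix_map.items():
        if old_key.startswith(old_prefix):
            return new_prefix + old_key[len(old_prefix):]
    return old_key  # leave unrecognized keys untouched


if __name__ == "__main__":
    config = SambaConfigSketch(num_hidden_layers=4)
    print(config.layer_types)
    # Both prefixes below are made up for the example.
    print(rename_key("backbone.layers.0.weight", {"backbone.": "model."}))
```

Validating the per-layer type list up front means a mismatched checkpoint would fail at config construction rather than partway through weight loading. Whether the layer layout should be an explicit list or derived from a pattern is one of the design choices to settle with maintainers.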

I am actively working on this implementation. Happy to coordinate with maintainers on design preferences (e.g. modular vs. standalone file structure) before opening a PR.
