transformers - Discussion: optional RankSEG-style decoding for Transformers semantic segmentation post-processing

StepCodex · 2026-05-21T10:36:05Z

[![RankSEG](https://img.shields.io/badge/RankSEG-GitHub-blue?logo=github)](https://github.com/rankseg/rankseg) [![PyPI](https://badge.fury.io/py/rankseg.svg)](https://pypi.org/project/rankseg/) [![Docs](https://readthedocs.org/projects/rankseg/badge/?version=latest)](https://rankseg.readthedocs.io/en/latest/) [![Transformers docs](https://img.shields.io/badge/docs-Transformers%20integration-brightgreen)](https://rankseg.readthedocs.io/en/latest/integrations_transformers.html) [![Notebook](https://img.shields.io/badge/notebook-Transformers-orange)](https://github.com/rankseg/rankseg/blob/main/notebooks/rankseg_with_transformers.ipynb) [![JMLR 2023](https://img.shields.io/badge/JMLR-2023-black)](https://www.jmlr.org/papers/v24/22-0712.html) [![NeurIPS 2025](https://img.shields.io/badge/NeurIPS-2025-black)](https://openreview.net/forum?id=4tRMm1JJhw)

Hi Transformers maintainers,

I wanted to share a small downstream experiment around RankSEG-style decoding for semantic segmentation. The short version: if a Transformers processor can expose resized semantic class probabilities before the final `argmax`, then users can try metric-aware post-processing methods such as RankSEG without changing the model, checkpoint, or preprocessing pipeline.

This is related to https://github.com/huggingface/transformers/issues/37715, where the discussion is about making the final `argmax` optional and allowing users to access resized class probability maps. I do not want to assume that RankSEG itself belongs in Transformers, but I think it is a useful concrete example of why probability-level semantic segmentation outputs can matter.

What I Tried

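RankSEG is a training-free segmentation decoding method. It takes per-class probability maps and returns a hard segmentation mask optimized for an overlap-style metric such as Dice or IoU. The relevant papers are RankSEG, JMLR 2023 (https://www.jmlr.org/papers/v24/22-0712.html) and RankSEG-RMA, NeurIPS 2025 (https://openreview.net/forum?id=4tRMm1JJhw). There is also a RankSEG repository (https://github.com/rankseg/rankseg), a PyPI package (https://pypi.org/project/rankseg/), and a Transformers integration tutorial (https://rankseg.readthedocs.io/en/latest/integrations_transformers.html).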
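To illustrate why a metric-aware decoder can differ from argmax, here is a toy sketch in plain Python. It is not the RankSEG algorithm or its RMA solver: it is an exhaustive search over binary masks for three pixels, assuming the pixels are independent and that Dice is 1.0 when both masks are empty. The helper names (`dice`, `expected_dice`, `brute_force_decode`) are hypothetical and exist only for this example.

```python
from itertools import product


def dice(truth, mask):
    """Dice overlap of two binary tuples; 1.0 when both are empty (assumed convention)."""
    inter = sum(t * m for t, m in zip(truth, mask))
    total = sum(truth) + sum(mask)
    return 1.0 if total == 0 else 2.0 * inter / total


def expected_dice(mask, probs):
    """Expected Dice of `mask` if pixel i is foreground independently with probability probs[i]."""
    score = 0.0
    for truth in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for t, p in zip(truth, probs):
            weight *= p if t else 1.0 - p
        score += weight * dice(truth, mask)
    return score


def brute_force_decode(probs):
    """Return the binary mask with the highest expected Dice (exhaustive, toy sizes only)."""
    candidates = product((0, 1), repeat=len(probs))
    return max(candidates, key=lambda mask: expected_dice(mask, probs))


probs = (0.4, 0.4, 0.4)
threshold_mask = tuple(int(p > 0.5) for p in probs)  # binary analogue of argmax decoding
best_mask = brute_force_decode(probs)

print("threshold:", threshold_mask, round(expected_dice(threshold_mask, probs), 4))
print("expected-Dice optimum:", best_mask, round(expected_dice(best_mask, probs), 4))
```

With every pixel at probability 0.4, thresholding predicts an empty mask (expected Dice about 0.22), while the all-foreground mask has an expected Dice of about 0.51. Practical methods have to reach this kind of decision without enumerating every mask, which is infeasible at image scale.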

The experiment used the usual Transformers inference path first:

# `processor` and `model` are loaded from the same checkpoint; `image` is the input image.
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

Then I compared three post-processing choices using the same `outputs`:

import torch

# Assumption: `target_size` is the (height, width) of the original image, and
# `target_sizes` is the per-image list of such sizes, e.g. [target_size].
# `rankseg_transformers` is RankSEG's Transformers compatibility module
# (rankseg/integration/transformers.py in the RankSEG repository).

# 1. Baseline: standard SegFormer / Transformers argmax-style decoding
upsampled_logits = torch.nn.functional.interpolate(
    outputs.logits,
    size=target_size,
    mode="bilinear",
    align_corners=False,
)
baseline = upsampled_logits.argmax(dim=1)[0]

# 2. RankSEG optimized for Dice
rankseg_dice = rankseg_transformers.postprocess(
    outputs,
    model=model,
    target_sizes=target_sizes,
    rankseg_kwargs={"metric": "dice", "solver": "RMA", "output_mode": "multiclass"},
)

# 3. RankSEG optimized for IoU
rankseg_iou = rankseg_transformers.postprocess(
    outputs,
    model=model,
    target_sizes=target_sizes,
    rankseg_kwargs={"metric": "iou", "solver": "RMA", "output_mode": "multiclass"},
)

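The helper above is already implemented outside Transformers in RankSEG's current compatibility layer: documentation (https://rankseg.readthedocs.io/en/latest/integrations_transformers.html), source code (https://github.com/rankseg/rankseg/blob/main/rankseg/integration/transformers.py), example script (https://github.com/rankseg/rankseg/blob/main/examples/transformers_rankseg.py), and notebook (https://github.com/rankseg/rankseg/blob/main/notebooks/rankseg_with_transformers.ipynb). The same notebook can be opened in Colab (https://colab.research.google.com/github/rankseg/rankseg/blob/main/notebooks/rankseg_with_transformers.ipynb).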

Small Cityscapes Check

I used `tanganke/cityscapes` only as a lightweight local check because it has a convenient `segmentation_19` ground-truth column. This is not an official Cityscapes benchmark. It is a small smoke test over the first 100 validation images, using samplewise macro Dice and IoU over non-empty classes.

| Model | Method | Mean Dice | Dice delta | Mean IoU | IoU delta |
| --- | --- | ---: | ---: | ---: | ---: |
| `nvidia/segformer-b0-finetuned-cityscapes-512-1024` | Transformers argmax | 0.4608 | - | 0.3898 | - |
| `nvidia/segformer-b0-finetuned-cityscapes-512-1024` | RankSEG, `metric="dice"` | 0.4810 | +0.0202 | 0.4045 | +0.0147 |
| `nvidia/segformer-b0-finetuned-cityscapes-512-1024` | RankSEG, `metric="iou"` | 0.4813 | +0.0205 | 0.4051 | +0.0153 |
| `nvidia/segformer-b1-finetuned-cityscapes-1024-1024` | Transformers argmax | 0.4743 | - | 0.4015 | - |
| `nvidia/segformer-b1-finetuned-cityscapes-1024-1024` | RankSEG, `metric="dice"` | 0.4903 | +0.0160 | 0.4128 | +0.0113 |
| `nvidia/segformer-b1-finetuned-cityscapes-1024-1024` | RankSEG, `metric="iou"` | 0.4907 | +0.0164 | 0.4134 | +0.0118 |

The gains are modest (roughly +0.016 to +0.021 mean Dice and +0.011 to +0.015 mean IoU), but they are consistent with the intended use case: the model is unchanged, and only the final decoding step changes.
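For readers who want to reproduce this kind of number, here is a minimal NumPy sketch of one plausible reading of "samplewise macro Dice and IoU over non-empty classes". It assumes that "non-empty" means classes present in that image's ground truth, and that label 255 marks ignored pixels. Both are assumptions for illustration, and the function name `samplewise_macro_scores` is hypothetical rather than part of RankSEG or Transformers.

```python
import numpy as np


def samplewise_macro_scores(pred, truth, num_classes, ignore_index=255):
    """Return (macro Dice, macro IoU) for one image over classes present in `truth`."""
    valid = truth != ignore_index  # assumed ignore label; adjust to the dataset
    dice_scores, iou_scores = [], []
    for cls in range(num_classes):
        truth_c = (truth == cls) & valid
        if not truth_c.any():
            continue  # class is empty in this image's ground truth, so skip it
        pred_c = (pred == cls) & valid
        inter = np.logical_and(pred_c, truth_c).sum()
        union = np.logical_or(pred_c, truth_c).sum()
        dice_scores.append(2.0 * inter / (pred_c.sum() + truth_c.sum()))
        iou_scores.append(inter / union)
    if not dice_scores:  # no labelled class in this image
        return float("nan"), float("nan")
    return float(np.mean(dice_scores)), float(np.mean(iou_scores))


# Tiny 2x3 example with three classes.
pred = np.array([[0, 0, 1], [1, 1, 1]])
truth = np.array([[0, 0, 0], [1, 1, 2]])
macro_dice, macro_iou = samplewise_macro_scores(pred, truth, num_classes=3)
print(round(macro_dice, 4), round(macro_iou, 4))
```

Under this reading, a dataset-level score is the mean of these per-image values, so each image contributes equally regardless of how many pixels or classes it contains.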

Visual Examples

Each image below uses the same layout: baseline argmax on the top left, RankSEG optimized for Dice on the top right, ground-truth overlay on the bottom left, and RankSEG optimized for IoU on the bottom right.

<table> <tr> <td width="50%" align="center"> <a href="https://files.seeusercontent.com/2026/05/21/7wXl/rank_01_sample_0053_ddice_0092_d.png"> <img src="https://files.seeusercontent.com/2026/05/21/7wXl/rank_01_sample_0053_ddice_0092_d.png" alt="SegFormer-B0 Cityscapes example 1" width="100%"> </a> SegFormer-B0, example 1 </td> <td width="50%" align="center"> <a href="https://files.seeusercontent.com/2026/05/21/aSr0/rank_02_sample_0029_ddice_0067_d.png"> <img src="https://files.seeusercontent.com/2026/05/21/aSr0/rank_02_sample_0029_ddice_0067_d.png" alt="SegFormer-B0 Cityscapes example 2" width="100%"> </a> SegFormer-B0, example 2 </td> </tr> <tr> <td width="50%" align="center"> <a href="https://files.seeusercontent.com/2026/05/21/s6kI/rank_01_sample_0072_ddice_0084_d.png"> <img src="https://files.seeusercontent.com/2026/05/21/s6kI/rank_01_sample_0072_ddice_0084_d.png" alt="SegFormer-B1 Cityscapes example 1" width="100%"> </a> SegFormer-B1, example 1 </td> <td width="50%" align="center"> <a href="https://files.seeusercontent.com/2026/05/21/Iyn9/rank_02_sample_0003_ddice_0081_d.png"> <img src="https://files.seeusercontent.com/2026/05/21/Iyn9/rank_02_sample_0003_ddice_0081_d.png" alt="SegFormer-B1 Cityscapes example 2" width="100%"> </a> SegFormer-B1, example 2 </td> </tr> </table>

Why This Relates to Transformers Post-Processing

For simple semantic segmentation heads, restoring probabilities may look like resizing logits and applying softmax. For other model families, the post-processing path can involve class-query logits, mask logits, null classes, model-specific resizing conventions, or processor-owned logic. That is why a probability-returning option inside the existing Transformers post-processing API would be useful: the model-family-specific restoration would stay in the official processor path, while downstream methods could consume the restored probabilities.
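For the simple case mentioned above, a minimal PyTorch sketch of "resize logits, then softmax" could look like the following. The function name `semantic_probabilities` is hypothetical and is not an existing Transformers API. The random tensor stands in for `outputs.logits`, and the shapes are chosen only for illustration.

```python
import torch
import torch.nn.functional as F


def semantic_probabilities(logits, target_size):
    """Resize (batch, classes, h, w) logits to `target_size`, then softmax over classes."""
    upsampled = F.interpolate(
        logits, size=target_size, mode="bilinear", align_corners=False
    )
    return upsampled.softmax(dim=1)  # each pixel's class probabilities sum to 1


# Stand-in for `outputs.logits`: 1 image, 19 classes, a low-resolution 32x64 map.
logits = torch.randn(1, 19, 32, 64)
probs = semantic_probabilities(logits, target_size=(128, 256))

# Softmax is monotonic, so argmax over probabilities matches argmax over resized logits.
hard_mask = probs.argmax(dim=1)[0]
print(probs.shape, hard_mask.shape)
```

A downstream decoder would consume `probs` in place of `hard_mask`. Model families with class-query or mask logits need a different restoration step, which is the part that would be better owned by the processor.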

Hard segmentation maps could remain the default behavior. The probability path would simply make the intermediate semantic distribution available for downstream decoding, calibration, uncertainty estimation, or metric-aware post-processing such as RankSEG.

Closing

I understand that adding or changing post-processing APIs has maintenance costs, especially in a library used across many model families. I am not asking maintainers to adopt RankSEG directly. I mainly wanted to share a concrete downstream use case showing why resized semantic probability maps could be useful to users who want to experiment beyond argmax.

I would also like to thank @statmlben and @ZixunWang, the RankSEG maintainers and authors of the recent RankSEG-RMA work, for developing and maintaining the RankSEG project that made this small Transformers experiment possible.

If maintainers think this direction is worth exploring, I would be happy to adapt the experiment to a preferred model family, test against a proposed API, or help write documentation/examples in the style that fits Transformers.
