Discussion: optional RankSEG-style decoding for Transformers semantic segmentation post-processing



RankSEG PyPI Docs Transformers docs Notebook JMLR 2023 NeurIPS 2025

Hi Transformers maintainers,

I wanted to share a small downstream experiment around RankSEG-style decoding for semantic segmentation. The short version is: if a Transformers processor can expose resized semantic class probabilities before the final argmax, then users can try metric-aware post-processing methods such as RankSEG without changing the model, checkpoint, or preprocessing pipeline.

This is related to https://github.com/huggingface/transformers/issues/37715, where the discussion is about making the final argmax optional and allowing users to access resized class probability maps. I do not want to assume that RankSEG itself belongs in Transformers, but I think it is a useful concrete example of why probability-level semantic segmentation outputs can matter.

What I Tried

RankSEG is a training-free segmentation decoding method. It takes per-class probability maps and returns a hard segmentation mask optimized for an overlap-style metric such as Dice or IoU. The relevant papers are RankSEG (JMLR 2023) and RankSEG-RMA (NeurIPS 2025). There is also a RankSEG repository, a PyPI package, and a Transformers integration tutorial.
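To illustrate why a metric-aware decoder can disagree with thresholding or argmax, here is a minimal toy sketch. It is not RankSEG's algorithm: it brute-forces every binary mask on a three-pixel, single-class "image" and picks the one with the highest expected Dice, assuming pixels are independent Bernoulli variables and that Dice is 1 when both masks are empty. All function names are hypothetical.

```python
from itertools import product


def dice(pred, truth):
    """Dice between two binary tuples; defined as 1.0 when both are empty."""
    total = sum(pred) + sum(truth)
    if total == 0:
        return 1.0
    overlap = sum(p and t for p, t in zip(pred, truth))
    return 2.0 * overlap / total


def expected_dice(pred, probs):
    """Expected Dice of `pred` if pixel i is foreground with probability probs[i],
    independently of the other pixels (a simplifying assumption)."""
    score = 0.0
    for truth in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for q, t in zip(probs, truth):
            weight *= q if t else 1.0 - q
        score += weight * dice(pred, truth)
    return score


def best_mask_by_expected_dice(probs):
    """Exhaustive search over all 2**n masks; only feasible for tiny inputs."""
    candidates = product((0, 1), repeat=len(probs))
    return max(candidates, key=lambda mask: expected_dice(mask, probs))


probs = [0.4, 0.4, 0.4]
threshold_mask = tuple(int(q > 0.5) for q in probs)  # plain 0.5 thresholding
dice_mask = best_mask_by_expected_dice(probs)        # expected-Dice maximizer

print(threshold_mask, expected_dice(threshold_mask, probs))
print(dice_mask, expected_dice(dice_mask, probs))
```

In this toy case thresholding predicts nothing, while the expected-Dice maximizer predicts all three pixels. Exhaustive search does not scale beyond a handful of pixels, and practical decoders need an efficient solver instead.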

The experiment used the usual Transformers inference path first:

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

Then I compared three post-processing choices using the same outputs:

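import torch

# Assumed setup (not shown in the original snippet): target_size is the
# (height, width) of the original image, and target_sizes is the per-image
# list form passed to the RankSEG calls below, e.g. [target_size].

# 1. Baseline: standard SegFormer / Transformers argmax-style decoding
upsampled_logits = torch.nn.functional.interpolate(
    outputs.logits,
    size=target_size,
    mode="bilinear",
    align_corners=False,
)
baseline = upsampled_logits.argmax(dim=1)[0]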

# 2. RankSEG optimized for Dice
rankseg_dice = rankseg_transformers.postprocess(
    outputs,
    model=model,
    target_sizes=target_sizes,
    rankseg_kwargs={"metric": "dice", "solver": "RMA", "output_mode": "multiclass"},
)

# 3. RankSEG optimized for IoU
rankseg_iou = rankseg_transformers.postprocess(
    outputs,
    model=model,
    target_sizes=target_sizes,
    rankseg_kwargs={"metric": "iou", "solver": "RMA", "output_mode": "multiclass"},
)

The helper used above already exists outside Transformers, in RankSEG's current compatibility layer, which provides documentation, source code, an example script, and a notebook. The same notebook can be opened in Colab.

Small Cityscapes Check

I used tanganke/cityscapes only as a lightweight local check because it has a convenient segmentation_19 ground-truth column. This is not an official Cityscapes benchmark. It is a small smoke test over the first 100 validation images, using samplewise macro Dice and IoU over non-empty classes.

| Model | Method | Mean Dice | Dice delta | Mean IoU | IoU delta |
| --- | --- | --- | --- | --- | --- |
| nvidia/segformer-b0-finetuned-cityscapes-512-1024 | Transformers argmax | 0.4608 | - | 0.3898 | - |
| nvidia/segformer-b0-finetuned-cityscapes-512-1024 | RankSEG, metric="dice" | 0.4810 | +0.0202 | 0.4045 | +0.0147 |
| nvidia/segformer-b0-finetuned-cityscapes-512-1024 | RankSEG, metric="iou" | 0.4813 | +0.0205 | 0.4051 | +0.0153 |
| nvidia/segformer-b1-finetuned-cityscapes-1024-1024 | Transformers argmax | 0.4743 | - | 0.4015 | - |
| nvidia/segformer-b1-finetuned-cityscapes-1024-1024 | RankSEG, metric="dice" | 0.4903 | +0.0160 | 0.4128 | +0.0113 |
| nvidia/segformer-b1-finetuned-cityscapes-1024-1024 | RankSEG, metric="iou" | 0.4907 | +0.0164 | 0.4134 | +0.0118 |

The gains are modest (roughly +0.016 to +0.021 mean Dice and +0.011 to +0.015 mean IoU), but they are consistent with the intended use case: the model is unchanged, and only the final decoding step changes.
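For readers who want to reproduce this kind of check, the following is a minimal sketch of one way to compute samplewise macro Dice and IoU over non-empty classes. It rests on two assumptions that the experiment does not spell out: a class counts as "non-empty" when it appears in the ground truth of that image, and pixels labeled 255 are ignored. The function name is hypothetical.

```python
import numpy as np


def samplewise_macro_scores(pred, target, num_classes, ignore_index=255):
    """Return (macro_dice, macro_iou) for a single image.

    pred and target are integer label maps of the same shape. Classes that do
    not occur in `target` are skipped (assumed meaning of "non-empty").
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    valid = target != ignore_index  # assumed ignore label

    dice_scores, iou_scores = [], []
    for c in range(num_classes):
        t = (target == c) & valid
        if not t.any():
            continue  # class absent from this image's ground truth
        p = (pred == c) & valid
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        dice_scores.append(2.0 * inter / (p.sum() + t.sum()))
        iou_scores.append(inter / union)

    if not dice_scores:  # no labeled class in this image
        return float("nan"), float("nan")
    return float(np.mean(dice_scores)), float(np.mean(iou_scores))


# Tiny 2x2 example with one ignored pixel.
pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 255]]
macro_dice, macro_iou = samplewise_macro_scores(pred, target, num_classes=3)
print(macro_dice, macro_iou)
```

The dataset-level numbers would then be the mean of these per-image scores over the evaluated images.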

Visual Examples

Each image below uses the same layout: baseline argmax on the top left, RankSEG optimized for Dice on the top right, ground-truth overlay on the bottom left, and RankSEG optimized for IoU on the bottom right.

<table> <tr> <td width="50%" align="center"> <a href="https://files.seeusercontent.com/2026/05/21/7wXl/rank_01_sample_0053_ddice_0092_d.png"> <img src="https://files.seeusercontent.com/2026/05/21/7wXl/rank_01_sample_0053_ddice_0092_d.png" alt="SegFormer-B0 Cityscapes example 1" width="100%"> </a> <br> <sub>SegFormer-B0, example 1</sub> </td> <td width="50%" align="center"> <a href="https://files.seeusercontent.com/2026/05/21/aSr0/rank_02_sample_0029_ddice_0067_d.png"> <img src="https://files.seeusercontent.com/2026/05/21/aSr0/rank_02_sample_0029_ddice_0067_d.png" alt="SegFormer-B0 Cityscapes example 2" width="100%"> </a> <br> <sub>SegFormer-B0, example 2</sub> </td> </tr> <tr> <td width="50%" align="center"> <a href="https://files.seeusercontent.com/2026/05/21/s6kI/rank_01_sample_0072_ddice_0084_d.png"> <img src="https://files.seeusercontent.com/2026/05/21/s6kI/rank_01_sample_0072_ddice_0084_d.png" alt="SegFormer-B1 Cityscapes example 1" width="100%"> </a> <br> <sub>SegFormer-B1, example 1</sub> </td> <td width="50%" align="center"> <a href="https://files.seeusercontent.com/2026/05/21/Iyn9/rank_02_sample_0003_ddice_0081_d.png"> <img src="https://files.seeusercontent.com/2026/05/21/Iyn9/rank_02_sample_0003_ddice_0081_d.png" alt="SegFormer-B1 Cityscapes example 2" width="100%"> </a> <br> <sub>SegFormer-B1, example 2</sub> </td> </tr> </table>

Why This Relates to Transformers Post-Processing

For simple semantic segmentation heads, restoring probabilities may look like resizing logits and applying softmax. For other model families, the post-processing path can involve class-query logits, mask logits, null classes, model-specific resizing conventions, or processor-owned logic. That is why a probability-returning option inside the existing Transformers post-processing API would be useful: the model-family-specific restoration would stay in the official processor path, while downstream methods could consume the restored probabilities.
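For the simple case mentioned above, the restoration step could look like the sketch below. The helper name `restore_semantic_probabilities` is hypothetical and is not a Transformers API. It assumes logits of shape (batch, num_classes, h, w), as in SegFormer's `outputs.logits`, and uses a random tensor as a stand-in for real model output.

```python
import torch
import torch.nn.functional as F


def restore_semantic_probabilities(logits, target_size):
    """Resize logits to `target_size` (height, width), then softmax over classes.

    Hypothetical helper, not part of Transformers. Assumes `logits` has shape
    (batch, num_classes, h, w).
    """
    resized = F.interpolate(
        logits, size=target_size, mode="bilinear", align_corners=False
    )
    return resized.softmax(dim=1)


# Random stand-in for `outputs.logits` with 19 classes.
logits = torch.randn(1, 19, 8, 16)
probs = restore_semantic_probabilities(logits, target_size=(32, 64))

# Taking the argmax of the probabilities gives the usual hard label map.
hard_map = probs.argmax(dim=1)
print(probs.shape, hard_map.shape)
```

Model families with class-query or mask logits would need a different restoration path, which is the part a processor-owned implementation would handle.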

Hard segmentation maps could remain the default behavior. The probability path would simply make the intermediate semantic distribution available for downstream decoding, calibration, uncertainty estimation, or metric-aware post-processing such as RankSEG.
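As one example of a downstream use other than RankSEG, here is a small sketch of a per-pixel uncertainty map computed as the Shannon entropy of the class distribution. It assumes a probability tensor of shape (batch, num_classes, H, W) that sums to 1 over the class dimension. The function name is hypothetical.

```python
import math

import torch


def pixelwise_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of the class distribution at each pixel.

    probs: tensor of shape (batch, num_classes, H, W), summing to 1 over dim=1.
    Returns a tensor of shape (batch, H, W).
    """
    # clamp_min avoids log(0) for classes with zero probability.
    return -(probs * probs.clamp_min(eps).log()).sum(dim=1)


# A uniform distribution over 4 classes has the maximum entropy, log(4).
uniform = torch.full((1, 4, 2, 2), 0.25)
print(pixelwise_entropy(uniform))

# A one-hot (fully confident) distribution has entropy 0.
one_hot = torch.zeros(1, 4, 2, 2)
one_hot[:, 0] = 1.0
print(pixelwise_entropy(one_hot))
```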

Closing

I understand that adding or changing post-processing APIs has maintenance costs, especially in a library used across many model families. I am not asking maintainers to adopt RankSEG directly. I mainly wanted to share a concrete downstream use case showing why resized semantic probability maps could be useful to users who want to experiment beyond argmax.

I would also like to thank @statmlben and @ZixunWang, the RankSEG maintainers and authors of the recent RankSEG-RMA work, for developing and maintaining the RankSEG project that made this small Transformers experiment possible.

If maintainers think this direction is worth exploring, I would be happy to adapt the experiment to a preferred model family, test against a proposed API, or help write documentation/examples in the style that fits Transformers.
