pytorch - 💡(How to fix) Fix Problem with mask

Hi,

I’m working on an industrial bin picking system using TorchVision Mask R-CNN and I’m facing a problem with inconsistent and incomplete segmentation masks, even for identical objects.

Problem description: I am detecting flat metal parts (thin sheet metal) in a bin picking scenario. The objects are identical, often overlapping, sometimes partially occluded, and have a reflective surface.

The model performs well in terms of detection (high confidence scores), but the predicted masks are often incomplete (only the visible part of the object), inconsistent between frames, and sometimes contain noise (small false positives / debris).

This creates a major issue: I am unable to extract a stable and repeatable grasp point because the mask shape changes every time.

Expected behavior: For industrial use, I need consistent mask shapes for identical objects, preferably full object segmentation (even when partially occluded, if possible), and stable geometry for downstream processing (grasping).

Current behavior: Mask R-CNN predicts only the visible parts of objects. For partially occluded items, the mask is incomplete. Confidence remains high (e.g. 0.95–1.0), even for poor-quality masks. Small irrelevant regions are sometimes detected as valid objects.

Setup:
Model: maskrcnn_resnet50_fpn_v2
Framework: TorchVision
Input: 1920x1080 (letterboxed)
Dataset: custom (COCO format)
Objects: flat metal parts (bin picking)
Training: standard TorchVision pipeline

Questions:
1. Is Mask R-CNN in TorchVision expected to always segment only the visible part of an object (no amodal segmentation)?
2. What techniques can improve mask completeness and consistency?
3. Would increasing mask resolution (e.g. from 28×28 to 56×56) help in practice?
4. Are there recommended ways to enforce shape consistency and reduce noise / false positives?

Additional context: This is a real industrial application (robot bin picking), so consistency is more important than raw detection accuracy. I need repeatable geometry, not just object detection.

TorchVision currently performs best among the tested solutions, but this issue is blocking further system development.

Thanks in advance for any guidance.

extent analysis

TL;DR

Improving mask completeness and consistency in TorchVision's Mask R-CNN may require techniques such as increasing mask resolution, implementing amodal segmentation, or enforcing shape consistency through post-processing.

Guidance

Investigate amodal segmentation techniques to predict full object segmentation, even when partially occluded, as Mask R-CNN in TorchVision is expected to segment only the visible part of an object by default.
Consider increasing the mask resolution (e.g., from 28×28 to 56×56) to capture finer boundary detail, especially on large objects. This mainly affects how precisely edges are drawn; on its own it is unlikely to recover occluded regions or remove spurious detections, and the benefit depends on the specific use case and dataset.
Explore post-processing techniques to enforce shape consistency and reduce noise/false positives, such as morphological operations or shape-based filtering, to refine the predicted masks and improve downstream processing.
Treat detection confidence and mask quality as separate signals: the score reported by Mask R-CNN comes from the box classification branch and does not measure mask quality, so a score of 0.95–1.0 can accompany an incomplete mask. Add a separate mask-quality check, or adjust the training pipeline or model parameters, rather than relying on the score alone.
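The post-processing and mask-quality points above can be sketched as follows. This is a minimal NumPy-only illustration, not part of TorchVision: the helper names `largest_component` and `clean_mask`, the 0.5 threshold, and the `min_area` value are all assumptions chosen for the example. It binarizes a soft mask, keeps only the largest connected region (dropping small debris), rejects masks below a minimum area, and reports the mean mask probability inside the kept region as a rough quality heuristic.

```python
from collections import deque

import numpy as np


def largest_component(binary):
    """Return a boolean mask holding only the largest 4-connected region."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    best_label, best_size, current = 0, 0, 0
    for y, x in zip(*np.nonzero(binary)):
        if labels[y, x]:
            continue  # pixel already belongs to a labelled region
        current += 1
        labels[y, x] = current
        queue, size = deque([(y, x)]), 0
        while queue:  # breadth-first flood fill of one region
            cy, cx = queue.popleft()
            size += 1
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    queue.append((ny, nx))
        if size > best_size:
            best_label, best_size = current, size
    if best_label == 0:
        return np.zeros((h, w), dtype=bool)
    return labels == best_label


def clean_mask(prob, threshold=0.5, min_area=50):
    """Binarize a soft mask, keep its largest region, reject tiny masks.

    Returns (mask, quality); mask is None when the region is below min_area.
    quality is the mean probability inside the kept region (a heuristic only).
    """
    mask = largest_component(prob >= threshold)
    if int(mask.sum()) < min_area:
        return None, 0.0
    return mask, float(prob[mask].mean())


# Synthetic soft mask: one large part plus a small speck of noise.
prob = np.zeros((40, 40), dtype=np.float32)
prob[5:25, 5:30] = 0.9    # main region, 20 x 25 pixels
prob[35:37, 35:37] = 0.8  # debris, 2 x 2 pixels
mask, quality = clean_mask(prob)
print(int(mask.sum()), round(quality, 2))
```

The pure-Python flood fill keeps the sketch dependency-free but would be slow on full 1920x1080 masks; in practice a library routine such as `scipy.ndimage.label` or OpenCV's `cv2.connectedComponentsWithStats` would normally take its place.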

Example

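A dataset-specific snippet is not possible without further information on the custom dataset and training pipeline. Two general points still apply. First, Mask R-CNN learns whatever masks the annotations contain, so amodal output requires ground-truth masks that cover the full object extent, including occluded parts. Second, in TorchVision the mask resolution follows from the mask RoI pooling size and the mask predictor, both of which can be configured when the model is built.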

Notes

The provided image and additional context suggest that the issue depends strongly on the specific dataset and on the requirements of the industrial bin picking application. Further experimentation and evaluation will be needed to determine the best approach, and a tailored solution may be required to achieve consistent and complete segmentation masks.

Recommendation

Apply workaround: Implementing amodal segmentation techniques or post-processing refinement may be necessary to achieve consistent and complete segmentation masks, as the current Mask R-CNN implementation in TorchVision may not meet the requirements of the industrial bin picking application.
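One possible form of the post-processing refinement is a shape-based filter that uses the fact that all parts are identical. The sketch below is an assumption-laden illustration, not an established recipe: it presumes a roughly top-down camera and flat parts, so that an unoccluded part covers a known pixel area (`reference_area`, measured offline). Masks whose area deviates from that reference are treated as occluded or noisy and skipped. For accepted masks, the centroid and principal-axis angle are computed from image moments as a repeatable grasp reference. The helper names `mask_pose` and `is_complete` and the 15% tolerance are hypothetical.

```python
import numpy as np


def mask_pose(mask):
    """Centroid (x, y) and principal-axis angle in radians from image moments."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    dx, dy = xs - cx, ys - cy
    # Central second-order moments of the mask pixels.
    mu20, mu02, mu11 = (dx * dx).mean(), (dy * dy).mean(), (dx * dy).mean()
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return (float(cx), float(cy)), float(angle)


def is_complete(mask, reference_area, tolerance=0.15):
    """True when the mask area is within tolerance of the known part area."""
    return bool(abs(int(mask.sum()) - reference_area) <= tolerance * reference_area)


# Synthetic example: a fully visible 10 x 30 pixel part and a half-occluded one.
full = np.zeros((50, 50), dtype=bool)
full[20:30, 10:40] = True
occluded = np.zeros((50, 50), dtype=bool)
occluded[20:30, 10:25] = True

reference_area = 300  # assumption: pixel area of an unoccluded part
for name, m in (("full", full), ("occluded", occluded)):
    if is_complete(m, reference_area):
        centroid, angle = mask_pose(m)
        print(name, "-> grasp candidate at", centroid, "angle", angle)
    else:
        print(name, "-> skipped (area differs from reference)")
```

Picking only parts that pass the area check sidesteps the amodal problem for grasp planning: the topmost, unoccluded parts are picked first, and parts underneath become visible in later frames. If the apparent size of a part changes noticeably with its height in the bin, the reference area would need to be scaled using depth information.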
