Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

¹Korea University   ²Meta Reality Labs   ³KAIST

Abstract

While recent 3D instance segmentation approaches based on transformer architectures show promising results, they often fail to correctly identify instances with similar appearances. They also delineate edges ambiguously, causing points near instance boundaries to be misclassified. In this work, we introduce a novel framework that addresses these challenges and improves complex 3D instance perception. We first propose a semantic guidance network that leverages rich semantic knowledge from a language model as intelligent priors, enhancing the functional understanding of real-world instances beyond what geometric information alone provides. We explicitly guide the basic instance queries with text embeddings of each instance so that they learn deep semantic details. Further, we introduce an edge prediction module that encourages the segmentation network to be edge-aware. We extract voxel-wise edge maps from point features and use them as auxiliary information for learning edge cues. In extensive experiments on the large-scale benchmarks ScanNetV2, ScanNet200, S3DIS, and STPLS3D, our method outperforms existing state-of-the-art models.

Method

Overview

An overview of our framework. Built on the standard architecture of transformer-based 3D instance segmentation (3DIS) models, our model takes a 3D scene as input and directly infers instance masks. It consists of four main modules: (1) a Sparse Convolutional Backbone; (2) a Semantic Network, which strengthens the queries so that they learn contextual variations of instances from rich semantic embeddings produced by the pre-trained language model CLIP; (3) a Mask Transformer Decoder with Mask Module and Query Refinement blocks; and (4) an Edge Prediction Module, whose edge extraction head is trained to predict a voxel-wise edge map, encouraging the network to use edge-aware features.
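To make the idea of a voxel-wise edge map concrete, here is a minimal sketch of one way an edge target and its auxiliary loss could be set up. It is not the paper's implementation. The sketch assumes that a voxel counts as an edge voxel when a 6-connected occupied neighbor carries a different instance label, and that the edge head is supervised with binary cross-entropy. The function names `edge_map_from_instance_labels` and `edge_bce_loss` are hypothetical.

```python
import numpy as np

# Hypothetical helper: the neighbor-disagreement rule below is an assumption,
# not the edge definition used by the authors.
def edge_map_from_instance_labels(voxel_coords, instance_ids):
    """Return a boolean (N,) array marking voxels that touch another instance.

    voxel_coords: (N, 3) integer indices of occupied voxels.
    instance_ids: (N,) integer instance label per voxel.
    """
    coords = np.asarray(voxel_coords, dtype=np.int64).tolist()
    ids = np.asarray(instance_ids).tolist()
    # Map each occupied voxel to its instance label for O(1) neighbor lookup.
    lookup = {tuple(c): i for c, i in zip(coords, ids)}
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    is_edge = np.zeros(len(coords), dtype=bool)
    for n, (x, y, z) in enumerate(coords):
        own = lookup[(x, y, z)]
        for dx, dy, dz in offsets:
            neighbor = lookup.get((x + dx, y + dy, z + dz))
            # Empty neighbors are ignored; only a different instance marks an edge.
            if neighbor is not None and neighbor != own:
                is_edge[n] = True
                break
    return is_edge


# Hypothetical auxiliary loss: numerically stable binary cross-entropy on logits.
def edge_bce_loss(edge_logits, edge_target):
    x = np.asarray(edge_logits, dtype=np.float64)
    t = np.asarray(edge_target, dtype=np.float64)
    per_voxel = np.maximum(x, 0.0) - x * t + np.log1p(np.exp(-np.abs(x)))
    return float(per_voxel.mean())


# Four voxels in a row along x; the first two belong to instance 0, the rest to 1.
coords = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]])
labels = np.array([0, 0, 1, 1])
target = edge_map_from_instance_labels(coords, labels)
loss = edge_bce_loss(np.zeros(4), target)
```

In this toy scene only the two voxels that meet at the instance boundary are marked as edges. In a full model, an auxiliary term of this kind would be added to the mask and classification losses with a weighting factor.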

BibTeX

    @InProceedings{Roh_2024_CVPR,
        author    = {Roh, Wonseok and Jung, Hwanhee and Nam, Giljoo and Yeom, Jinseop and Park, Hyunje and Yoon, Sang Ho and Kim, Sangpil},
        title     = {Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior},
        booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
        month     = {June},
        year      = {2024},
        pages     = {20644-20653}
    }