Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform

Chunghyun Park, Seungwook Kim, Jaesik Park, Minsu Cho
¹POSTECH, ²Seoul National University
CVPR 2024

*Indicates Equal Contribution
Figure: Teaser. Our approach predicts SO(3)-invariant correspondences between rotated 3D shapes.

Abstract

Establishing accurate 3D correspondences between shapes stands as a pivotal challenge with profound implications for computer vision and robotics. However, existing self-supervised methods for this problem assume perfect input shape alignment, restricting their real-world applicability. In this work, we introduce a novel self-supervised Rotation-Invariant 3D correspondence learner with Local Shape Transform, dubbed RIST, that learns to establish dense correspondences between shapes even under challenging intra-class variations and arbitrary orientations. Specifically, RIST learns to dynamically formulate an SO(3)-invariant local shape transform for each point, which maps the SO(3)-equivariant global shape descriptor of the input shape to a local shape descriptor. These local shape descriptors are provided as inputs to our decoder to facilitate point cloud self- and cross-reconstruction. Our proposed self-supervised training pipeline encourages semantically corresponding points from different shapes to be mapped to similar local shape descriptors, enabling RIST to establish dense point-wise correspondences. RIST demonstrates state-of-the-art performance on 3D part label transfer and semantic keypoint transfer given arbitrarily rotated point cloud pairs, outperforming existing methods by significant margins.

RIST: 3D Rotation-Invariant Local Shape Transform

Figure: Overview of the proposed method.
The input point clouds are independently encoded into an SO(3)-equivariant global shape descriptor and dynamic, SO(3)-invariant point-wise local shape transforms. The local shape transforms map the global shape descriptor to local shape descriptors by infusing local semantics and geometry, and these local descriptors are fed to the decoder for self-reconstruction. For cross-reconstruction, the local shape transforms formulated from another point cloud are applied instead, ensuring that the local shape descriptors capture generalizable local semantics and geometries. We supervise RIST by penalizing errors in the self- and cross-reconstructions. At inference, we leverage the local shape transforms to obtain local shape descriptors and identify dense correspondences.
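To make this concrete, below is a minimal PyTorch sketch of the core mechanism: each point predicts a linear "local shape transform" that maps the global shape descriptor to a local shape descriptor, and dense correspondences are read off as nearest neighbors in local-descriptor space. All names, dimensions, and the plain-MLP stand-ins are our assumptions for illustration; the stand-ins only fix the tensor shapes of the pipeline and are not themselves equivariant, whereas the actual RIST encoders are SO(3)-equivariant (global) and SO(3)-invariant (per-point) networks.

import torch
import torch.nn as nn

D_GLOBAL, D_LOCAL = 64, 32  # assumed descriptor dimensions

class LocalShapeTransformNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for the SO(3)-equivariant global shape encoder.
        self.global_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, D_GLOBAL))
        # Stand-in for the SO(3)-invariant per-point encoder that emits
        # one (D_LOCAL x D_GLOBAL) linear transform per point.
        self.transform_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, D_LOCAL * D_GLOBAL))

    def forward(self, pts):                    # pts: (N, 3)
        g = self.global_mlp(pts).mean(dim=0)   # global descriptor, (D_GLOBAL,)
        T = self.transform_mlp(pts).view(-1, D_LOCAL, D_GLOBAL)
        # Each point's transform maps the global descriptor to its local
        # shape descriptor; a decoder (omitted here) would reconstruct the
        # shape from these during self-/cross-reconstruction training.
        local = torch.einsum('nij,j->ni', T, g)  # (N, D_LOCAL)
        return local, T, g

def dense_correspondence(local_a, local_b):
    """Match each point of shape A to its nearest neighbor of shape B
    in local shape descriptor space."""
    return torch.cdist(local_a, local_b).argmin(dim=1)  # (N_a,)

# Usage: correspondences between two arbitrarily oriented shapes.
net = LocalShapeTransformNet()
xyz_a, xyz_b = torch.randn(1024, 3), torch.randn(1024, 3)
with torch.no_grad():
    feat_a, _, _ = net(xyz_a)
    feat_b, _, _ = net(xyz_b)
corr = dense_correspondence(feat_a, feat_b)  # index into xyz_b per point of xyz_a

During training, cross-reconstruction would apply the transforms T predicted from one shape to the global descriptor g of the other before decoding, which is what forces semantically corresponding points onto similar local descriptors.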

Part Label Transfer Results

Table: Average IoU (%) of part label transfer on ShapeNetPart.
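For reference on how such a table is produced, a typical part label transfer evaluation carries labels from a source shape to a target shape through the predicted correspondences and scores the result with average IoU. The sketch below is our own illustration of that protocol, not the authors' evaluation code; the function names are ours.

import torch

def transfer_part_labels(src_labels, corr):
    """corr[i] is the index of the source point matched to target point i,
    e.g. the nearest neighbor in local shape descriptor space."""
    return src_labels[corr]

def average_iou(pred, gt, num_parts):
    """Mean IoU over the part classes present in prediction or ground truth."""
    ious = []
    for part in range(num_parts):
        inter = ((pred == part) & (gt == part)).sum().item()
        union = ((pred == part) | (gt == part)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / max(len(ious), 1)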

Keypoint Transfer Results

Table: Correspondences (%) of keypoint transfer on KeypointNet.
Figure: Keypoint transfer results on the motorcycle and airplane categories.
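Keypoint transfer is commonly scored as the fraction of transferred keypoints that land within a distance threshold of the annotated target keypoints. The snippet below is an illustrative sketch of such a metric; the threshold value and function name are assumptions, not the paper's exact protocol.

import torch

def keypoint_transfer_accuracy(pred_kpts, gt_kpts, threshold=0.01):
    """pred_kpts, gt_kpts: (K, 3). Returns the fraction of keypoints
    transferred to within `threshold` of the ground truth (threshold assumed)."""
    dists = torch.linalg.norm(pred_kpts - gt_kpts, dim=-1)
    return (dists <= threshold).float().mean().item()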

Video Presentation

Poster

BibTeX


@inproceedings{park2024learning,
  title={Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform},
  author={Park, Chunghyun and Kim, Seungwook and Park, Jaesik and Cho, Minsu},
  booktitle={Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2024}
}