SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation

The availability of real-time semantics greatly improves the core geometric functionality of SLAM systems, enabling numerous robotic and AR/VR applications. We present a new methodology for real-time semantic mapping from RGB-D sequences that combines a 2D neural network with a 3D network, built on a SLAM system with 3D occupancy mapping. When segmenting a new frame, we re-project latent features from previous frames using differentiable rendering. Fusing these re-projected feature maps with current-frame features greatly improves image segmentation quality compared to a baseline that processes images independently. For 3D map processing, we propose a novel geometric quasi-planar over-segmentation method that relies on surface normals to group 3D map elements likely to belong to the same semantic class. We also describe a novel neural network design for lightweight semantic map post-processing. Our system achieves state-of-the-art semantic mapping quality among 2D-3D network-based systems and matches the performance of 3D convolutional networks on three real indoor datasets, while running in real time. Moreover, it generalizes across sensors better than 3D CNNs, enabling training and inference with different depth sensors.
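To make the idea of grouping map elements by surface normals more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the map is given as a dictionary from integer voxel coordinates to unit normals, uses 6-connectivity, and grows a segment whenever two neighbouring normals differ by less than a threshold angle. The function name `quasi_planar_segments`, the `angle_deg` parameter, and the default of 15 degrees are all hypothetical choices made for illustration.

```python
from collections import deque
import math


def quasi_planar_segments(normals, angle_deg=15.0):
    """Group adjacent voxels with similar normals (illustrative sketch only).

    normals: dict mapping integer voxel coords (x, y, z) to a unit normal
             (nx, ny, nz). This data layout is an assumption, not the paper's.
    Returns a dict mapping each voxel to an integer segment id.
    """
    cos_thresh = math.cos(math.radians(angle_deg))
    # 6-connected neighbourhood on the voxel grid (assumed connectivity).
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    labels = {}
    next_id = 0
    for seed in normals:
        if seed in labels:
            continue
        # Start a new segment and grow it breadth-first.
        labels[seed] = next_id
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            nv = normals[v]
            for dx, dy, dz in offsets:
                u = (v[0] + dx, v[1] + dy, v[2] + dz)
                if u not in normals or u in labels:
                    continue
                nu = normals[u]
                # Compare against the neighbour's normal rather than the
                # seed's, so gently curved surfaces can stay in one segment.
                dot = nv[0] * nu[0] + nv[1] * nu[1] + nv[2] * nu[2]
                if dot >= cos_thresh:
                    labels[u] = next_id
                    queue.append(u)
        next_id += 1
    return labels


# Toy map: a strip of floor voxels meeting a strip of wall voxels.
toy_map = {(x, 0, 0): (0.0, 1.0, 0.0) for x in range(3)}
toy_map.update({(3, y, 0): (-1.0, 0.0, 0.0) for y in range(3)})
toy_labels = quasi_planar_segments(toy_map)
```

In this toy map the floor and the wall end up in separate segments because their normals are perpendicular where they meet. Comparing each voxel with its immediate neighbour, instead of with the segment seed, is one possible way to tolerate slight curvature; a real system would likely add further criteria.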

BibTeX

@article{wang2023semlaps,
  title={SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation},
  author={Wang, Jingwen and Tarrio, Juan and Agapito, Lourdes and Alcantarilla, Pablo F and Vakhitov, Alexander},
  journal={arXiv preprint arXiv:2306.16585},
  year={2023}
}

Acknowledgement

This work was done during Jingwen's internship at Slamcore LTD. Jingwen Wang is funded by the UCL Centre for Doctoral Training in Foundational AI under UKRI grant number EP/S021566/1. We also thank the authors of INS-Conv for providing additional details on the evaluation protocol.