Jingwen Wang | 王敬文

I am a PhD student at the CDT in Foundational AI at UCL, supervised by Prof. Lourdes Agapito and Prof. Niloy Mitra. My research interest lies in object-aware semantic SLAM and 3D reconstruction, combining object-level scene understanding and SLAM systems using learning-based approaches. Prior to my PhD, I obtained a master's degree in Robotics from UCL and a bachelor's degree in Electrical Engineering from the University of Liverpool.

News

02/2024: Our paper MorpheuS on 360° reconstruction of a moving object got accepted to CVPR 2024.
09/2023: Our paper SeMLaPS on real-time semantic mapping got accepted to RA-L.
07/2023: I gave a talk (in Chinese) about neural implicit representation in SLAM on cvlife and ShenLanXueYuan.
05/2023: Code and data of Co-SLAM are now released on GitHub. We also released a benchmark of Neural-SLAM methods on GitHub.
03/2023: Our paper Co-SLAM on neural implicit SLAM got accepted to CVPR 2023.
09/2022: Code and data of GO-Surf are now released on GitHub.
08/2022: Our paper GO-Surf on neural implicit reconstruction got accepted to 3DV 2022 as an oral presentation.
07/2022: I joined SLAMcore as a part-time research intern, working with Dr. Alexander Vakhitov.
12/2021: Code of DSP-SLAM is now released on GitHub.
10/2021: Our paper DSP-SLAM on object SLAM got accepted to 3DV 2021.

Research

My research interests include semantic SLAM, 3D reconstruction, neural scene representations and robotics. I'm also interested in exploring the coupling point between long-term SLAM and embodied AI.

MorpheuS: Neural Dynamic 360° Surface Reconstruction from Monocular RGB-D Video
Hengyi Wang, Jingwen Wang, Lourdes Agapito
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
project page | arxiv | code

We present MorpheuS, a dynamic scene reconstruction method that leverages neural implicit representations and diffusion priors for achieving 360° reconstruction of a moving object from a monocular RGB-D video.

SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation
Jingwen Wang, Juan Tarrio, Lourdes Agapito, Pablo F. Alcantarilla, Alexander Vakhitov
IEEE Robotics and Automation Letters (RA-L), 2023
project page | arxiv | video | video (bilibili) | code

We present SeMLaPS, a real-time semantic mapping system based on 2D-3D networks that takes in a sequence of RGB-D images and sequentially outputs a semantic map of the scene. Our experiments show that SeMLaPS achieves state-of-the-art accuracy among real-time 2D-3D networks and shows better cross-sensor generalization capabilities than methods based on 3D-only networks.

Co-SLAM: Joint Coordinate and Sparse Parametric Encodings for Neural Real-Time SLAM
Hengyi Wang*, Jingwen Wang*, Lourdes Agapito
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
project page | arxiv | video | video (bilibili) | benchmark | code

We present Co-SLAM, a neural SLAM method that performs real-time camera tracking and dense reconstruction based on a joint encoding. Experimental results show that Co-SLAM runs at 10-17 Hz and achieves state-of-the-art scene reconstruction results and competitive tracking performance on various datasets and benchmarks (ScanNet, TUM, Replica, Synthetic RGBD).

GO-Surf: Neural Feature Grid Optimization for Fast, High-Fidelity RGB-D Surface Reconstruction
Jingwen Wang*, Tymoteusz Bleja*, Lourdes Agapito
International Conference on 3D Vision (3DV), 2022 (Oral Presentation)
project page | arxiv | video | video (bilibili) | code

We present GO-Surf, a direct feature grid optimization method for accurate and fast surface reconstruction from RGB-D sequences. GO-Surf can optimize sequences of 1-2K frames in 15-45 minutes, a speedup of 60 times over NeuralRGB-D (Azinovic et al. 2022). We also show that with a slightly reduced voxel resolution, GO-Surf is able to run online mapping at an interactive frame rate (15 FPS).

DSP-SLAM: Object Oriented SLAM with Deep Shape Priors
Jingwen Wang, Martin Rünz, Lourdes Agapito
International Conference on 3D Vision (3DV), 2021
project page | arxiv | video | video (bilibili) | code

We propose DSP-SLAM, an object-oriented SLAM system that builds a rich and accurate joint map of dense 3D models for foreground objects, and sparse landmark points to represent the background. Objects are represented as compact and optimisable codes learned from category-specific deep shape embeddings. Camera poses, object poses and 3D feature points are jointly optimized in a factor graph via object-aware bundle adjustment.

Other Projects

Python Implementation of KinectFusion with PyTorch
University College London (UCL)
2021

Re-implemented the KinectFusion algorithm with Python and PyTorch. All the core functions (TSDF volume, frame-to-model tracking, point-to-plane ICP, raycasting, TSDF fusion, etc.) are implemented in pure PyTorch without any custom CUDA kernels. The system is able to run at 17 Hz on a single RTX 2080 GPU. Code available on GitHub.

Object Tracking and Mapping with an RGB-D Camera
University College London (UCL)
2020

Built a SLAM system that tracks the 6-DoF poses of several objects with known 3D shapes using ICP with an RGB-D camera. Reduced the tracking uncertainty by including object poses in the joint factor graph and optimizing them iteratively. This project was part of the EU Horizon 2020 Secondhands Project.

Towards Realistic Data Augmentation for DOA estimation
Emotech LTD
2018

Work done during an internship at Emotech. Developed a data-driven sound source Direction-of-Arrival (DOA) estimation algorithm using CNNs, and performed more realistic data augmentation using the HoME platform.

Service

Teaching Assistant: Image Processing, Autumn 2019, 2021, 2023
Teaching Assistant: Robot Vision and Navigation, Spring 2020, 2021
Conference Reviewer: ICRA (2022, 2024), IROS (2022, 2023), NeurIPS (2023), ICLR (2024)
Journal Reviewer: RA-L, IJRR

Thanks to Dr. Jon Barron for sharing the source code of this website.