Introduction
Visual SLAM (Simultaneous Localization and Mapping) combines visual feature detection and tracking with pose graph optimization to simultaneously build a map of the environment and estimate the camera’s trajectory within it. VSLAM is particularly useful in GPS-denied environments such as indoor spaces, tunnels, and dense urban areas.
Prerequisites
- Docker installed on your workstation
- A webcam or USB camera
- Familiarity with ORB features and camera geometry
Task 1: Understand VSLAM from first principles
To fully understand how VSLAM works under the hood, follow the two lecture videos below and implement a VSLAM pipeline in Python. You may use any combination of OpenCV, PyTorch, RVC3-Python, or FilterPy.
Your implementation should demonstrate:
- ORB (or similar) feature detection and matching
- Essential matrix estimation and decomposition
- Triangulation of 3D landmarks
- Camera trajectory estimation
You may alternatively work with the Python bindings of StellaVSLAM, but you must understand the associated C++ code and document in your report what each C++ function you call does.
Task 2: VSLAM on your own indoor space
StellaVSLAM is a maintained fork of OpenVSLAM, a flexible visual SLAM system that supports monocular, stereo, and RGB-D inputs. It uses ORB (Oriented FAST and Rotated BRIEF) features to detect, track, and map the environment in 3D. First, validate your setup by running the Equirectangular Datasets example with the aist_living_lab_1 video following the Docker instructions. The video below shows the expected end result:
Then run VSLAM on your own space:
- Record a video of your own indoor space (room, dorm, lab) using your laptop webcam or a USB camera. Walk slowly, covering the space from multiple viewpoints.
- Follow these instructions for UVC cameras to run StellaVSLAM on your video.
- Create a camera configuration YAML file using calibration parameters from the Camera Calibration lecture. Use example/aist/equirectangular.yaml as a template, replacing the camera matrix and distortion coefficients with your own values.
- Publish a demo video on your YouTube channel explaining and showcasing the reconstruction of your space.
Task 3: Integrate StellaVSLAM with Nav2
Integrate StellaVSLAM into the ROS 2 navigation stack and demonstrate that it can work with Nav2 to navigate the TurtleBot in the maze without a prior map. This showcases the principles of Recursive State Estimation (RSE): using visual landmarks and additional algorithms to achieve globally consistent localization without a priori knowledge of the environment. Document your integration in tutorial format, as it will be used in a class setting.
Deliverables
- Notebook with all cells executed.
- Python VSLAM implementation with trajectory plots overlaid on KITTI ground truth (Task 1).
- Screenshots/recordings of StellaVSLAM running on both the reference dataset and your own space (Task 2).
- YouTube video link for Task 2.
- ROS 2 integration tutorial and demo video (Task 3).
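One practical detail worth anticipating for the Task 3 integration: the SLAM side typically hands you camera poses as rotation matrices (or 4x4 homogeneous transforms), while ROS 2 messages such as geometry_msgs/msg/PoseStamped and TF carry orientations as quaternions. A minimal, dependency-free conversion sketch (the standard trace-based branching; the function name is illustrative):

```python
import numpy as np

def rotmat_to_quaternion(R):
    """Convert a 3x3 rotation matrix to a quaternion in (x, y, z, w) order,
    matching the field order of geometry_msgs/msg/Quaternion."""
    tr = np.trace(R)
    if tr > 0:
        s = np.sqrt(tr + 1.0) * 2.0
        w = 0.25 * s
        x = (R[2, 1] - R[1, 2]) / s
        y = (R[0, 2] - R[2, 0]) / s
        z = (R[1, 0] - R[0, 1]) / s
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2]) * 2.0
        w = (R[2, 1] - R[1, 2]) / s
        x = 0.25 * s
        y = (R[0, 1] + R[1, 0]) / s
        z = (R[0, 2] + R[2, 0]) / s
    elif R[1, 1] > R[2, 2]:
        s = np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2]) * 2.0
        w = (R[0, 2] - R[2, 0]) / s
        x = (R[0, 1] + R[1, 0]) / s
        y = 0.25 * s
        z = (R[1, 2] + R[2, 1]) / s
    else:
        s = np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1]) * 2.0
        w = (R[1, 0] - R[0, 1]) / s
        x = (R[0, 2] + R[2, 0]) / s
        y = (R[1, 2] + R[2, 1]) / s
        z = 0.25 * s
    return np.array([x, y, z, w])
```

The branching keeps the square root numerically well away from zero regardless of the rotation, which matters for the near-180-degree rotations a camera sweep can produce.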
This assignment may require calibrated camera intrinsic parameters. See the Camera Calibration page for the full calibration workflow, including Zhang’s method and the OpenCV reference solution.

