Introduction
Visual SLAM (Simultaneous Localization and Mapping) combines visual feature detection and tracking with pose graph optimization to simultaneously build a map of the environment and estimate the camera’s trajectory within it. VSLAM is particularly useful in GPS-denied environments such as indoor spaces, tunnels, and dense urban areas.
Prerequisites
- Docker installed on your workstation
- A webcam or USB camera
- Familiarity with ORB features and camera geometry
Task 1: Understand VSLAM from first principles
To fully understand how VSLAM works under the hood, follow the two lecture videos below and implement a VSLAM pipeline in Python. You may use any combination of OpenCV, PyTorch, RVC3-Python, or FilterPy.
Your implementation should demonstrate:
- ORB (or similar) feature detection and matching
- Essential matrix estimation and decomposition
- Triangulation of 3D landmarks
- Camera trajectory estimation
You may alternatively work with the Python bindings of StellaVSLAM, but you must understand the associated C++ code and document in your report what each C++ function you call does.
Task 2: VSLAM on your own indoor space
StellaVSLAM is a maintained fork of OpenVSLAM, a flexible visual SLAM system that supports monocular, stereo, and RGB-D inputs. It uses ORB (Oriented FAST and Rotated BRIEF) features to detect, track, and map the environment in 3D. First, validate your setup by running the Equirectangular Datasets example with the aist_living_lab_1 video following the Docker instructions. The video below shows the expected end result:
Then run VSLAM on your own space:
- Record a video of your own indoor space (room, dorm, lab) using your laptop webcam or a USB camera. Walk slowly, covering the space from multiple viewpoints.
- Follow these instructions for UVC cameras to run StellaVSLAM on your video.
- Create a camera configuration YAML file using calibration parameters from the Camera Calibration lecture. Use example/aist/equirectangular.yaml as a template, replacing the camera matrix and distortion coefficients with your own values.
- Publish a demo video on your YouTube channel explaining and showcasing the reconstruction of your space.
Task 3: Integrate StellaVSLAM with Nav2
Integrate StellaVSLAM into the ROS 2 navigation stack and demonstrate that it can work with Nav2 to navigate the TurtleBot in the maze without a prior map. This showcases the principles of Recursive State Estimation (RSE): using visual landmarks and additional algorithms to achieve globally consistent localization without a priori knowledge of the environment. Document your integration in tutorial format, as it will be used in a class setting.
Deliverables
- Notebook with all cells executed.
- Python VSLAM implementation with trajectory plots overlaid on KITTI ground truth (Task 1).
- Screenshots/recordings of StellaVSLAM running on both the reference dataset and your own space (Task 2).
- YouTube video link for Task 2.
- ROS 2 integration tutorial and demo video (Task 3).
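One practical detail worth anticipating for the Task 3 integration: the SLAM side typically hands you camera poses as rotation matrices (or 4x4 homogeneous transforms), while ROS 2 messages such as geometry_msgs/msg/PoseStamped and TF carry orientations as quaternions. A minimal, dependency-free conversion sketch (the standard trace-based branching; the function name is illustrative):

```python
import numpy as np

def rotmat_to_quaternion(R):
    """Convert a 3x3 rotation matrix to a quaternion in (x, y, z, w) order,
    matching the field order of geometry_msgs/msg/Quaternion."""
    tr = np.trace(R)
    if tr > 0:
        s = np.sqrt(tr + 1.0) * 2.0
        w = 0.25 * s
        x = (R[2, 1] - R[1, 2]) / s
        y = (R[0, 2] - R[2, 0]) / s
        z = (R[1, 0] - R[0, 1]) / s
    elif R[0, 0] > R[1, 1] and R[0, 0] > R[2, 2]:
        s = np.sqrt(1.0 + R[0, 0] - R[1, 1] - R[2, 2]) * 2.0
        w = (R[2, 1] - R[1, 2]) / s
        x = 0.25 * s
        y = (R[0, 1] + R[1, 0]) / s
        z = (R[0, 2] + R[2, 0]) / s
    elif R[1, 1] > R[2, 2]:
        s = np.sqrt(1.0 + R[1, 1] - R[0, 0] - R[2, 2]) * 2.0
        w = (R[0, 2] - R[2, 0]) / s
        x = (R[0, 1] + R[1, 0]) / s
        y = 0.25 * s
        z = (R[1, 2] + R[2, 1]) / s
    else:
        s = np.sqrt(1.0 + R[2, 2] - R[0, 0] - R[1, 1]) * 2.0
        w = (R[1, 0] - R[0, 1]) / s
        x = (R[0, 2] + R[2, 0]) / s
        y = (R[1, 2] + R[2, 1]) / s
        z = 0.25 * s
    return np.array([x, y, z, w])
```

The branching keeps the square root numerically well away from zero regardless of the rotation, which matters for the near-180-degree rotations a camera sweep can produce.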
This assignment may require calibrated camera intrinsic parameters. See the Camera Calibration page for the full calibration workflow, including Zhang’s method and the OpenCV reference solution.

