This chapter covers perception systems for robotics and computer vision across six areas: sensor modeling, deep learning foundations, object detection, segmentation, state estimation, and mapping.

Sensor models

Sensor models provide the mathematical foundation for understanding how robots perceive their environment through cameras, lidar, and other sensors.
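As one concrete illustration (a sketch, not from the chapter), the standard pinhole camera model projects a 3D point in camera coordinates onto the image plane; the intrinsics `fx`, `fy`, `cx`, `cy` below are illustrative values:

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point (X, Y, Z) in camera coordinates onto the image
    plane with the pinhole model: u = fx * X/Z + cx, v = fy * Y/Z + cy."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v


# A point 2 m ahead, 1 m right, 0.5 m up, with illustrative intrinsics.
u, v = project_point((1.0, 0.5, 2.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Real sensor models add lens distortion and noise terms on top of this ideal geometry.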

Convolutional neural networks

CNNs are the backbone of modern computer vision systems, enabling image classification, feature extraction, and visual understanding.
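At the heart of a CNN is the discrete convolution (in practice, cross-correlation) of an image with a learned kernel. A minimal valid-mode sketch in pure Python, purely for illustration:

```python
def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation (the 'convolution' used in CNNs)
    of a 2D image with a 2D kernel, both given as nested lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Dot product of the kernel with the image patch at (i, j).
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out
```

Frameworks like PyTorch implement the same operation (plus padding, stride, and channels) as highly optimized batched tensor ops.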

Object detection

Object detection covers scene understanding fundamentals, evaluation metrics, and the evolution from two-stage detectors (the R-CNN family) to single-stage detectors (the YOLO family).
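The standard evaluation metrics build on intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch, assuming boxes in `(x1, y1, x2, y2)` corner format:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (empty if boxes don't overlap).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Metrics such as mAP are built on top of IoU by thresholding it (e.g. at 0.5) to decide whether a prediction counts as a true positive.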

Faster R-CNN from scratch (PyTorch)

A six-notebook series building every Faster R-CNN component from scratch in pure PyTorch, from COCO data loading through end-to-end training and inference.
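One of those components, the region proposal network, tiles the feature map with anchor boxes at multiple scales and aspect ratios. A hedged sketch of anchor generation at a single location (the base size and scale/ratio sets below are illustrative, not the series' exact values):

```python
def generate_anchors(base_size, scales, ratios):
    """Generate anchor boxes (x1, y1, x2, y2) centered at the origin,
    one per (scale, aspect-ratio) pair, where ratio = height / width."""
    anchors = []
    for scale in scales:
        for ratio in ratios:
            area = (base_size * scale) ** 2
            # Solve w * h = area with h = ratio * w.
            w = (area / ratio) ** 0.5
            h = w * ratio
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


# The classic 9-anchor configuration: 3 scales x 3 aspect ratios.
anchors = generate_anchors(16, scales=[0.5, 1.0, 2.0], ratios=[0.5, 1.0, 2.0])
```

In a full RPN these anchors are shifted to every feature-map location, scored for objectness, and regressed toward ground-truth boxes.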

YOLO from scratch (PyTorch)

A five-notebook series building YOLOv8-style single-stage detection in pure PyTorch, from data loading through inference and evaluation.
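Single-stage detectors like YOLO emit many overlapping candidate boxes, which are pruned at inference time with non-maximum suppression (NMS). A minimal greedy sketch (box format and the 0.5 threshold are illustrative assumptions):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box and
    suppress all boxes overlapping it above iou_thresh. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= iou_thresh]
    return keep
```

Production implementations (e.g. in torchvision) vectorize this loop, but the logic is the same.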

Object segmentation

Instance and semantic segmentation extend object detection to produce pixel-level masks, enabling fine-grained scene understanding.
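Segmentation quality is commonly reported as mean IoU over classes, computed on per-pixel label maps rather than boxes. A minimal sketch on flattened label lists:

```python
def mean_iou(pred, target, num_classes):
    """Mean per-class intersection-over-union for flat per-pixel labels.
    Classes absent from both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Instance segmentation uses the same IoU idea per mask, matching predicted instances to ground-truth instances before scoring.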

State estimation

State estimation enables robots to determine their position using probabilistic models that fuse sensor observations over time.
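The canonical example of such probabilistic fusion is the Kalman filter. A one-dimensional predict/update step under a static-state motion model (an illustrative simplification of the general matrix form):

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a 1-D Kalman filter.
    x, p: prior state mean and variance; z: new measurement;
    q: process noise variance; r: measurement noise variance."""
    # Predict: a static model leaves the mean unchanged,
    # but uncertainty grows by the process noise.
    x_pred = x
    p_pred = p + q
    # Update: the Kalman gain weights the measurement against the prediction.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1 - k) * p_pred
    return x_new, p_new
```

With equal prior and measurement variances the update lands exactly halfway between prediction and measurement, and the variance halves, which is a useful sanity check.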

Mapping

Mapping algorithms build spatial representations of the environment that robots use for navigation and planning.
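A common representation is the occupancy grid, where each cell stores its occupancy belief in log-odds form so that Bayesian sensor updates become simple additions. A minimal sketch, assuming an inverse sensor model that yields an occupancy probability `p_hit` per observation (an illustrative value below):

```python
import math


def update_cell(log_odds, p_hit):
    """Bayesian log-odds update of one occupancy-grid cell given the
    inverse sensor model's occupancy probability for this observation."""
    return log_odds + math.log(p_hit / (1.0 - p_hit))


def probability(log_odds):
    """Convert a cell's log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))


# Starting from an uninformed prior (p = 0.5, log-odds 0), two consistent
# "occupied" observations with p_hit = 0.7 raise the belief well above 0.5.
l = update_cell(update_cell(0.0, 0.7), 0.7)
```

Repeated observations accumulate in log-odds space without renormalization, which is why this parameterization is standard in grid mapping.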