Perception

This chapter covers perception systems for robotics and computer vision across six areas: sensor modeling, deep learning foundations, object detection, segmentation, state estimation, and mapping.

Sensor Models

Camera models, calibration techniques, and probabilistic sensor models including beam and likelihood field approaches.

CNNs

Convolutional neural networks for image classification, including layer types, architectures, and visualization techniques.

Object Detection

Detecting and localizing objects in images using two-stage and single-stage deep learning detectors.

Object Segmentation

Pixel-wise classification for semantic and instance segmentation using Mask RCNN and UNet.

State Estimation

Recursive Bayesian estimation, Kalman filters, and particle filters for robot localization.

Mapping

Occupancy grid mapping and simultaneous localization and mapping (SLAM).

Sensor models

Sensor models provide the mathematical foundation for understanding how robots perceive their environment through cameras, lidar, and other sensors.

Camera models

Camera fundamentals and image processing for robotics.

Pinhole model

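Mathematical representation of the pinhole camera model: how a 3D point in the camera frame projects to a pixel through the focal lengths and principal point.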
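As a rough illustration of the pinhole projection, the sketch below maps a 3D point in the camera frame to pixel coordinates using u = fx * X / Z + cx and v = fy * Y / Z + cy. The function name `project_point` and the intrinsic values are assumptions chosen for illustration; lens distortion and the extrinsic (world-to-camera) transform are deliberately left out.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point (X, Y, Z) in the camera frame to pixel (u, v).

    Assumes an ideal pinhole camera: no lens distortion, zero skew,
    and the point already expressed in camera coordinates.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    u = fx * X / Z + cx  # horizontal pixel coordinate
    v = fy * Y / Z + cy  # vertical pixel coordinate
    return u, v


# Hypothetical intrinsics for a 640x480 image.
u, v = project_point((0.5, -0.25, 2.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(u, v)
```

A point on the optical axis (X = Y = 0) lands on the principal point (cx, cy) regardless of depth, which is a quick sanity check for any implementation.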

Camera calibration

Practical camera calibration using OpenCV.

Beam models

Probabilistic models for range sensors.

Convolutional neural networks

CNNs are the backbone of modern computer vision systems, enabling image classification, feature extraction, and visual understanding.

CNN introduction

Introduction to convolutional neural networks.

CNN layers

Understanding CNN layer types and operations.

CNN architectures

Example architectures: LeNet, AlexNet, VGG, ResNet.

Feature extraction: ResNet

ResNet as a backbone for downstream vision tasks.

Object detection

Object detection covers scene understanding fundamentals, evaluation metrics, and the evolution from two-stage (RCNN family) to single-stage (YOLO family) detectors.

Scene understanding

Detection vs classification, the detection pipeline, region proposals, FCNs, and the COCO dataset.

Detection metrics

Precision, recall, mAP, and IoU for evaluating detectors.
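For orientation, IoU (intersection over union) is the overlap measure that decides whether a predicted box counts as a match for a ground-truth box, and precision, recall, and mAP are built on top of it. The sketch below is a minimal version that assumes axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; the name `box_iou` is illustrative and not taken from any particular library.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that overlap in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold, with 0.5 being a common choice.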

RCNN

Region-based CNN: selective search, CNN features, SVM classification.

Fast RCNN

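Shared convolutional features and ROI pooling, which let the classifier and box regressor train jointly in a single stage; region proposals still come from an external method such as selective search.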

Faster RCNN

Region Proposal Network enabling fully end-to-end two-stage detection.

Faster RCNN from scratch (PyTorch)

A six-notebook series building every Faster RCNN component from scratch in pure PyTorch, from COCO data loading through end-to-end training and inference.