This chapter covers perception systems for robotics and computer vision: sensor modeling, deep learning foundations, object detection, and segmentation.

Sensor Models

Camera models, calibration techniques, and probabilistic sensor models including beam and likelihood field approaches.

CNNs

Convolutional neural networks for image classification, including layer types, architectures, and visualization techniques.

Object Detection

Detecting and localizing objects in images using two-stage and single-stage deep learning detectors.

Object Segmentation

Pixel-wise classification for instance segmentation (Mask RCNN) and semantic segmentation (UNet).

Sensor models

Sensor models provide the mathematical foundation for understanding how robots perceive their environment through cameras, lidar, and other sensors.

Camera models

Camera fundamentals and image processing for robotics.

Pinhole model

Mathematical representation of the pinhole camera model.
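
As a quick illustration of the model, the sketch below projects a 3D point, already expressed in the camera frame, to pixel coordinates. It assumes an ideal pinhole with zero skew and no lens distortion. The function name and the example intrinsics (fx, fy, cx, cy) are hypothetical values chosen for illustration.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates (u, v).

    Assumes an ideal pinhole: zero skew and no lens distortion.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division onto the normalized image plane (Z = 1)
    x_n, y_n = X / Z, Y / Z
    # Apply the intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    u = fx * x_n + cx
    v = fy * y_n + cy
    return u, v


# Example: 500 px focal length, principal point at the centre of a 640x480 image
print(project_point((0.5, -0.25, 2.0), fx=500.0, fy=500.0, cx=320.0, cy=240.0))
# prints (445.0, 177.5)
```

Doubling Z halves the offset from the principal point, which is the perspective effect the pinhole model captures.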

Camera calibration

Practical camera calibration using OpenCV.

Beam models

Probabilistic models for range sensors.
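
A minimal sketch of a beam model, assuming the common four-part mixture (hit, short, max, random) described in Probabilistic Robotics. The expected range z_star would come from ray casting in a map; here it is just a number. All parameter values (sigma_hit, lambda_short, and the mixture weights) are illustrative assumptions, not tuned values.

```python
import math


def beam_likelihood(z, z_star, z_max, sigma_hit=0.2, lambda_short=0.5,
                    w_hit=0.7, w_short=0.1, w_max=0.1, w_rand=0.1):
    """Likelihood p(z | z_star) of one range reading under a four-part beam model.

    z: measured range, z_star: range predicted from the map,
    z_max: sensor maximum range. Mixture weights should sum to 1.
    """
    if z < 0.0 or z > z_max:
        return 0.0

    # p_hit: Gaussian around the expected range, renormalized to [0, z_max]
    def normal_cdf(x):
        return 0.5 * (1.0 + math.erf((x - z_star) / (sigma_hit * math.sqrt(2.0))))

    gauss = math.exp(-0.5 * ((z - z_star) / sigma_hit) ** 2) / (
        sigma_hit * math.sqrt(2.0 * math.pi))
    p_hit = gauss / (normal_cdf(z_max) - normal_cdf(0.0))

    # p_short: truncated exponential for unexpected obstacles in front of z_star
    p_short = 0.0
    if z_star > 0.0 and z <= z_star:
        eta = 1.0 / (1.0 - math.exp(-lambda_short * z_star))
        p_short = eta * lambda_short * math.exp(-lambda_short * z)

    # p_max: point mass at the sensor's maximum range (missed returns)
    p_max = 1.0 if z == z_max else 0.0

    # p_rand: uniform noise over [0, z_max)
    p_rand = 1.0 / z_max if z < z_max else 0.0

    return w_hit * p_hit + w_short * p_short + w_max * p_max + w_rand * p_rand


print(beam_likelihood(5.0, 5.0, 10.0))  # reading agrees with the map: high
print(beam_likelihood(2.0, 5.0, 10.0))  # shorter than expected: explained by p_short
print(beam_likelihood(8.0, 5.0, 10.0))  # longer than expected: only p_rand remains
```

Treating beams as independent, the likelihood of a full scan is the product of per-beam values, usually accumulated as a sum of logs.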

Convolutional neural networks

CNNs are the backbone of modern computer vision systems, enabling image classification, feature extraction, and visual understanding.

CNN introduction

Introduction to convolutional neural networks.

CNN layers

Understanding CNN layer types and operations.
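
A useful piece of layer arithmetic is the spatial size a convolution or pooling layer produces. The helper below uses the floor-division formula documented for PyTorch's `Conv2d` and `MaxPool2d` (with `ceil_mode=False`); the function name is hypothetical.

```python
def conv_output_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution or pooling layer along one axis."""
    # Effective kernel extent once dilation spreads the taps apart
    extent = dilation * (kernel - 1) + 1
    return (in_size + 2 * padding - extent) // stride + 1


# ResNet-style stem on a 224x224 input
h = conv_output_size(224, kernel=7, stride=2, padding=3)  # 7x7 conv, stride 2 -> 112
h = conv_output_size(h, kernel=3, stride=2, padding=1)    # 3x3 max-pool, stride 2 -> 56
print(h)  # prints 56
```

With `padding = kernel // 2` and stride 1, an odd-sized kernel preserves the spatial size, which is why "same" padding is the default choice in most modern architectures.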

CNN architectures

Example architectures: LeNet, AlexNet, VGG, ResNet.
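
To make architecture comparisons concrete, the sketch below counts the learnable parameters in VGG16's convolutional layers from a compact configuration list (3x3 kernels, biases included). The configuration encoding and helper names are illustrative assumptions.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Learnable parameters in a 2D conv layer with a square kernel."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


# VGG16 convolutional configuration: output channels, "M" marks a 2x2 max-pool
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def count_conv_params(cfg, in_channels=3, kernel=3):
    total, c_in = 0, in_channels
    for layer in cfg:
        if layer == "M":
            continue  # pooling layers have no parameters
        total += conv_params(kernel, c_in, layer)
        c_in = layer
    return total


print(count_conv_params(VGG16_CFG))  # prints 14714688
```

The convolutional stack accounts for only about 14.7 million parameters; most of VGG16's roughly 138 million sit in the fully connected layers, which is one motivation for the global-average-pooling heads used in ResNet.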

Feature extraction: ResNet

ResNet as a backbone for downstream vision tasks.

Object detection

Object detection covers scene understanding fundamentals, evaluation metrics, and the evolution from two-stage (RCNN family) to single-stage (YOLO family) detectors.

Scene understanding

Detection vs classification, the detection pipeline, region proposals, FCNs, and the COCO dataset.

Detection metrics

Precision, recall, mAP, and IoU for evaluating detectors.
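
The core quantities can be sketched in plain Python. The example below computes IoU for boxes given as (x1, y1, x2, y2) and then precision and recall for one class in one image using a simple greedy, score-ordered matching. The matching rule is a simplified assumption; mAP additionally sweeps the score threshold, averages precision over recall levels and classes, and (for COCO) averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(detections, ground_truths, iou_thresh=0.5):
    """Precision and recall for one class in one image.

    detections: list of (score, box); ground_truths: list of boxes.
    Each ground truth can be matched by at most one detection.
    """
    matched, tp = set(), 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1  # true positive; unmatched detections are false positives
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    return precision, recall


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7: overlap 1, union 4 + 4 - 1
```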

RCNN

Region-based CNN: selective search, CNN features, SVM classification.

Fast RCNN

Shared convolutional features and ROI pooling for end-to-end training.

Faster RCNN

Region Proposal Network enabling fully end-to-end two-stage detection.
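
Both the RPN and the second-stage head regress boxes as offsets relative to a reference box (an anchor or a proposal) rather than as raw coordinates. The sketch below shows the widely used center/size parameterization (tx, ty, tw, th) and its inverse. Function names are hypothetical, and real implementations such as torchvision's often also scale the deltas by fixed per-coordinate weights, which this sketch omits.

```python
import math


def to_cxcywh(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1


def encode(gt, anchor):
    """Regression targets that move `anchor` onto `gt`."""
    gx, gy, gw, gh = to_cxcywh(gt)
    ax, ay, aw, ah = to_cxcywh(anchor)
    # Center shifts are normalized by anchor size; scale changes are in log space
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(deltas, anchor):
    """Apply predicted deltas to an anchor and return (x1, y1, x2, y2)."""
    tx, ty, tw, th = deltas
    ax, ay, aw, ah = to_cxcywh(anchor)
    cx, cy = ax + tx * aw, ay + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchor = (0.0, 0.0, 10.0, 10.0)
deltas = encode((2.0, 3.0, 14.0, 11.0), anchor)
print(deltas)                  # approximately (0.3, 0.2, log 1.2, log 0.8)
print(decode(deltas, anchor))  # recovers approximately (2.0, 3.0, 14.0, 11.0)
```

The log-space width and height terms keep predicted sizes positive and make the loss symmetric for boxes that are larger or smaller than the anchor.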

Faster RCNN from scratch (PyTorch)

A six-notebook series building every Faster RCNN component from scratch in pure PyTorch, from COCO data loading through end-to-end training and inference.

01 · COCO dataloader

Streaming COCO from Hugging Face, collation, and anchor target assignment.

02 · Backbone

ResNet50 backbone with a feature pyramid network (FPN) built from a top-down pathway and lateral connections.

03 · RPN

Region Proposal Network: anchor generation, objectness head, NMS.
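
Anchor generation can be sketched as tiling a fixed set of box shapes over every feature-map cell. The version below places anchors at cell centers, (j + 0.5) * stride; conventions differ between implementations (some center anchors on cell corners instead). Aspect ratio is taken as height/width and each shape keeps area size². The function name and default sizes and ratios are illustrative assumptions.

```python
import math


def generate_anchors(feat_h, feat_w, stride, sizes=(32,), aspect_ratios=(0.5, 1.0, 2.0)):
    """Anchors (x1, y1, x2, y2) in image pixels for one feature-map level.

    Ordering: row-major over cells, then sizes, then aspect ratios.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Center of cell (i, j) in image coordinates
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for size in sizes:
                for ratio in aspect_ratios:
                    # ratio = h / w, and w * h = size**2
                    w = size / math.sqrt(ratio)
                    h = size * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))  # 2 * 3 cells * 3 ratios = 18
print(anchors[1])    # square anchor of the first cell: (-8.0, -8.0, 24.0, 24.0)
```

The RPN's objectness head then predicts one score per anchor and its regression head one set of deltas per anchor, so the number of shapes per cell fixes the head's output channels.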

04 · ROI head

ROI Align, two-layer MLP head, and sibling classification and regression predictors.
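
The key idea of ROI Align is to avoid rounding box coordinates: each output bin averages a few bilinearly interpolated samples taken at fractional positions. The pure-Python sketch below works on a single-channel feature map stored as a list of rows, with the box already in feature-map coordinates. It ignores the half-pixel offset controlled by the `aligned` flag of `torchvision.ops.roi_align`, and its fixed `samples` argument plays the role of that function's `sampling_ratio`. Helper names are hypothetical.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2D list `feature` at fractional (y, x)."""
    H, W = len(feature), len(feature[0])
    if y < -1.0 or y > H or x < -1.0 or x > W:
        return 0.0  # sample falls outside the map
    y = min(max(y, 0.0), H - 1)
    x = min(max(x, 0.0), W - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * feature[y0][x0] + (1 - ly) * lx * feature[y0][x1]
            + ly * (1 - lx) * feature[y1][x0] + ly * lx * feature[y1][x1])


def roi_align(feature, box, out_size=2, samples=2):
    """Pool box (x1, y1, x2, y2) into an out_size x out_size grid."""
    x1, y1, x2, y2 = box
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            # Regularly spaced sample points inside bin (i, j)
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + i * bin_h + (sy + 0.5) * bin_h / samples
                    x = x1 + j * bin_w + (sx + 0.5) * bin_w / samples
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        out.append(row)
    return out


# On a linear feature map f(y, x) = x + 10 * y, bilinear sampling is exact
fmap = [[x + 10 * y for x in range(4)] for y in range(4)]
print(roi_align(fmap, (0.0, 0.0, 2.0, 2.0), out_size=1))  # [[11.0]], the value at the box center
```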

05 · Training

End-to-end training on streamed COCO data with automatic mixed precision (AMP) and gradient checkpointing.

06 · Inference

Checkpoint loading, COCO validation inference, proposal and detection visualization.

YOLO from scratch (PyTorch)

A five-notebook series building YOLOv8-style single-stage detection in pure PyTorch, from data loading through inference and evaluation.

YOLO introduction

Single-stage detection design philosophy, anchor-free heads, and the YOLO architecture family.

01 · COCO dataloader

Streaming COCO, grid target assignment, and mosaic augmentation.
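
The simplest form of grid target assignment places each object in the cell containing its box center, separately on every stride of the feature pyramid. The sketch below implements only that classic center-cell rule as an assumption for illustration; YOLOv8-style training replaces it with a task-aligned assigner, so treat this as background, not the notebook's exact rule. The function name and the 640-pixel input size are hypothetical.

```python
def assign_to_grid(box, stride, img_size=640):
    """Return (row, col, off_x, off_y) of the grid cell holding the box center.

    box is (x1, y1, x2, y2) in pixels; off_x and off_y locate the center
    inside the cell as fractions of the stride.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    cells = img_size // stride
    # Clamp so a center lying exactly on the image border stays in the grid
    col = min(int(cx // stride), cells - 1)
    row = min(int(cy // stride), cells - 1)
    return row, col, cx / stride - col, cy / stride - row


# The same box lands in different cells on each pyramid level
for stride in (8, 16, 32):
    print(stride, assign_to_grid((100, 60, 140, 100), stride))
```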

02 · Backbone

CSPDarknet backbone with C2f bottleneck blocks.

03 · Neck and head

PANet feature pyramid neck and decoupled detection head.

04 · Loss and training

Task-aligned assignment, distribution focal loss, and training loop.
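
Distribution focal loss treats each box-side distance as a discrete distribution over integer bins (YOLOv8 uses 16 bins per side) and pushes probability mass onto the two bins that bracket the continuous target. The pure-Python sketch below shows the loss for a single side and the expected-value decode used at inference. Function names are hypothetical, and the target is assumed to lie strictly below the last bin index, which implementations typically enforce by clamping.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_loss(logits, target):
    """DFL for one distance: cross-entropy on the two bins bracketing `target`."""
    probs = softmax(logits)
    left = int(math.floor(target))
    right = left + 1
    if left < 0 or right >= len(probs):
        raise ValueError("target must lie in [0, num_bins - 1)")
    # Linear-interpolation weights: the closer bin gets the larger weight
    w_left, w_right = right - target, target - left
    return -(w_left * math.log(probs[left]) + w_right * math.log(probs[right]))


def expected_distance(logits):
    """Decode a distance as the expectation of the bin distribution."""
    return sum(i * p for i, p in enumerate(softmax(logits)))


print(dfl_loss([0.0] * 4, 1.5))               # uniform prediction: log 4
print(dfl_loss([0.0, 10.0, 10.0, 0.0], 1.5))  # mass on bins 1 and 2: close to log 2
print(expected_distance([0.0] * 4))           # 1.5
```

In a full detector this loss is computed for all four sides (left, top, right, bottom) of each positive sample, with distances measured in units of the feature-map stride.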

05 · Inference and evaluation

NMS post-processing, COCO mAP evaluation, and latency benchmarks.
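
Greedy non-maximum suppression keeps the highest-scoring box, discards every remaining box that overlaps it by more than an IoU threshold, and repeats. The pure-Python sketch below follows the same rule as `torchvision.ops.nms` (boxes with IoU above the threshold are suppressed). It handles a single class; detectors usually run it per class.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Indices of kept boxes, in decreasing score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop boxes that overlap the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 0.81
```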

Object segmentation

Instance and semantic segmentation extend object detection to produce pixel-level masks, enabling fine-grained scene understanding.

Mask RCNN

Extending Faster RCNN with a mask head for instance segmentation.

Mask RCNN · TF demo

Running inference with the TensorFlow Mask RCNN implementation.

Mask RCNN · inspect data

Visualizing COCO data loading, augmentation, and anchor generation.

Mask RCNN · inspect model

Layer-by-layer inspection of model activations and outputs.

Mask RCNN · inspect weights

Visualizing learned filter weights and statistics.

Mask RCNN · PyTorch (Detectron2)

Detectron2 Mask RCNN training and evaluation workflow.

Mask RCNN · torchvision inference

Running pretrained Mask RCNN inference with torchvision.

UNet

Encoder-decoder architecture for semantic segmentation.
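
The original UNet uses unpadded 3x3 convolutions, so the output is smaller than the input and encoder features must be center-cropped before they are concatenated in the decoder. The helper below traces that size arithmetic for the classic four-level layout; many PyTorch re-implementations instead use `padding=1`, which keeps sizes aligned and removes the cropping step. The function name is hypothetical.

```python
def unet_sizes(in_size, depth=4):
    """Output size and skip-connection sizes for an unpadded UNet."""
    skips = []
    size = in_size
    for _ in range(depth):
        size -= 4                 # two unpadded 3x3 convs, each trims 2 pixels
        skips.append(size)        # encoder map saved for the skip connection
        if size % 2:
            raise ValueError("feature map must be even before 2x2 pooling")
        size //= 2                # 2x2 max-pool
    size -= 4                     # two 3x3 convs in the bottleneck
    for _ in reversed(skips):
        size *= 2                 # 2x2 up-convolution
        # the matching encoder map is center-cropped to `size` before concatenation
        size -= 4                 # two unpadded 3x3 convs
    return size, skips


print(unet_sizes(572))  # (388, [568, 280, 136, 64])
```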