Sensor Models
Camera models, calibration techniques, and probabilistic sensor models including beam and likelihood field approaches.
CNNs
Convolutional neural networks for image classification, including layer types, architectures, and visualization techniques.
Object Detection
Detecting and localizing objects in images using two-stage and single-stage deep learning detectors.
Object Segmentation
Pixel-wise classification for semantic and instance segmentation using Mask RCNN and UNet.
Sensor models
Sensor models provide the mathematical foundation for understanding how robots perceive their environment through cameras, lidar, and other sensors.
Camera models
Camera fundamentals and image processing for robotics.
Pinhole model
Mathematical representation of the pinhole camera model.
Camera calibration
Practical camera calibration using OpenCV.
Beam models
Probabilistic models for range sensors.
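As a quick orientation for the pinhole model entry above, here is a minimal sketch of the standard pinhole projection, assuming an ideal camera with no lens distortion. The function name `project_point` and the intrinsic values (`fx`, `fy`, `cx`, `cy`) are illustrative assumptions, not taken from the course material.

```python
def project_point(point_cam, fx, fy, cx, cy):
    """Project a 3D point given in the camera frame onto the image plane.

    Assumes an ideal pinhole camera (no distortion, zero skew).
    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    """
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    # Perspective division followed by scaling and shifting into pixel units.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v


# Hypothetical intrinsics for a 640x480 image.
u, v = project_point((0.5, -0.25, 2.0), fx=800.0, fy=800.0, cx=320.0, cy=240.0)
print(u, v)  # 520.0 140.0
```

Calibration, as covered in the camera calibration entry, estimates these intrinsics (plus distortion coefficients) from images of a known target.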
Convolutional neural networks
CNNs are the backbone of modern computer vision systems, enabling image classification, feature extraction, and visual understanding.
CNN introduction
Introduction to convolutional neural networks.
CNN layers
Understanding CNN layer types and operations.
CNN architectures
Example architectures: LeNet, AlexNet, VGG, ResNet.
Feature extraction: ResNet
ResNet as a backbone for downstream vision tasks.
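A recurring calculation when reading the layer and architecture pages above is the spatial size of a convolution or pooling output. The sketch below applies the usual formula, floor((n + 2p - k) / s) + 1, assuming square inputs and no dilation; the helper name `conv_output_size` is a hypothetical one chosen for illustration.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer.

    Assumes a square input of side n, a square kernel, and dilation 1.
    """
    return (n + 2 * padding - kernel) // stride + 1


# ResNet stem on a 224x224 image: 7x7 conv (stride 2, pad 3),
# then 3x3 max pooling (stride 2, pad 1).
after_conv = conv_output_size(224, kernel=7, stride=2, padding=3)
after_pool = conv_output_size(after_conv, kernel=3, stride=2, padding=1)
print(after_conv, after_pool)  # 112 56
```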
Object detection
Object detection covers scene understanding fundamentals, evaluation metrics, and the evolution from two-stage (RCNN family) to single-stage (YOLO family) detectors.
Scene understanding
Detection vs classification, the detection pipeline, region proposals, FCNs, and the COCO dataset.
Detection metrics
Precision, recall, mAP, and IoU for evaluating detectors.
RCNN
Region-based CNN: selective search, CNN features, SVM classification.
Fast RCNN
Shared convolutional features and ROI pooling, trained in a single stage with a multi-task loss (region proposals are still generated externally).
Faster RCNN
Region Proposal Network enabling fully end-to-end two-stage detection.
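The detection metrics entry above relies on intersection over union (IoU) to decide whether a predicted box matches a ground-truth box. A minimal sketch follows, assuming axis-aligned boxes in `(x1, y1, x2, y2)` corner format; the function name `iou` and the box format are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.1429
```

A prediction is typically counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold; precision, recall, and mAP are then computed from those matches.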
Faster RCNN from scratch (PyTorch)
A six-notebook series building every Faster RCNN component from scratch in pure PyTorch, from COCO data loading through end-to-end training and inference.
01 · COCO dataloader
Streaming COCO from Hugging Face, collation, and anchor target assignment.
02 · Backbone
ResNet50 backbone with a feature pyramid network (FPN) and lateral connections.
03 · RPN
Region Proposal Network: anchor generation, objectness head, NMS.
04 · ROI head
ROI Align, two-layer MLP head, and sibling classification and regression predictors.
05 · Training
End-to-end training with AMP and gradient checkpointing on COCO streaming data.
06 · Inference
Checkpoint loading, COCO validation inference, proposal and detection visualization.
YOLO from scratch (PyTorch)
A five-notebook series building YOLOv8-style single-stage detection in pure PyTorch, from data loading through inference and evaluation.
YOLO introduction
Single-stage detection design philosophy, anchor-free heads, and the YOLO architecture family.
01 · COCO dataloader
Streaming COCO, grid target assignment, and mosaic augmentation.
02 · Backbone
CSPDarknet backbone with C2f bottleneck blocks.
03 · Neck and head
PANet feature pyramid neck and decoupled detection head.
04 · Loss and training
Task-aligned assignment, distribution focal loss, and training loop.
05 · Inference and evaluation
NMS post-processing, COCO mAP evaluation, and latency benchmarks.
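Both notebook series above use non-maximum suppression (NMS) to remove duplicate detections. The following is a minimal pure-Python sketch of greedy NMS, not the notebooks' own implementation; the names `box_iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In practice a vectorized routine such as `torchvision.ops.nms` would be used; the loop form here is meant only to make the logic easy to follow.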
Object segmentation
Instance and semantic segmentation extend object detection to produce pixel-level masks, enabling fine-grained scene understanding.
Mask RCNN
Extending Faster RCNN with a mask head for instance segmentation.
Mask RCNN · TF demo
Running inference with the TensorFlow Mask RCNN implementation.
Mask RCNN · inspect data
Visualizing COCO data loading, augmentation, and anchor generation.
Mask RCNN · inspect model
Layer-by-layer inspection of model activations and outputs.
Mask RCNN · inspect weights
Visualizing learned filter weights and statistics.
Mask RCNN · PyTorch (Detectron2)
Detectron2 Mask RCNN training and evaluation workflow.
Mask RCNN · torchvision inference
Running pretrained Mask RCNN inference with torchvision.
UNet
Encoder-decoder architecture for semantic segmentation.

