See the Spring 2026 Academic Calendar for semester dates. Each week below lists the readings, lecture topics, and deliverables you should complete.

Week 1

1. Read TIF Chapter 1. The challenge of vision, from Foundations of Computer Vision.
2. Review prerequisites. Python, linear algebra, probability theory, and camera fundamentals. See Prerequisites.
3. Review lecture: Introduction. Computer vision for agents with egomotion; course roadmap and overview.
4. Watch videos:
   - How We Understand Scenes — Human Perception and Imaging.
   - Mathematical Prerequisites — Review the math foundations needed for the course.
5. Set up your development environment. Follow the Dev Environment guide to install Docker and configure your container.
6. Import the course repository. Import eng-ai-agents to your GitHub account and clone it locally.

Week 2

1. Read TIF Chapters 9 & 10. Introduction to learning and gradient-based learning algorithms, from Foundations of Computer Vision.
2. Read BISHOP Chapters 4 & 5. Single-variable and multivariate models, regularization, Bayesian linear regression, and single-layer networks, from Deep Learning: Foundations and Concepts.
3. Review lecture: Supervised Learning. Perception subsystem, reflexive agents, the learning problem. See The Learning Problem.
4. Review lecture: Linear Regression. Regression fundamentals and empirical risk minimization. See Linear Regression.
5. Review lecture: SGD Optimization. Stochastic gradient descent for minimizing the empirical risk. See SGD. (A minimal SGD sketch follows this list.)
6. Read GERON Chapter 4 — SGD sections. Read the Gradient Descent, Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent sections from Chapter 4: Training Linear Models.
7. Run the GERON Chapter 4 notebook. Work through the Training Linear Models notebook.
8. Run the SGD notebook. Execute the SGD Sinusoidal Dataset notebook in your container.
9. Review lecture: Entropy. Information theory principles and cross-entropy. See Entropy.
10. Review lecture: Marginal Maximum Likelihood. Marginal likelihood and parameter estimation. See Marginal Maximum Likelihood.
11. Review lecture: Conditional Maximum Likelihood. Conditional likelihood for supervised learning. See Conditional Maximum Likelihood.
12. Review lecture: Classification Introduction. Classification fundamentals and decision boundaries. See Classification Introduction.
13. Review lecture: Logistic Regression. Binary classification with logistic regression. See Logistic Regression. (A cross-entropy sketch follows this list.)
14. Watch videos. Coming soon — Statistical learning theory video lectures are in development.
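
The SGD items above come down to a few lines of code. Here is a minimal NumPy sketch, not the course's own notebook code, of mini-batch SGD minimizing the empirical (mean squared error) risk of a linear model on a made-up dataset; the learning rate, batch size, and data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: y = 3x + 2 + noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=200)

# Parameters of the linear model y_hat = w * x + b
w, b = 0.0, 0.0
lr, batch_size = 0.1, 16

for epoch in range(50):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb           # residuals on the mini-batch
        grad_w = 2.0 * np.mean(err * xb)  # d/dw of the mean squared error
        grad_b = 2.0 * np.mean(err)       # d/db of the mean squared error
        w -= lr * grad_w                  # SGD update step
        b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")    # should approach w = 3, b = 2
```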
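
The entropy and logistic regression lectures connect through one fact: logistic regression is trained by minimizing the binary cross-entropy, i.e., the average negative log-likelihood under a Bernoulli model. A small illustrative sketch, again in NumPy with toy data that is not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Average negative log-likelihood of the labels under the Bernoulli model
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable toy labels

w = np.zeros(2)
b = 0.0
lr = 0.5

for step in range(200):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)   # gradient of the cross-entropy w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print("final cross-entropy:", binary_cross_entropy(y, sigmoid(X @ w + b)))
```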

Week 3

1. Read TIF Chapters 12 & 13. Neural networks, and neural networks as distribution transformers, from Foundations of Computer Vision.
2. Read BISHOP Chapter 6. Cross-entropy loss, training, and regularization of dense layers, from Deep Learning: Foundations and Concepts.
3. Review lecture: DNN Introduction. Forward pass and neural network architectures. See DNN Introduction.
4. Run the Fashion MNIST notebook. Execute the Fashion MNIST Case Study notebook in your container. (A minimal model sketch follows this list.)
5. Run the GERON Chapter 9 notebook. Work through the Artificial Neural Networks notebook.
6. Submit Assignment 1. Complete and submit Assignment 1.
7. Watch videos. Coming soon — Dense neural networks video lectures are in development.
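
As a reference point for the Fashion MNIST case study, below is a minimal dense-network sketch in Keras (assuming TensorFlow/Keras, as in the GERON notebooks); it is not the course notebook itself, just the shape of a typical dense model trained with a cross-entropy loss. Layer sizes and the number of epochs are arbitrary.

```python
import tensorflow as tf

# Fashion MNIST: 28x28 grayscale images, 10 clothing classes
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0    # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # image -> vector
    tf.keras.layers.Dense(128, activation="relu"),    # dense hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # class probabilities
])

model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))                 # [test loss, test accuracy]
```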

Week 4

1. Read TIF Chapter 24, BISHOP Chapter 10. Convolutional neural network architecture and applications, from Foundations of Computer Vision and Deep Learning: Foundations and Concepts.
2. Review lecture: CNN Introduction. Convolution operations, pooling, and spatial feature hierarchies. See CNN Introduction. (A small convolution sketch follows this list.)
3. Review lecture: CNN Layers, Architectures and ResNets. Layer types, architectural patterns, ResNet, and VGG. See CNN Layers, CNN Example Architectures, and Feature Extraction with ResNet.
4. Read GERON Chapter 12 — CNN sections. Read the Convolutional Layers, Pooling Layers, and CNN Architectures sections from Chapter 12: Deep Computer Vision with CNNs.
5. Run the GERON Chapter 12 notebook. Work through the Deep Computer Vision with CNNs notebook.
6. Watch videos:
   - Convolution and Correlation — A linear operation for extracting spatial features.
   - CNN Architectures — Looking inside a CNN layer and understanding architectural patterns.
   - Image Classification — Image classification with data augmentation.
   - What CNNs Learn — Visualizing the features learned by CNNs.
   - ResNets — Residual Networks and skip connections.
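
The convolution/correlation material reduces to a sliding-window sum of products. Below is a minimal NumPy sketch of 2D cross-correlation (what deep learning libraries actually compute when they say "convolution"), applied to a toy step-edge image with a Sobel-style kernel; the image and kernel are illustrative choices, not course data.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the kernel with the image patch, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((8, 8))
image[:, 4:] = 1.0                        # vertical step edge
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], float)   # horizontal-gradient kernel

print(cross_correlate2d(image, sobel_x))  # strong response along the edge columns
```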

Week 5

1. Read TIF Chapter 50. Object recognition, from Foundations of Computer Vision.
2. Review lecture: Detection Metrics. Evaluation metrics for object detection. See Detection Metrics. (An IoU sketch follows this list.)
3. Review lecture: Object Detection. Detection pipelines and architectures. See Object Detection Introduction.
4. Review lecture: R-CNN. Region-based convolutional neural networks. See R-CNN.
5. Review lecture: Fast R-CNN. Efficient region-based detection. See Fast R-CNN.
6. Review lecture: Faster R-CNN. Region proposal networks and two-stage detection. See Faster R-CNN.
7. Watch videos:
   - Introduction to Object Detection — Object detection in a physical security application.
   - Computer Vision Datasets — What types of annotations are used in computer vision?
   - Region-based Object Detectors — R-CNN, Fast R-CNN, Faster R-CNN.
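
Detection metrics such as mAP are built on intersection over union (IoU) between predicted and ground-truth boxes. Here is a small NumPy sketch of IoU for axis-aligned boxes; the (x1, y1, x2, y2) corner convention is an assumption for illustration, not something fixed by the lecture.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 0.142857..., partial overlap
```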

Week 6

1. Read SZELISKI Chapter 6. Pixel-level labeling and panoptic segmentation for full scene understanding. (A mask-overlap sketch follows this list.)
2. Review lecture: Mask R-CNN. Instance segmentation architecture. See Mask R-CNN.
3. Review lecture: U-Net. Encoder-decoder architecture for segmentation. See U-Net.
4. Run the Detectron2 notebook. Execute the Detectron2 Tutorial notebook.
5. Watch videos. Coming soon — Semantic segmentation video lectures are in development.
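
Segmentation quality is usually reported with pixel-level overlap scores. As a companion to the pixel-level labeling reading, here is a minimal NumPy sketch of mask IoU and the Dice coefficient for binary masks; the masks are toy arrays, not course data.

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """IoU of two boolean masks of the same shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def dice(mask_a, mask_b):
    """Dice coefficient: twice the intersection over the sum of mask areas."""
    inter = np.logical_and(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    return 2.0 * inter / total if total > 0 else 0.0

pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 2:6] = True                      # predicted mask
gt = np.zeros((8, 8), dtype=bool)
gt[3:7, 3:7] = True                        # ground-truth mask

print(mask_iou(pred, gt), dice(pred, gt))  # 9/23 and 18/32
```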

Week 7

1. Read BISHOP Chapter 12, TIF Chapter 26. Self-attention for global image dependencies; ViT vs. CNN trade-offs.
2. Review lecture: Transformers. Self-attention and multi-head attention. See Transformers Introduction. (An attention sketch follows this list.)
3. Review lecture: Transfer Learning. Pretrained models and fine-tuning. See Transfer Learning.
4. Run the GERON Chapter 16 notebook. Work through the Vision and Multimodal Transformers notebook.
5. Watch videos:
   - Introduction to Transformers — The transformer architecture and the simple attention mechanism.
   - The Learnable Attention Mechanism — Implementing the scaled dot-product self-attention mechanism.
   - Multi-Head Self Attention — Using multiple attention heads to capture different aspects of input sequences.
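
The attention videos center on one formula, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. Here is a minimal NumPy sketch of scaled dot-product self-attention on random token embeddings; the shapes and the random projection matrices are illustrative stand-ins for learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_tokens, n_tokens) similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mixture of the values

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 5, 16, 8
X = rng.normal(size=(n_tokens, d_model))      # token embeddings

# Learned projections in a real model; random here for illustration
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                              # (5, 8)
```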

Week 8

1. Read TIF Chapter 5. Imaging fundamentals, from Foundations of Computer Vision.
2. Read: State Estimation. HMMs, the Bayes filter, the Kalman filter, and particle filters. See State Estimation. (A Kalman filter sketch follows this list.)
3. Review lecture: Recursive State Estimation. Probabilistic reasoning over time for object tracking. See Recursive State Estimation.
4. Watch videos. Coming soon — Object tracking video lectures are in development.
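
Recursive state estimation alternates a predict step with an update step. Here is a minimal NumPy sketch of a scalar (constant-position) Kalman filter tracking a noisy measurement; the noise levels and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_pos = 5.0
meas_noise = 1.0                                    # measurement standard deviation
z = true_pos + meas_noise * rng.normal(size=50)     # noisy observations

x, P = 0.0, 10.0                                    # state estimate and its variance
Q, R = 1e-4, meas_noise ** 2                        # process and measurement noise variances

for zk in z:
    # Predict: constant-position model, so only the uncertainty grows by Q
    P = P + Q
    # Update: blend prediction and measurement via the Kalman gain
    K = P / (P + R)
    x = x + K * (zk - x)
    P = (1 - K) * P

print(f"estimate {x:.2f} (true value {true_pos})")
```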

Week 9

1. Read the CLIP paper, TIF Chapter 51. Vision-language pretraining; CLIP for relating images and text.
2. Review lecture: CLIP. Contrastive learning and zero-shot classification. See CLIP. (A zero-shot scoring sketch follows this list.)
3. Review lecture: VLM Introduction. Overview of vision-language models. See VLM Introduction.
4. Watch videos. Coming soon — Contrastive learning video lectures are in development.
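
CLIP-style zero-shot classification scores an image embedding against a set of text-prompt embeddings by cosine similarity and takes a softmax over the results. The NumPy sketch below uses random vectors in place of real CLIP encoders, so only the scoring logic, not the numbers, is meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                                    # embedding dimension (illustrative)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-ins for encoder outputs; a real pipeline would use CLIP's image/text towers
image_emb = normalize(rng.normal(size=d))
class_prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = normalize(rng.normal(size=(len(class_prompts), d)))

logits = 100.0 * text_embs @ image_emb     # cosine similarities, temperature-scaled
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for prompt, p in zip(class_prompts, probs):
    print(f"{prompt}: {p:.3f}")
```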

Week 10

1. Review lecture: BLIP-2. Bridging vision encoders with language models. See BLIP-2.
2. Review lecture: LLaVA. Visual instruction tuning for multimodal understanding. See LLaVA. (A projection sketch follows this list.)
3. Watch videos. Coming soon — Vision-language models video lectures are in development.
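
Both BLIP-2 and LLaVA connect a frozen vision encoder to a language model through a small bridging module. The NumPy sketch below shows only that idea schematically, projecting dummy patch features into a language model's embedding space and prepending them to text token embeddings; all shapes and matrices are placeholders, not either model's actual weights or API.

```python
import numpy as np

rng = np.random.default_rng(0)

n_patches, d_vision = 256, 1024    # output of a (hypothetical) frozen vision encoder
d_lm = 4096                        # embedding width of a (hypothetical) language model

patch_features = rng.normal(size=(n_patches, d_vision))   # stand-in vision features
W_proj = rng.normal(size=(d_vision, d_lm)) * 0.02         # the trainable "bridge"

visual_tokens = patch_features @ W_proj                   # now in LM embedding space
text_tokens = rng.normal(size=(12, d_lm))                 # stand-in prompt embeddings

# The language model would consume the visual tokens followed by the text tokens
lm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(lm_input.shape)              # (268, 4096)
```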

Week 11

1. Review lecture: SAM. Meta’s Segment Anything Model as a worker receiving multimodal prompts from VLM planners.
2. Explore SAM demos. Experiment with SAM for interactive segmentation tasks using different prompt types (points, boxes, text). (A point-prompt sketch follows this list.)
3. Watch videos. Coming soon — Prompted vision models video lectures are in development.
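
If you experiment with SAM programmatically rather than through the web demo, a point prompt looks roughly like the sketch below. It assumes the segment-anything package, a downloaded ViT-B checkpoint at a placeholder path, and an arbitrary RGB image, so treat the exact setup as illustrative.

```python
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Placeholder paths: download a checkpoint and pick any RGB image
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# One foreground point prompt (x, y); label 1 = foreground, 0 = background
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,          # return several candidate masks
)
print(masks.shape, scores)          # candidate boolean masks with confidence scores
```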

Week 12

1. Read TIF Chapter 45. Radiance fields, from Foundations of Computer Vision.
2. Review lecture: NeRF. Creating 3D scenes from 2D images; volume rendering concepts. (A volume rendering sketch follows this list.)
3. Explore NeRF resources. Review NeRF implementations and understand the novel view synthesis pipeline.
4. Watch videos. Coming soon — Neural radiance fields video lectures are in development.
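
NeRF's volume rendering step composites color and density samples along a camera ray with the quadrature C ≈ Σᵢ Tᵢ (1 − exp(−σᵢ δᵢ)) cᵢ, where Tᵢ is the accumulated transmittance. The NumPy sketch below applies that formula to made-up samples in place of a trained network's outputs.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite density/color samples along one ray (NeRF quadrature)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # segment opacities
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance T_i
    weights = trans * alphas                                         # per-sample contribution
    return weights @ colors                                          # final RGB for the ray

n_samples = 64
ts = np.linspace(2.0, 6.0, n_samples)                 # sample depths along the ray
deltas = np.full(n_samples, ts[1] - ts[0])

# Toy scene: a dense red slab between depths 3 and 4, empty space elsewhere
sigmas = np.where((ts > 3.0) & (ts < 4.0), 5.0, 0.0)
colors = np.tile([1.0, 0.0, 0.0], (n_samples, 1))

print(render_ray(sigmas, colors, deltas))             # close to [1, 0, 0] (red)
```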

Week 13

1. Read TIF Chapters 32 & 34. Generative models and conditional generative models, from Foundations of Computer Vision.
2. Review lecture: Diffusion Models. Physics-inspired learning, conditional image generation, DALL-E, and Stable Diffusion. (A forward-noising sketch follows this list.)
3. Run the GERON Chapter 18 notebook. Work through the Autoencoders, GANs, and Diffusion Models notebook.
4. Complete any outstanding assignments. Ensure all assignments are submitted via GitHub and Canvas/Brightspace.
5. Final exam preparation. Review cross-cutting themes: how detection, segmentation, VLMs, and generative models form a complete vision pipeline.
6. Watch videos. Coming soon — Diffusion models video lectures are in development.
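
The forward (noising) process that diffusion models learn to invert has a closed form, x_t = √ᾱ_t x₀ + √(1 − ᾱ_t) ε with ε ~ N(0, I). The sketch below applies it to a toy 1D signal under a linear beta schedule, which is an assumption for illustration rather than the schedule used in any particular lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)          # cumulative product \bar{alpha}_t

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))    # toy clean "image" (1D signal)

def q_sample(x0, t):
    """Sample x_t from q(x_t | x_0) in closed form."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

for t in (0, 250, 999):
    xt = q_sample(x0, t)
    print(f"t={t}: signal std = {xt.std():.2f}")  # drifts toward pure noise as t grows
```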