See the Spring 2026 Academic Calendar for semester dates. Each week below lists the readings, lecture topics, and deliverables you should complete.

Week 1

1. Read TIF Chapter 1. The challenge of vision, from Foundations of Computer Vision.
2. Review prerequisites. Python, linear algebra, probability theory, and camera fundamentals. See Prerequisites.
3. Review lecture: Introduction. Computer vision for agents with egomotion; course roadmap and overview.
4. Watch videos:
   - How We Understand Scenes — Human Perception and Imaging.
   - Mathematical Prerequisites — Review the math foundations needed for the course.
5. Set up your development environment. Follow the Dev Environment guide to install Docker and configure your container.
6. Import the course repository. Import eng-ai-agents to your GitHub account and clone it locally.

Week 2

1. Read TIF Chapters 9 & 10. Introduction to learning and gradient-based learning algorithms, from Foundations of Computer Vision.
2. Read BISHOP Chapters 4 & 5. Single-variable and multivariate models, regularization, Bayesian linear regression, and single-layer networks, from Deep Learning: Foundations and Concepts.
3. Review lecture: Supervised Learning. Perception subsystem, reflexive agents, the learning problem. See The Learning Problem.
4. Review lecture: Linear Regression. Regression fundamentals and empirical risk minimization. See Linear Regression.
5. Review lecture: SGD Optimization. Stochastic gradient descent for minimizing the empirical risk. See SGD. (A minimal SGD sketch follows this list.)
6. Read GERON Chapter 4 — SGD sections. Read the Gradient Descent, Batch Gradient Descent, Stochastic Gradient Descent, and Mini-Batch Gradient Descent sections from Chapter 4: Training Linear Models.
7. Run the GERON Chapter 4 notebook. Work through the Training Linear Models notebook.
8. Run the SGD notebook. Execute the SGD Sinusoidal Dataset notebook in your container.
9. Review lecture: Entropy. Information theory principles and cross-entropy. See Entropy.
10. Review lecture: Marginal Maximum Likelihood. Marginal likelihood and parameter estimation. See Marginal Maximum Likelihood.
11. Review lecture: Conditional Maximum Likelihood. Conditional likelihood for supervised learning. See Conditional Maximum Likelihood.
12. Review lecture: Classification Introduction. Classification fundamentals and decision boundaries. See Classification Introduction.
13. Review lecture: Logistic Regression. Binary classification with logistic regression. See Logistic Regression. (A cross-entropy sketch follows this list.)
14. Watch videos. Coming soon — Statistical learning theory video lectures are in development.
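
The SGD items above come down to a few lines of code. Here is a minimal NumPy sketch, not the course's own notebook code, of mini-batch SGD minimizing the empirical (mean squared error) risk of a linear model on a made-up dataset; the learning rate, batch size, and data are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: y = 3x + 2 + noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + 0.1 * rng.normal(size=200)

# Parameters of the linear model y_hat = w * x + b
w, b = 0.0, 0.0
lr, batch_size = 0.1, 16

for epoch in range(50):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb           # residuals on the mini-batch
        grad_w = 2.0 * np.mean(err * xb)  # d/dw of the mean squared error
        grad_b = 2.0 * np.mean(err)       # d/db of the mean squared error
        w -= lr * grad_w                  # SGD update step
        b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")    # should approach w = 3, b = 2
```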
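
The entropy and logistic regression lectures connect through one fact: logistic regression is trained by minimizing the binary cross-entropy, i.e., the average negative log-likelihood under a Bernoulli model. A small illustrative sketch, again in NumPy with toy data that is not from the course:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Average negative log-likelihood of the labels under the Bernoulli model
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # linearly separable toy labels

w = np.zeros(2)
b = 0.0
lr = 0.5

for step in range(200):
    p = sigmoid(X @ w + b)
    grad_w = X.T @ (p - y) / len(y)   # gradient of the cross-entropy w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print("final cross-entropy:", binary_cross_entropy(y, sigmoid(X @ w + b)))
```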

Week 3

1. Read TIF Chapters 12 & 13. Neural networks, and neural networks as distribution transformers, from Foundations of Computer Vision.
2. Read BISHOP Chapter 6. Cross-entropy loss, training, and regularization of dense layers, from Deep Learning: Foundations and Concepts.
3. Review lecture: DNN Introduction. Forward pass and neural network architectures. See DNN Introduction.
4. Run the Fashion MNIST notebook. Execute the Fashion MNIST Case Study notebook in your container. (A minimal model sketch follows this list.)
5. Run the GERON Chapter 9 notebook. Work through the Artificial Neural Networks notebook.
6. Submit Assignment 1. Complete and submit Assignment 1.
7. Watch videos. Coming soon — Dense neural networks video lectures are in development.
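
As a reference point for the Fashion MNIST case study, below is a minimal dense-network sketch in Keras (assuming TensorFlow/Keras, as in the GERON notebooks); it is not the course notebook itself, just the shape of a typical dense model trained with a cross-entropy loss. Layer sizes and the number of epochs are arbitrary.

```python
import tensorflow as tf

# Fashion MNIST: 28x28 grayscale images, 10 clothing classes
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0    # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # image -> vector
    tf.keras.layers.Dense(128, activation="relu"),    # dense hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # class probabilities
])

model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))                 # [test loss, test accuracy]
```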

Week 4

1. Read TIF Chapter 24, BISHOP Chapter 10. Convolutional neural network architecture and applications, from Foundations of Computer Vision and Deep Learning: Foundations and Concepts.
2. Review lecture: CNN Introduction. Convolution operations, pooling, and spatial feature hierarchies. See CNN Introduction. (A small convolution sketch follows this list.)
3. Review lecture: CNN Layers, Architectures and ResNets. Layer types, architectural patterns, ResNet, and VGG. See CNN Layers, CNN Example Architectures, and Feature Extraction with ResNet.
4. Read GERON Chapter 12 — CNN sections. Read the Convolutional Layers, Pooling Layers, and CNN Architectures sections from Chapter 12: Deep Computer Vision with CNNs.
5. Run the GERON Chapter 12 notebook. Work through the Deep Computer Vision with CNNs notebook.
6. Watch videos:
   - Convolution and Correlation — A linear operation for extracting spatial features.
   - CNN Architectures — Looking inside a CNN layer and understanding architectural patterns.
   - Image Classification — Image classification with data augmentation.
   - What CNNs Learn — Visualizing the features learned by CNNs.
   - ResNets — Residual Networks and skip connections.
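
The convolution/correlation material reduces to a sliding-window sum of products. Below is a minimal NumPy sketch of 2D cross-correlation (what deep learning libraries actually compute when they say "convolution"), applied to a toy step-edge image with a Sobel-style kernel; the image and kernel are illustrative choices, not course data.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the kernel with the image patch, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((8, 8))
image[:, 4:] = 1.0                        # vertical step edge
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], float)   # horizontal-gradient kernel

print(cross_correlate2d(image, sobel_x))  # strong response along the edge columns
```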

Week 5

1. Read TIF Chapter 50. Object recognition, from Foundations of Computer Vision.
2. Review lecture: Detection Metrics. Evaluation metrics for object detection. See Detection Metrics. (An IoU sketch follows this list.)
3. Review lecture: Object Detection. Detection pipelines and architectures. See Object Detection Introduction.
4. Review lecture: R-CNN. Region-based convolutional neural networks. See R-CNN.
5. Review lecture: Fast R-CNN. Efficient region-based detection. See Fast R-CNN.
6. Review lecture: Faster R-CNN. Region proposal networks and two-stage detection. See Faster R-CNN.
7. Watch videos:
   - Introduction to Object Detection — Object detection in a physical security application.
   - Computer Vision Datasets — What types of annotations are used in computer vision?
   - Region-based Object Detectors — R-CNN, Fast R-CNN, Faster R-CNN.
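
Detection metrics such as mAP are built on intersection over union (IoU) between predicted and ground-truth boxes. Here is a small NumPy sketch of IoU for axis-aligned boxes; the (x1, y1, x2, y2) corner convention is an assumption for illustration, not something fixed by the lecture.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 0.142857..., partial overlap
```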

Week 6

1. Read SZELISKI Chapter 6. Pixel-level labeling and panoptic segmentation for full scene understanding. (A mask-overlap sketch follows this list.)
2. Review lecture: Mask R-CNN. Instance segmentation architecture. See Mask R-CNN.
3. Review lecture: U-Net. Encoder-decoder architecture for segmentation. See U-Net.
4. Run the Detectron2 notebook. Execute the Detectron2 Tutorial notebook.
5. Watch videos. Coming soon — Semantic segmentation video lectures are in development.
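
Segmentation quality is usually reported with pixel-level overlap scores. As a companion to the pixel-level labeling reading, here is a minimal NumPy sketch of mask IoU and the Dice coefficient for binary masks; the masks are toy arrays, not course data.

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """IoU of two boolean masks of the same shape."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def dice(mask_a, mask_b):
    """Dice coefficient: twice the intersection over the sum of mask areas."""
    inter = np.logical_and(mask_a, mask_b).sum()
    total = mask_a.sum() + mask_b.sum()
    return 2.0 * inter / total if total > 0 else 0.0

pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 2:6] = True                      # predicted mask
gt = np.zeros((8, 8), dtype=bool)
gt[3:7, 3:7] = True                        # ground-truth mask

print(mask_iou(pred, gt), dice(pred, gt))  # 9/23 and 18/32
```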

Week 7

1. Read BISHOP Chapter 12, TIF Chapter 26. Self-attention for global image dependencies; ViT vs. CNN trade-offs.
2. Review lecture: Transformers. Self-attention and multi-head attention. See Transformers Introduction. (An attention sketch follows this list.)
3. Review lecture: Transfer Learning. Pretrained models and fine-tuning. See Transfer Learning.
4. Run the GERON Chapter 16 notebook. Work through the Vision and Multimodal Transformers notebook.
5. Watch videos:
   - Introduction to Transformers — The transformer architecture and the simple attention mechanism.
   - The Learnable Attention Mechanism — Implementing the scaled dot-product self-attention mechanism.
   - Multi-Head Self Attention — Using multiple attention heads to capture different aspects of input sequences.
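
The attention videos center on one formula, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. Here is a minimal NumPy sketch of scaled dot-product self-attention on random token embeddings; the shapes and the random projection matrices are illustrative stand-ins for learned weights.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_tokens, n_tokens) similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted mixture of the values

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 5, 16, 8
X = rng.normal(size=(n_tokens, d_model))      # token embeddings

# Learned projections in a real model; random here for illustration
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                              # (5, 8)
```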

Week 8

1. Read TIF Chapter 5. Imaging fundamentals, from Foundations of Computer Vision.
2. Read: State Estimation. HMMs, the Bayes filter, the Kalman filter, and particle filters. See State Estimation. (A Kalman filter sketch follows this list.)
3. Review lecture: Recursive State Estimation. Probabilistic reasoning over time for object tracking. See Recursive State Estimation.
4. Watch videos. Coming soon — Object tracking video lectures are in development.
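
Recursive state estimation alternates a predict step with an update step. Here is a minimal NumPy sketch of a scalar (constant-position) Kalman filter tracking a noisy measurement; the noise levels and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_pos = 5.0
meas_noise = 1.0                                    # measurement standard deviation
z = true_pos + meas_noise * rng.normal(size=50)     # noisy observations

x, P = 0.0, 10.0                                    # state estimate and its variance
Q, R = 1e-4, meas_noise ** 2                        # process and measurement noise variances

for zk in z:
    # Predict: constant-position model, so only the uncertainty grows by Q
    P = P + Q
    # Update: blend prediction and measurement via the Kalman gain
    K = P / (P + R)
    x = x + K * (zk - x)
    P = (1 - K) * P

print(f"estimate {x:.2f} (true value {true_pos})")
```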

Week 9

1. Read the CLIP paper, TIF Chapter 51. Vision-language pretraining; CLIP for relating images and text.
2. Review lecture: CLIP. Contrastive learning and zero-shot classification. See CLIP. (A zero-shot scoring sketch follows this list.)
3. Review lecture: VLM Introduction. Overview of vision-language models. See VLM Introduction.
4. Watch videos. Coming soon — Contrastive learning video lectures are in development.
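
CLIP-style zero-shot classification scores an image embedding against a set of text-prompt embeddings by cosine similarity and takes a softmax over the results. The NumPy sketch below uses random vectors in place of real CLIP encoders, so only the scoring logic, not the numbers, is meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512                                    # embedding dimension (illustrative)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-ins for encoder outputs; a real pipeline would use CLIP's image/text towers
image_emb = normalize(rng.normal(size=d))
class_prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = normalize(rng.normal(size=(len(class_prompts), d)))

logits = 100.0 * text_embs @ image_emb     # cosine similarities, temperature-scaled
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for prompt, p in zip(class_prompts, probs):
    print(f"{prompt}: {p:.3f}")
```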

Week 10

1. Review lecture: BLIP-2. Bridging vision encoders with language models. See BLIP-2.
2. Review lecture: LLaVA. Visual instruction tuning for multimodal understanding. See LLaVA. (A projection sketch follows this list.)
3. Watch videos. Coming soon — Vision-language models video lectures are in development.
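
Both BLIP-2 and LLaVA connect a frozen vision encoder to a language model through a small bridging module. The NumPy sketch below shows only that idea schematically, projecting dummy patch features into a language model's embedding space and prepending them to text token embeddings; all shapes and matrices are placeholders, not either model's actual weights or API.

```python
import numpy as np

rng = np.random.default_rng(0)

n_patches, d_vision = 256, 1024    # output of a (hypothetical) frozen vision encoder
d_lm = 4096                        # embedding width of a (hypothetical) language model

patch_features = rng.normal(size=(n_patches, d_vision))   # stand-in vision features
W_proj = rng.normal(size=(d_vision, d_lm)) * 0.02         # the trainable "bridge"

visual_tokens = patch_features @ W_proj                   # now in LM embedding space
text_tokens = rng.normal(size=(12, d_lm))                 # stand-in prompt embeddings

# The language model would consume the visual tokens followed by the text tokens
lm_input = np.concatenate([visual_tokens, text_tokens], axis=0)
print(lm_input.shape)              # (268, 4096)
```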

Week 11

1. Review lecture: SAM. Meta’s Segment Anything Model as a worker receiving multimodal prompts from VLM planners.
2. Explore SAM demos. Experiment with SAM for interactive segmentation tasks using different prompt types (points, boxes, text). (A point-prompt sketch follows this list.)
3. Watch videos. Coming soon — Prompted vision models video lectures are in development.
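
If you experiment with SAM programmatically rather than through the web demo, a point prompt looks roughly like the sketch below. It assumes the segment-anything package, a downloaded ViT-B checkpoint at a placeholder path, and an arbitrary RGB image, so treat the exact setup as illustrative.

```python
import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Placeholder paths: download a checkpoint and pick any RGB image
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

image = np.array(Image.open("example.jpg").convert("RGB"))
predictor.set_image(image)

# One foreground point prompt (x, y); label 1 = foreground, 0 = background
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),
    multimask_output=True,          # return several candidate masks
)
print(masks.shape, scores)          # candidate boolean masks with confidence scores
```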

Week 12

1. Read TIF Chapter 45. Radiance fields, from Foundations of Computer Vision.
2. Review lecture: NeRF. Creating 3D scenes from 2D images; volume rendering concepts. (A volume rendering sketch follows this list.)
3. Explore NeRF resources. Review NeRF implementations and understand the novel view synthesis pipeline.
4. Watch videos. Coming soon — Neural radiance fields video lectures are in development.
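
NeRF's volume rendering step composites color and density samples along a camera ray with the quadrature C ≈ Σᵢ Tᵢ (1 − exp(−σᵢ δᵢ)) cᵢ, where Tᵢ is the accumulated transmittance. The NumPy sketch below applies that formula to made-up samples in place of a trained network's outputs.

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Composite density/color samples along one ray (NeRF quadrature)."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                          # segment opacities
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))   # transmittance T_i
    weights = trans * alphas                                         # per-sample contribution
    return weights @ colors                                          # final RGB for the ray

n_samples = 64
ts = np.linspace(2.0, 6.0, n_samples)                 # sample depths along the ray
deltas = np.full(n_samples, ts[1] - ts[0])

# Toy scene: a dense red slab between depths 3 and 4, empty space elsewhere
sigmas = np.where((ts > 3.0) & (ts < 4.0), 5.0, 0.0)
colors = np.tile([1.0, 0.0, 0.0], (n_samples, 1))

print(render_ray(sigmas, colors, deltas))             # close to [1, 0, 0] (red)
```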

Week 13

1. Read TIF Chapters 32 & 34. Generative models and conditional generative models, from Foundations of Computer Vision.
2. Review lecture: Diffusion Models. Physics-inspired learning, conditional image generation, DALL-E, and Stable Diffusion. (A forward-noising sketch follows this list.)
3. Run the GERON Chapter 18 notebook. Work through the Autoencoders, GANs, and Diffusion Models notebook.
4. Complete any outstanding assignments. Ensure all assignments are submitted via GitHub and Canvas/Brightspace.
5. Final exam preparation. Review cross-cutting themes: how detection, segmentation, VLMs, and generative models form a complete vision pipeline.
6. Watch videos. Coming soon — Diffusion models video lectures are in development.
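
The forward (noising) process that diffusion models learn to invert has a closed form, x_t = √ᾱ_t x₀ + √(1 − ᾱ_t) ε with ε ~ N(0, I). The sketch below applies it to a toy 1D signal under a linear beta schedule, which is an assumption for illustration rather than the schedule used in any particular lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)            # linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)          # cumulative product \bar{alpha}_t

x0 = np.sin(np.linspace(0, 2 * np.pi, 64))    # toy clean "image" (1D signal)

def q_sample(x0, t):
    """Sample x_t from q(x_t | x_0) in closed form."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

for t in (0, 250, 999):
    xt = q_sample(x0, t)
    print(f"t={t}: signal std = {xt.std():.2f}")  # drifts toward pure noise as t grows
```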