4 modules · 15 lessons · 5+ hours of content
Subscribe to our YouTube channel and explore the complete curriculum below.
1. Course Introduction
How We Understand Scenes
Human Perception and Imaging.
Mathematical Prerequisites
What you need to know before diving into the course material.
2. Convolutional Neural Networks
Convolution and Correlation
A linear operation for extracting spatial features.
CNN Architectures
Looking inside a CNN layer and understanding architectural patterns.
Image Classification
Image classification with data augmentation.
What CNNs Learn
Visualizing the features learned by CNNs.
ResNets
Residual Networks and skip connections.
3. Object Detection
Introduction to Object Detection
Object detection in a physical security application.
Computer Vision Datasets
What types of annotations are used in computer vision?
Region-based Object Detectors
R-CNN, Fast R-CNN, Faster R-CNN.
4. Vision-Language Models
Introduction to Transformers
The transformer architecture and the simple attention mechanism.
The Learnable Attention Mechanism
Implementing the scaled dot-product self-attention mechanism.
Multi-Head Self Attention
Using multiple attention heads to capture different aspects of input sequences.

