
4 modules · 15 lessons · 5+ hours of content

Subscribe to our YouTube channel and explore the complete curriculum below.

How We Understand Scenes

Human perception and imaging.

Mathematical Prerequisites

What you need to know before diving into the course material.

Convolution and Correlation

Linear operations for extracting spatial features.
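As a rough illustration of how the two operations relate, here is a minimal pure-Python sketch. It assumes a single-channel image, no padding and stride 1 ("valid" mode). The function names `correlate2d`, `convolve2d` and `flip` are hypothetical helpers, not code from the course. Cross-correlation slides the kernel over the image as-is, while convolution first flips the kernel along both axes.

```python
def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation (no padding, stride 1) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the image patch under the (unflipped) kernel.
            total = 0.0
            for u in range(kh):
                for v in range(kw):
                    total += image[i + u][j + v] * kernel[u][v]
            out[i][j] = total
    return out


def flip(kernel):
    """Flip a kernel vertically and horizontally (a 180-degree rotation)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d(image, kernel):
    """Valid-mode 2D convolution: cross-correlation with the flipped kernel."""
    return correlate2d(image, flip(kernel))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(correlate2d(image, kernel))  # [[-4.0, -4.0], [-4.0, -4.0]]
print(convolve2d(image, kernel))   # [[4.0, 4.0], [4.0, 4.0]]
```

Both functions are linear in the image: scaling the input scales every output value by the same factor, and the response to a sum of images is the sum of the responses. The layers called "convolution" in common deep learning frameworks (for example, PyTorch's `Conv2d`) actually compute cross-correlation; because the kernel is learned, the missing flip makes no practical difference.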

CNN Architectures

Looking inside a CNN layer and understanding architectural patterns.

Image Classification

Image classification with data augmentation.

What CNNs Learn

Visualizing the features learned by CNNs.

ResNets

Residual Networks and skip connections.

Introduction to Object Detection

Object detection in a physical security application.

Computer Vision Datasets

What types of annotations are used in computer vision?

Region-based Object Detectors

R-CNN, Fast R-CNN, and Faster R-CNN.

Introduction to Transformers

The transformer architecture and the simple attention mechanism.

The Learnable Attention Mechanism

Implementing the scaled dot-product self-attention mechanism.
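Scaled dot-product attention computes softmax(QK^T / sqrt(d_k)) V, one softmax per query row; in self-attention, Q, K and V are all linear projections of the same input X. Below is a minimal pure-Python sketch, assuming small nested-list matrices. The projection matrices `w_q`, `w_k` and `w_v` stand in for the learnable weights, and all function names are hypothetical, not the course's implementation.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    # Subtract the max for numerical stability; the result sums to 1.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # (n_queries x n_keys) similarity scores
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


def self_attention(x, w_q, w_k, w_v):
    """Self-attention: queries, keys and values are projections of the same X."""
    return scaled_dot_product_attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, d_model = 2
identity = [[1.0, 0.0], [0.0, 1.0]]       # placeholder projection weights
out, weights = self_attention(x, identity, identity, identity)
print(len(out), len(out[0]))  # 3 2
```

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights.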

Multi-Head Self-Attention

Using multiple attention heads to capture different aspects of input sequences.
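One common way to realize multiple heads, following the original transformer design, is to split the projected Q, K and V into `num_heads` equal column slices, run scaled dot-product attention independently on each slice, concatenate the per-head outputs, and apply an output projection `w_o`. The sketch below assumes that design, uses plain Python lists, and treats all weight matrices and function names as hypothetical placeholders. It is not taken from the course materials.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    mx = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention for a single head."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def columns(m, start, stop):
    """Take columns [start, stop) of every row."""
    return [row[start:stop] for row in m]


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    d_model = len(w_q[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    head_outputs = []
    for h in range(num_heads):
        # Each head attends over its own slice of the projected features.
        lo, hi = h * d_head, (h + 1) * d_head
        head_outputs.append(attention(columns(q, lo, hi), columns(k, lo, hi), columns(v, lo, hi)))
    # Concatenate the heads' outputs along the feature axis, token by token.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)


x = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]  # 3 tokens, d_model = 4
eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out = multi_head_self_attention(x, eye, eye, eye, eye, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Because each head computes its own softmax over a different feature slice, different heads can assign different attention weights to the same pair of tokens, while the total cost stays close to that of a single head with the full dimension.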


Course Information

CS681: Deep Learning for Computer Vision, NJIT (Spring 2026)

View Full Syllabus

See the complete course syllabus, including assignments and schedule.