
Books

  1. TIF - Foundations of Computer Vision by Antonio Torralba, Phillip Isola and William T. Freeman. Free online. Covers the latest deep learning applications including diffusion models.
  2. BISHOP - Deep Learning - Foundations and Concepts by C. Bishop and H. Bishop. Available to view online from the book’s website.
  3. SZELINSKI - Computer Vision: Algorithms and Applications, 2nd Edition by Richard Szeliski. Free to download for personal use. Alternative to TIF for some topics.

Planned Schedule

Part I: Detection and Segmentation

| Lecture | Topic | Description |
| --- | --- | --- |
| 1 | Introduction | Computer vision for agents with egomotion. Prerequisites review: Python, linear algebra, probability theory, camera fundamentals. |
| 2 | Statistical Learning | End-to-end prediction, featurization, fully connected neural architectures, maximum likelihood optimization. Reading: BISHOP Chapters 4-5 |
| 3 | Dense Neural Networks | Cross entropy loss, training and regularization of dense layers. Reading: BISHOP Chapter 6 |
| 4 | CNNs | Spatial feature hierarchies, image classification, ResNets for real-time perception. Reading: BISHOP Chapter 10 |
| 5 | Object Detection | YOLO and Faster R-CNN architectures for identifying and locating objects. Reading: SZELINSKI Chapter 6 |
| 6 | Semantic Segmentation | Pixel-level labeling, panoptic segmentation for full scene understanding. Reading: SZELINSKI Chapter 6 |
| 7 | Vision Transformers | Self-attention for global image dependencies, ViT vs CNN trade-offs. Reading: BISHOP Chapter 12, TIF Chapter 26 |
| 8 | Object Tracking | Video stream processing, handling occlusion, motion blur, appearance changes. |

Part II: Vision Language Models (VLMs)

| Lecture | Topic | Description |
| --- | --- | --- |
| 9 | Contrastive Learning | Vision-language pretraining, CLIP for relating images and text. Reading: CLIP paper, TIF Chapter 51 |
| 10 | From Retrieval to Generation | BLIP-2, LLaVA for image captioning and Visual Question Answering. |
| 11 | Prompted Vision Models | Meta’s SAM as a worker receiving multimodal prompts from VLM planners. |

Part III: Generative Vision Models

| Lecture | Topic | Description |
| --- | --- | --- |
| 12 | Neural Radiance Fields | NeRF for creating 3D scenes from 2D images, volume rendering concepts. |
| 13 | Diffusion Models | Physics-inspired learning, conditional image generation, DALL-E and Stable Diffusion. |
