Foundations of CV - aegean.ai

This project builds a set of supplemental tutorial sections for Foundations of Computer Vision by Antonio Torralba, Phillip Isola, and William T. Freeman. The book explains almost every idea with a figure or a plot. Your job is to reproduce those figures as runnable code, so each concept becomes something you can execute, change, and inspect for yourself.

The task

Pick a chapter from the list below and write a tutorial section that reproduces its key figures. Treat the book as the specification: for every plot the chapter uses to make a point, write the code that regenerates it from scratch and explain the concept the figure illustrates. The result is a companion page that sits next to the book and moves a reader from “read the figure” to “run the figure”.

What a finished section looks like

It reproduces the chapter’s main figures with runnable code.
It states the concept each figure demonstrates and the math behind it.
It runs end to end, so you can change a parameter and watch the plot update.

Follow the conventions already used across this site:

One chapter per section, each in its own directory. Reserve that directory’s images/ folder for block diagram artifacts only: the diagram source (Mermaid or C4) together with its rendered .png. Notebook output figures (the plots your code generates) are embedded in the page directly and do not need to be stored under images/.
Markdown cells and inline comments never discuss the plotting library or the toolchain (matplotlib, headless rendering, install steps, and the like). See the coding guidelines below for what they should emphasize.
Pure-plotting cells are tagged so the page shows the figure but hides the drawing code.
The prose addresses you directly and stays evergreen.

Coding guidelines

Write all computational code in PyTorch and libraries built on the PyTorch foundation, such as Kornia for differentiable computer vision. Express the math each figure illustrates with tensors and these libraries rather than with NumPy or the equivalents from other frameworks, so every section runs the same way on CPU or GPU.
Markdown cells and inline comments explain the code first and the concept second: describe what each block does, then connect it back to the idea the figure illustrates.
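A minimal sketch of what these guidelines can look like in practice, assuming a Gaussian blur as the figure being reproduced. The function names, the impulse test image, and the kernel size are illustrative choices, not taken from the book or from this site. The block builds a normalized Gaussian kernel as a tensor and applies it with conv2d on whichever device is available; the blur it computes spreads a single bright pixel into the shape of the kernel.

```python
import torch
import torch.nn.functional as F

# Pick the device once; everything below runs unchanged on CPU or GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


def gaussian_kernel_2d(size: int, sigma: float, device: torch.device) -> torch.Tensor:
    """Return a normalized (size x size) Gaussian kernel as a tensor."""
    # Sample coordinates centered on zero, e.g. [-2, -1, 0, 1, 2] for size=5.
    coords = torch.arange(size, dtype=torch.float32, device=device) - (size - 1) / 2
    # 1D Gaussian profile; the 2D kernel is its outer product with itself.
    g = torch.exp(-coords**2 / (2 * sigma**2))
    kernel = torch.outer(g, g)
    # Normalize so the blur preserves total image intensity.
    return kernel / kernel.sum()


def blur(image: torch.Tensor, sigma: float, size: int = 9) -> torch.Tensor:
    """Blur a (H, W) grayscale image with a Gaussian kernel."""
    kernel = gaussian_kernel_2d(size, sigma, image.device)
    # conv2d expects (batch, channels, H, W) inputs and (out, in, kH, kW) weights.
    # The kernel is symmetric, so cross-correlation and convolution coincide.
    out = F.conv2d(image[None, None], kernel[None, None], padding=size // 2)
    return out[0, 0]


# Hypothetical test image: a single bright pixel (an impulse) at the center.
image = torch.zeros(33, 33, device=device)
image[16, 16] = 1.0

# Change sigma and rerun to see the response widen or narrow.
blurred = blur(image, sigma=2.0)
print(blurred.shape, float(blurred.sum()))
```

The comments describe what each block does before tying it back to the concept, and sigma is the one parameter you would change to watch the result update.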

Where your work goes

Submit your work to the pantelis/eng-ai-agents repository through a fork and pull request.

Fork pantelis/eng-ai-agents to your own account and clone your fork.
Branch. Create one branch per section, named mit-book-chapter-<chapter>-<section>. For chapter 12, section 5, the branch is mit-book-chapter-12-5.
Location. Place every notebook under the notebooks/ folder.
Pull request. Push the branch to your fork and open a pull request against main of pantelis/eng-ai-agents.
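The naming rule in the Branch step can be sketched as a small helper. The function names are hypothetical, and the git commands it returns assume your fork is the remote named origin, which is the default after cloning; adjust them if your setup differs.

```python
def branch_name(chapter: int, section: int) -> str:
    """Build the branch name for one tutorial section."""
    if chapter < 1 or section < 1:
        raise ValueError("chapter and section must be positive integers")
    return f"mit-book-chapter-{chapter}-{section}"


def git_commands(chapter: int, section: int) -> list[str]:
    """Return the commands that create and push the section branch.

    Assumes the clone of your fork uses the default remote name `origin`.
    """
    name = branch_name(chapter, section)
    return [
        f"git checkout -b {name}",    # create the branch locally
        f"git push -u origin {name}",  # publish it to your fork
    ]


print(branch_name(12, 5))  # mit-book-chapter-12-5
for command in git_commands(12, 5):
    print(command)
```

After the push, open the pull request against main of pantelis/eng-ai-agents from your fork.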

Chapters to treat

The chapters below are grouped by the parts of the book. There are 28 chapters in scope. The Assignee column records who has claimed each chapter, and all 28 are currently claimed. If a chapter's status changes to Open, it is available again: add your name in the course channel to take it.

Image formation

Chapter	Title	Assignee	Status
5	Imaging	Ishan Tandon	Claimed
6	Lenses	Kimberly Milner	Claimed
7	Cameras as linear systems	Ishan Tandon	Claimed
8	Color	Ishan Tandon	Claimed

Foundations of learning

Chapter	Title	Assignee	Status
13	Neural networks as distribution transformers	Kimberly Milner	Claimed

Image processing

Chapter	Title	Assignee	Status
15	Linear image filtering	Kimberly Milner	Claimed
16	Fourier analysis	Kimberly Milner	Claimed

Sampling and multiscale image representations

Chapter	Title	Assignee	Status
20	Image sampling and aliasing	Nazib Khan	Claimed
21	Downsampling and upsampling images	Nazib Khan	Claimed
22	Filter banks	Nazib Khan	Claimed
23	Image pyramids	Nazib Khan	Claimed

Neural architectures for vision

Chapter	Title	Assignee	Status
24	Convolutional neural nets	Kimberly Milner	Claimed
26	Transformers	Kimberly Milner	Claimed

Generative image models and representation learning

Chapter	Title	Assignee	Status
30	Representation learning	Shaury Pratap Singh (Nazib Khan contributing)	Claimed
34	Conditional generative models	Shaury Pratap Singh	Claimed

Understanding geometry

Chapter	Title	Assignee	Status
38	Representing images and geometry	Kaushik Kachireddy	Claimed
39	Camera modeling and calibration	Ruimeng Yang	Claimed
40	Stereo vision	Ruimeng Yang	Claimed
41	Homographies	Ruimeng Yang	Claimed
42	Single view metrology	Kaushik Kachireddy	Claimed
43	Learning to estimate depth from a single image	Ruimeng Yang	Claimed
44	Multiview geometry and structure from motion	Shaury Pratap Singh	Claimed
45	Radiance fields	Ruimeng Yang	Claimed

Understanding motion

Chapter	Title	Assignee	Status
46	Motion estimation	Ruimeng Yang	Claimed
47	3D motion and its 2D projection	Kaushik Kachireddy	Claimed
48	Optical flow estimation	Shaury Pratap Singh	Claimed
49	Learning to estimate motion	Shaury Pratap Singh	Claimed

Understanding vision with language

Chapter	Title	Assignee	Status
51	Vision and language	Shaury Pratap Singh (Nazib Khan contributing)	Claimed

Reference

Foundations of Computer Vision, Antonio Torralba, Phillip Isola, and William T. Freeman, MIT Press. Read it online at visionbook.mit.edu.
