
What you’ll learn
This book bridges multiple disciplines to provide a unified understanding of AI agents:
Foundations
Statistical learning theory, regression, classification, and optimization fundamentals.
Neural Networks
Backpropagation, normalization, regularization, and training techniques.
Perception
CNNs, sensor models, object detection, and segmentation.
LLMs
NLP foundations, transformers, and large language models.
Reasoning
Logical reasoning and LLM-based reasoning.
VLMs
Vision-language models including CLIP, LLaVA, and BLIP-2.
Planning
Task planning, global planning, and local planning for autonomous navigation.
MDPs & RL
Markov decision processes, Bellman equations, and reinforcement learning.
Robotics Systems
Kinematics, state estimation, SLAM, and systems integration.
Physical AI
Vision-Language-Action agents for embodied intelligence.
How to use this book
The content is organized into two tracks that share key chapters:
- AI/ML track: Foundations → Neural Networks → Perception → LLMs → Reasoning → VLMs → Planning → MDPs → RL
- Robotics track: Perception → Robotics Systems → Physical AI
Table of contents
| Part | Section | Topics |
|---|---|---|
| Foundations | Learning & Regression | Learning problem, linear regression, empirical risk, SGD |
| | Maximum Likelihood | Entropy, marginal MLE, Gaussian MLE, conditional MLE |
| | Classification | Classification intro, perceptron, logistic regression |
| | Dimensionality Reduction | PCA, PCA workshop, 3D PCA, low-rank Gaussians |
| Neural Networks | Backpropagation | DNN intro, backprop intro, backprop DNNs, exercises, Fashion MNIST |
| | Whitening | Whitening, correlation-covariance matrix |
| | Normalization | Batch normalization, layer normalization |
| | Regularization | Regularization techniques |
| | Hyperparameter Optimization | Bayesian optimization, HPO workshop |
| | Transfer Learning | Introduction, tutorial |
| Perception | Sensor Models | Camera models, pinhole model, calibration, beam models, likelihood field |
| | CNNs | CNN intro, layers, architectures, small datasets, visualization, ResNet features |
| | Scene Understanding | Introduction, detection metrics |
| | Faster RCNN Lab | RCNN → Fast RCNN → Faster RCNN, 6-notebook PyTorch series |
| | YOLO Lab | YOLO introduction, 5-notebook PyTorch series |
| | UNet Lab | UNet architecture, from-scratch notebook |
| | Mask RCNN Lab | Mask RCNN, TF demos, PyTorch Detectron2 |
| LLMs | NLP Foundations | NLP pipelines, Word2Vec |
| | Recurrent Neural Networks | Introduction, simple RNN, LSTM |
| | Language Models | Language models, RNN language model |
| | Neural Machine Translation | NMT intro, RNN NMT with attention |
| | Transformers | Introduction, single-head attention, multi-head attention, MLP, inference |
| | Speech Agents | Text-to-speech and voice cloning |
| Reasoning | Logical Reasoning | Propositional logic, logical inference, logical agents, applications |
| | LLM Reasoning | LLM-based reasoning approaches |
| VLMs | Vision-Language Models | Overview, CLIP, LLaVA, BLIP-2 |
| Planning | Task Planning | PDDL, BlocksWorld, logistics, manufacturing |
| | Global Planning | Search, forward search, A* |
| | Local Planning | Motion planning, behavioral planning, prediction |
| MDPs | Markov Decision Processes | MDP introduction |
| | Bellman Equations | Expectation backup, optimality backup, policy improvement, recycling robot |
| | Dynamic Programming | Policy iteration |
| RL | Reinforcement Learning | Introduction, model-based algorithms |
| | Prediction | Monte Carlo, temporal difference, TD vs MC |
| | Control | Generalized policy iteration, greedy MC, SARSA, gridworld |
| | Policy-Based | REINFORCE |
| Robotics Systems | Kinematics & Dynamics | Configuration space, homogeneous coordinates, motion representations, wheeled robots |
| | State Estimation | Recursive estimation, discrete Bayesian filter, Kalman filters, HMM localization |
| | SLAM | Occupancy mapping, simultaneous localization and mapping |
| | Systems Integration | Gazebo simulation, ROS applications, Sim2Real, imitation learning |
| Physical AI | VLA Models | Vision-Language-Action agents |

