This book is a work in progress. Some chapters are incomplete or in draft form. Content is being actively developed and may change.
Welcome to Engineering AI Agents. This comprehensive resource covers the foundations and advanced topics needed to build intelligent autonomous systems.

What you’ll learn

This book bridges multiple disciplines to provide a unified understanding of AI agents:

Foundations

Statistical learning theory, regression, classification, and optimization fundamentals.

Neural Networks

Backpropagation, normalization, regularization, and training techniques.

Perception

CNNs, sensor models, object detection, and segmentation.

LLMs

NLP foundations, transformers, and large language models.

Reasoning

Logical reasoning and LLM-based reasoning.

VLMs

Vision-language models including CLIP, LLaVA, and BLIP-2.

Planning

Task planning, global planning, and local planning for autonomous navigation.

MDPs & RL

Markov decision processes, Bellman equations, and reinforcement learning.

Robotics Systems

Kinematics, state estimation, SLAM, and systems integration.

Physical AI

Vision-Language-Action agents for embodied intelligence.

How to use this book

The content is organized into two tracks that share key chapters:
  • AI/ML track — Foundations → Neural Networks → Perception → LLMs → Reasoning → VLMs → Planning → MDPs → RL
  • Robotics track — Perception → Robotics Systems → Physical AI
Each section contains groups of related lectures that build upon each other. We recommend following the sequence within each track, though experienced readers may jump to specific topics. The book contains Python notebooks and code snippets for hands-on experience. The notebooks are available in the GitHub repository, and results from notebook runs are logged in the Weights & Biases workspace.

Table of contents

| Part | Section | Topics |
|---|---|---|
| Foundations | Learning & Regression | Learning problem, linear regression, empirical risk, SGD |
| | Maximum Likelihood | Entropy, marginal MLE, Gaussian MLE, conditional MLE |
| | Classification | Classification intro, perceptron, logistic regression |
| | Dimensionality Reduction | PCA, PCA workshop, 3D PCA, low-rank Gaussians |
| Neural Networks | Backpropagation | DNN intro, backprop intro, backprop DNNs, exercises, Fashion MNIST |
| | Whitening | Whitening, correlation-covariance matrix |
| | Normalization | Batch normalization, layer normalization |
| | Regularization | Regularization techniques |
| | Hyperparameter Optimization | Bayesian optimization, HPO workshop |
| | Transfer Learning | Introduction, tutorial |
| Perception | Sensor Models | Camera models, pinhole model, calibration, beam models, likelihood field |
| | CNNs | CNN intro, layers, architectures, small datasets, visualization, ResNet features |
| | Scene Understanding | Introduction, detection metrics |
| | Faster RCNN Lab | RCNN → Fast RCNN → Faster RCNN, 6-notebook PyTorch series |
| | YOLO Lab | YOLO introduction, 5-notebook PyTorch series |
| | UNet Lab | UNet architecture, from-scratch notebook |
| | Mask RCNN Lab | Mask RCNN, TF demos, PyTorch Detectron2 |
| LLMs | NLP Foundations | NLP pipelines, Word2Vec |
| | Recurrent Neural Networks | Introduction, simple RNN, LSTM |
| | Language Models | Language models, RNN language model |
| | Neural Machine Translation | NMT intro, RNN NMT with attention |
| | Transformers | Introduction, single-head attention, multi-head attention, MLP, inference |
| | Speech Agents | Text-to-speech and voice cloning |
| Reasoning | Logical Reasoning | Propositional logic, logical inference, logical agents, applications |
| | LLM Reasoning | LLM-based reasoning approaches |
| VLMs | Vision-Language Models | Overview, CLIP, LLaVA, BLIP-2 |
| Planning | Task Planning | PDDL, BlocksWorld, logistics, manufacturing |
| | Global Planning | Search, forward search, A* |
| | Local Planning | Motion planning, behavioral planning, prediction |
| MDPs | Markov Decision Processes | MDP introduction |
| | Bellman Equations | Expectation backup, optimality backup, policy improvement, recycling robot |
| | Dynamic Programming | Policy iteration |
| RL | Reinforcement Learning | Introduction, model-based algorithms |
| | Prediction | Monte Carlo, temporal difference, TD vs MC |
| | Control | Generalized policy iteration, greedy MC, SARSA, gridworld |
| | Policy-Based | REINFORCE |
| Robotics Systems | Kinematics & Dynamics | Configuration space, homogeneous coordinates, motion representations, wheeled robots |
| | State Estimation | Recursive estimation, discrete Bayesian filter, Kalman filters, HMM localization |
| | SLAM | Occupancy mapping, simultaneous localization and mapping |
| | Systems Integration | Gazebo simulation, ROS applications, Sim2Real, imitation learning |
| Physical AI | VLA Models | Vision-Language-Action agents |