- Interacting with the environment and receiving a reward signal from it at each interaction.
- Knowing the model (dynamics) of the environment, and optimizing an internal objective function based on the experience they accumulate.

Topics
- MDP Introduction: states, actions, transitions, rewards, and value functions.
- Bellman Expectation: computing value functions using the Bellman expectation equations.
- Bellman Optimality: optimal value functions and the Bellman optimality equations.
- Policy Iteration: finding optimal policies by alternating evaluation and improvement.
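To make the last topic concrete, here is a minimal sketch of policy iteration on a hypothetical two-state, two-action MDP (the transition table and rewards below are made up for illustration; they are not from the course material). Policy evaluation applies the Bellman expectation equation until the value function converges, and policy improvement acts greedily with respect to it.

```python
import numpy as np

# A tiny illustrative MDP (numbers invented for this sketch):
# P[s][a] is a list of (probability, next_state, reward) tuples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
n_states, n_actions, gamma = 2, 2, 0.9

def policy_evaluation(policy, tol=1e-8):
    """Sweep the Bellman expectation equation until V stops changing."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = np.zeros(n_states, dtype=int)
    while True:
        V = policy_evaluation(policy)
        stable = True
        for s in range(n_states):
            # Action values under the current V (Bellman optimality backup).
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:
            return policy, V

policy, V = policy_iteration()
print(policy, V)  # greedy action per state, and the converged values
```

On this toy MDP the loop converges in two improvement steps; the same structure carries over to the tabular algorithms covered in the notes.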
Resources
Apart from the notes here, which are largely based on David Silver's (DeepMind) course material and video lectures, you can consult these additional resources:
- Sutton & Barto's Reinforcement Learning book - David Silver's slides and video lectures are based on this book. The companion code in Python is here.
- Deep Reinforcement Learning in Python - written by Google researchers.
Many of the algorithms presented here, such as policy and value iteration, were developed in older repos such as rlcode and dennybritz. This site is being migrated to be compatible with Farama's Gymnasium tooling.

