Computing the value functions given a policy
In this section we describe how to calculate the value functions by establishing a recursive relationship, similar to the one we derived for the return. We replace the expectations with summations over quantities such as states and actions. Let's start with the state-value function, which can be written as

$$v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s] = \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] = \mathbb{E}_\pi[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s]$$

All of the above expectations are with respect to the policy $\pi$. In addition, a detail hidden in the above derivation is the law of total expectation, $\mathbb{E}_\pi[G_{t+1} \mid S_t = s] = \mathbb{E}_\pi[v_\pi(S_{t+1}) \mid S_t = s]$. The recursion decomposes the value into two parts:

- the immediate reward, $R_{t+1}$;
- the discounted value of the successor state, $\gamma v_\pi(S_{t+1})$.
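This recursive decomposition can be turned into a fixed-point iteration (iterative policy evaluation): repeatedly replace $v(s)$ with the expected immediate reward plus the discounted value of the successor state. The minimal sketch below assumes a hypothetical two-state, two-action MDP; the arrays `P`, `R`, and `pi` and all their numbers are illustrative inventions, not taken from the text.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])  # P[s, a, s'] = transition probability
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                # R[s, a] = expected immediate reward
pi = np.full((2, 2), 0.5)                 # pi[s, a]: uniform random policy
gamma = 0.9                               # discount factor

# Iterative policy evaluation: repeatedly apply the Bellman expectation backup
#   v(s) <- sum_a pi(a|s) [ R(s,a) + gamma * sum_s' P(s'|s,a) v(s') ]
v = np.zeros(2)
for _ in range(1000):
    v = np.sum(pi * (R + gamma * P @ v), axis=1)
print(v)
```

Because the backup is a $\gamma$-contraction, the iterates converge to $v_\pi$ regardless of the initial guess.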



Bellman State-Action Value Expectation Equation

$$q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a] = \mathbb{E}_\pi[R_{t+1} + \gamma q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a]$$
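The same fixed-point idea applies to the state-action value: back up $q(s,a)$ through the immediate reward plus the discounted, policy-averaged value of the successor pair $(S_{t+1}, A_{t+1})$. The sketch below again assumes a hypothetical two-state, two-action MDP; all numbers are illustrative.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])  # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                # R[s, a]
pi = np.full((2, 2), 0.5)                 # pi[s, a]
gamma = 0.9

# Fixed-point iteration on the Bellman expectation equation for q_pi:
#   q(s,a) <- R(s,a) + gamma * sum_s' P(s'|s,a) * sum_a' pi(a'|s') q(s',a')
q = np.zeros((2, 2))
for _ in range(1000):
    v = np.sum(pi * q, axis=1)  # v_pi(s') = sum_a' pi(a'|s') q_pi(s', a')
    q = R + gamma * P @ v
print(q)
```

Note the intermediate step recovers $v_\pi$ from $q_\pi$, which is exactly the expectation over the policy's action choice.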

Bellman State Value Expectation Equation

$$\begin{aligned} v_\pi(s) &= \sum_{a \in \mathcal{A}} \pi(a \mid s) \left( \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \, v_\pi(s') \right) \\ &= \mathcal{R}_s^\pi + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^\pi \, v_\pi(s') \end{aligned}$$

In the compact form (2nd line of the equation) we denote the policy-averaged reward and transition quantities:

$$\mathcal{R}_s^\pi = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{R}_s^a, \qquad \mathcal{P}_{ss'}^\pi = \sum_{a \in \mathcal{A}} \pi(a \mid s)\, \mathcal{P}_{ss'}^a$$
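Because the compact form is linear in $v_\pi$, it can also be solved exactly rather than by iteration: stacking the states gives $v_\pi = \mathcal{R}^\pi + \gamma \mathcal{P}^\pi v_\pi$, i.e. $v_\pi = (I - \gamma \mathcal{P}^\pi)^{-1} \mathcal{R}^\pi$. A minimal sketch, again assuming a hypothetical two-state, two-action MDP with invented numbers:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])  # P[s, a, s']
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])                # R[s, a]
pi = np.full((2, 2), 0.5)                 # pi[s, a]
gamma = 0.9

# Policy-averaged (compact-form) quantities:
#   R_pi[s]     = sum_a pi(a|s) R(s,a)
#   P_pi[s, s'] = sum_a pi(a|s) P(s'|s,a)
R_pi = np.sum(pi * R, axis=1)
P_pi = np.einsum('sa,sat->st', pi, P)

# The Bellman expectation equation v = R_pi + gamma * P_pi v is linear,
# so solve (I - gamma * P_pi) v = R_pi directly.
v = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
print(v)
```

The direct solve costs $O(|\mathcal{S}|^3)$, so it is practical only for small state spaces; iterative evaluation scales better.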

