References
- Ahmed, Z., Le Roux, N., Norouzi, M., Schuurmans, D. (2018). Understanding the impact of entropy on policy optimization.
- Huang, S., Kanervisto, A., Raffin, A., Wang, W., Ontañón, S., et al. (2022). A2C is a special case of PPO.
- Mansour, Y., Singh, S. (2013). On the Complexity of Policy Iteration.
- Schulman, J., Levine, S., Moritz, P., Jordan, M., Abbeel, P. (2015). Trust Region Policy Optimization.

