
[Figure: Low Manhattan-bound traffic flow in the Holland Tunnel on a Monday morning]
An outcome $x$ carries an amount of information that is a function of its probability $p(x)$, so rare outcomes carry more information than common ones:

$$I(x) = -\log p(x)$$
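As a quick illustration, here is a minimal Python sketch of self-information $I(x) = -\log p(x)$. It assumes base-2 logarithms, so the result is measured in bits; the text itself does not fix a base, and `self_information` is a hypothetical helper name.

```python
import math

def self_information(p: float) -> float:
    """Information content I(x) = -log2 p(x) of an outcome with probability p, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    # log2(1/p) equals -log2(p) but returns 0.0 rather than -0.0 when p == 1.
    return math.log2(1.0 / p)

print(self_information(0.5))   # 1.0: a fair coin flip carries one bit
print(self_information(0.25))  # 2.0: a 1-in-4 outcome carries two bits
```

Halving the probability of an outcome adds exactly one bit of information, which is the base-2 reading of the logarithm.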
This can be understood intuitively by comparing two outcomes. For example, suppose someone is reporting the flow of vehicular traffic outside the Holland Tunnel on a Monday morning. A report that the traffic is “low” carries much more information than a report that it is “high”, since most people expect horrendous traffic going into Manhattan on Monday mornings: the less probable outcome is the more informative one. When we want to represent the amount of uncertainty over a whole distribution (e.g. the traffic in the Holland Tunnel over all times), we take the expectation of the information over all possible outcomes:

$$H(X) = \mathbb{E}_{x \sim p}\left[-\log p(x)\right] = -\sum_{x} p(x) \log p(x)$$
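The traffic example can be made concrete with a short sketch that computes the information of each report and the entropy $H(X) = -\sum_x p(x) \log_2 p(x)$ of the whole distribution. The probabilities below (0.9 for “high”, 0.1 for “low”) are purely hypothetical, chosen only to reflect the expectation of heavy Monday traffic, and `entropy` is an illustrative helper, not a library function.

```python
import math

def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits."""
    # Outcomes with p == 0 contribute nothing (the limit of p log p is 0).
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

# Hypothetical probabilities for Manhattan-bound traffic on a Monday morning.
traffic = {"high": 0.9, "low": 0.1}

for outcome, p in traffic.items():
    print(f"I({outcome}) = {math.log2(1.0 / p):.3f} bits")
print(f"H(traffic) = {entropy(traffic):.3f} bits")
```

Under these assumed numbers, a “low” report carries about 3.3 bits against about 0.15 bits for “high”, while the entropy (about 0.47 bits) is well below the 1 bit a two-outcome distribution reaches when both outcomes are equally likely.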
and we call this quantity the entropy of the probability distribution $p(x)$. When $X$ is continuous, the sum becomes an integral, $h(X) = -\int p(x) \log p(x)\, dx$, and the quantity is known as the differential entropy. Continuing the alphabetical example, we can determine the entropy of the distribution of letters in the sample text we met before as,
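The sample text from the alphabetical example is not reproduced in this section, so the sketch below uses a hypothetical stand-in sentence. It assumes the letter distribution is estimated by relative frequencies (a plug-in estimate), ignores case and non-letter characters, and reports entropy in bits; `letter_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter

def letter_entropy(text: str) -> float:
    """Entropy (in bits) of the empirical distribution of letters in `text`."""
    # Assumption: case is ignored and only alphabetic characters are counted.
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        raise ValueError("text contains no letters")
    counts = Counter(letters)
    n = len(letters)
    # Plug-in estimate: p(x) = count(x) / n, so each term is p(x) * log2(1 / p(x)).
    return sum((k / n) * math.log2(n / k) for k in counts.values())

sample = "the quick brown fox jumps over the lazy dog"  # hypothetical stand-in text
print(f"{letter_entropy(sample):.3f} bits per letter")
```

For any text over the 26-letter English alphabet the result cannot exceed $\log_2 26 \approx 4.7$ bits, the entropy of a uniform distribution over all letters; real text falls below that bound because some letters are far more common than others.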



