
Multivariate Gaussian distribution

Perhaps the only distribution whose form is worth knowing and remembering is the multivariate normal, as it has widespread applicability in data science:

$$f_{\mathbf X}(x_1,\ldots,x_n) = \frac{\exp\left(-\frac{1}{2}({\mathbf x}-{\boldsymbol\mu})^\mathrm{T}{\boldsymbol\Sigma}^{-1}({\mathbf x}-{\boldsymbol\mu})\right)}{\sqrt{(2\pi)^n\,|\boldsymbol\Sigma|}}$$

where $\mathbf{x}$ is a real $n$-dimensional column vector, $\boldsymbol\mu$ is the mean vector, and $|\boldsymbol{\Sigma}| \equiv \operatorname{det}\boldsymbol{\Sigma}$ is the determinant of the covariance matrix $\boldsymbol{\Sigma}$. You can generate correlated Gaussian random variables from white Gaussian random variables, which is useful in multiple settings. For example, you can synthesize training examples whose features are correlated, or explain what happens in successive layers of a neural network when a correlated input is propagated through.

[Figure: Correlated Gaussians]
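A minimal sketch of that white-noise construction: a Cholesky factor of the target covariance "colors" i.i.d. standard normal draws. The mean and covariance values here are illustrative only, chosen to match the bivariate example further below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target mean and covariance (illustrative values)
mu = np.array([0.0, 2.0])
Sigma = np.array([[0.3, -1.0], [-1.0, 5.0]])

# Cholesky factor: L @ L.T == Sigma (requires Sigma positive definite)
L = np.linalg.cholesky(Sigma)

# White Gaussian samples: shape (n_samples, 2), identity covariance
z = rng.standard_normal((100_000, 2))

# Coloring transform: x = mu + L z has mean mu and covariance L I L.T = Sigma
x = mu + z @ L.T

print(np.cov(x, rowvar=False))  # close to Sigma
```

The sample covariance of `x` recovers `Sigma` up to Monte Carlo error, which is the property the plotting code below visualizes analytically via the density.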

Generating correlated Gaussian random variables

The following code simulates and plots a bivariate normal distribution using the parameters from Example 6.6 of the Math for ML book. The covariance matrix introduces correlation between the two variables, and we visualize both the 3D surface and contour plots of the resulting density.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

plt.style.use('seaborn-v0_8-dark')
plt.rcParams['figure.figsize'] = 14, 6
fig = plt.figure()

# Initializing the random seed
random_seed = 1000

# Covariance values to iterate over
cov_val = [-1]

# Setting mean of the distribution to be at (0, 2)
mean = np.array([0, 2])

# Storing density function values for further analysis
pdf_list = []

for idx, val in enumerate(cov_val):

    # Initializing the covariance matrix
    cov = np.array([[0.3, val], [val, 5]])

    # Generating a Gaussian bivariate distribution
    # with given mean and covariance matrix
    distr = multivariate_normal(cov=cov, mean=mean, seed=random_seed)

    # Generating a meshgrid covering roughly the 3-sigma region around the mean
    mean_1, mean_2 = mean[0], mean[1]
    sigma_1, sigma_2 = np.sqrt(cov[0, 0]), np.sqrt(cov[1, 1])

    x = np.linspace(mean_1 - 3 * sigma_1, mean_1 + 3 * sigma_1, num=100)
    y = np.linspace(mean_2 - 3 * sigma_2, mean_2 + 3 * sigma_2, num=100)
    X, Y = np.meshgrid(x, y)

    # Evaluating the density function on the meshgrid (vectorized:
    # the last axis of the stacked array is the 2-D coordinate)
    pdf = distr.pdf(np.dstack((X, Y)))

    # Plotting the density function values
    key = 131 + idx
    ax = fig.add_subplot(key, projection='3d')
    ax.plot_surface(X, Y, pdf, cmap='viridis')
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title(f'Covariance between x1 and x2 = {val}')
    pdf_list.append(pdf)
    ax.axes.zaxis.set_ticks([])

plt.tight_layout()
plt.show()

# Plotting contour plots
for idx, val in enumerate(pdf_list):
    plt.subplot(1, 3, idx + 1)
    plt.contourf(X, Y, val, cmap='viridis')
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.title(f'Covariance between x1 and x2 = {cov_val[idx]}')
plt.tight_layout()
plt.show()
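On the neural-network remark above: for a single linear layer $\mathbf y = W\mathbf x$, the covariance transforms as $\operatorname{Cov}(\mathbf y) = W\,\boldsymbol\Sigma\,W^\mathrm{T}$, so correlation structure propagates in closed form. A sketch under that assumption, where the layer weights are random placeholders (not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)

# Input covariance from the bivariate example above
Sigma = np.array([[0.3, -1.0], [-1.0, 5.0]])

# Hypothetical 2 -> 3 linear-layer weights (placeholder values)
W = rng.standard_normal((3, 2))

# Analytic output covariance for y = W x
Sigma_y = W @ Sigma @ W.T

# Monte Carlo check: sample correlated inputs, push them through the layer
x = rng.multivariate_normal(np.array([0.0, 2.0]), Sigma, size=200_000)
y = x @ W.T
print(np.allclose(np.cov(y, rowvar=False), Sigma_y, rtol=0.05, atol=0.3))
```

Note that the covariance is unaffected by the input mean, so the check holds even though the inputs are not zero-mean.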
Key references: (Wilson et al., 2011; Pascal et al., 2013; Raissi et al., 2017; Tran et al., 2015)

References

  • Pascal, F., Bombrun, L., Tourneret, J., Berthoumieu, Y. (2013). Parameter Estimation For Multivariate Generalized Gaussian Distributions.
  • Raissi, M., Perdikaris, P., Karniadakis, G. (2017). Numerical Gaussian Processes for Time-dependent and Non-linear Partial Differential Equations.
  • Tran, D., Ranganath, R., Blei, D. (2015). The Variational Gaussian Process.
  • Wilson, A., Knowles, D., Ghahramani, Z. (2011). Gaussian Process Regression Networks.