Variance and covariance
Variance measures how a single feature spreads around its mean; covariance measures how two features vary together. For a data matrix $X$ with $n$ rows (observations) and $d$ features in the columns, center each column to obtain $X_c$; the covariance matrix is then $C = \frac{1}{n-1} X_c^\top X_c$ (some texts divide by $n$ instead). The diagonal entry $C_{ii}$ holds the variance of feature $i$; the off-diagonal entry $C_{ij}$ is the covariance between features $i$ and $j$.
Visualizing data and its covariance
A scatter plot shows the shape of a two-dimensional dataset, and a heatmap of its covariance matrix shows the same structure numerically: bright off-diagonal entries mean the two features move together.
Simulating data
Start with two independent Gaussian features: the scatter is an axis-aligned cloud and the covariance is nearly diagonal. Then build a dependent pair, where one feature is a noisy copy of the other: the cloud tilts and the off-diagonal covariance grows.
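The two cases above can be sketched with NumPy. This is a minimal illustration, not the original notebook code: the sample size, the seed, the noise scale of 0.5, and the helper name `covariance` are all assumptions chosen for the example, and the helper uses the $1/(n-1)$ normalization (the same default as `np.cov`).

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
n = 5000                        # number of observations (assumed)


def covariance(X):
    """Covariance of the columns of X, using the 1/(n-1) convention."""
    Xc = X - X.mean(axis=0)              # center each column
    return Xc.T @ Xc / (X.shape[0] - 1)


# Case 1: two independent Gaussian features.
independent = rng.normal(size=(n, 2))

# Case 2: a dependent pair, where the second feature is a noisy copy
# of the first. The noise scale 0.5 is an arbitrary illustrative choice.
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
dependent = np.column_stack([x1, x2])

cov_independent = covariance(independent)
cov_dependent = covariance(dependent)

print(cov_independent)  # off-diagonal entries close to 0
print(cov_dependent)    # off-diagonal entries close to 1
```

Passing either array to a scatter plot and either covariance matrix to a heatmap should reproduce the qualitative picture described above: an axis-aligned cloud with a nearly diagonal matrix, then a tilted cloud with a large off-diagonal entry.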


Preprocessing: centering, standardization, and whitening
Centering subtracts the per-feature mean. Standardization additionally divides by the per-feature standard deviation, putting every feature on the same scale. Whitening goes further: it removes the correlations between features so that the covariance matrix becomes the identity. Whitening has three steps: center the data, rotate it onto the eigenvectors of the covariance matrix (which decorrelates it), then rescale each new axis by $1/\sqrt{\lambda_i + \epsilon}$, where $\lambda_i$ is the corresponding eigenvalue and $\epsilon$ is a small stabilizer that guards against division by a near-zero eigenvalue.
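The three whitening steps can be sketched as follows, with standardization shown alongside for comparison. This is an illustrative sketch under stated assumptions: the function name `whiten`, the default `eps=1e-5`, and the synthetic two-feature dataset are not from the original text.

```python
import numpy as np


def whiten(X, eps=1e-5):
    """Whiten the rows of X: center, rotate onto eigenvectors, rescale."""
    Xc = X - X.mean(axis=0)                   # step 1: center
    cov = Xc.T @ Xc / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: for symmetric matrices
    rotated = Xc @ eigvecs                    # step 2: rotate (decorrelate)
    return rotated / np.sqrt(eigvals + eps)   # step 3: rescale each axis


# Synthetic correlated data (an assumption made for this example).
rng = np.random.default_rng(0)
x1 = rng.normal(size=2000)
X = np.column_stack([x1, 3.0 * x1 + rng.normal(size=2000)])

# Standardization: unit variance per feature, correlation untouched.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Whitening: unit variance and no correlation.
X_white = whiten(X)

cov_std = np.cov(X_std, rowvar=False)
cov_white = np.cov(X_white, rowvar=False)

print(np.round(cov_std, 3))    # large off-diagonal entries remain
print(np.round(cov_white, 3))  # approximately the identity matrix
```

The comparison shows why standardization alone is not whitening: it equalizes the scales but leaves the off-diagonal covariance in place.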
Image whitening with ZCA
Whitening extends to images. Each image is a high-dimensional vector (here a 32 by 32 color image, so 3072 values). Zero-phase component analysis (ZCA) whitening decorrelates the pixel dimensions while keeping the result as close as possible to the original image, so the whitened picture still looks like the scene with its local structure emphasized. The ZCA transform is $W_{\mathrm{ZCA}} = U \,\mathrm{diag}\!\left(1/\sqrt{S + \epsilon}\right) U^\top$, applied to each centered image vector, where $U$ and $S$ come from the singular value decomposition of the pixel covariance matrix and $\epsilon$ is a small stabilizer.
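A minimal sketch of ZCA whitening on flattened images follows. Several things here are assumptions for illustration: the function name `zca_whiten`, the default `eps=1e-5`, and the data, which are random arrays standing in for real images. The stand-in images are 8 by 8 by 3 (192 values) rather than 32 by 32 by 3, purely to keep the example fast, and a shared per-image offset is added so the pixel dimensions are correlated.

```python
import numpy as np


def zca_whiten(X, eps=1e-5):
    """ZCA-whiten the rows of X (one flattened image per row).

    Returns the whitened data and the whitening matrix W.
    """
    Xc = X - X.mean(axis=0)                        # center each pixel dimension
    cov = Xc.T @ Xc / (X.shape[0] - 1)             # pixel covariance matrix
    U, S, _ = np.linalg.svd(cov)                   # SVD of the covariance
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T  # rotate, rescale, rotate back
    return Xc @ W, W


# Stand-in data (not real images): 1000 flattened 8x8x3 arrays.
rng = np.random.default_rng(0)
n_images, n_pixels = 1000, 8 * 8 * 3
images = rng.normal(size=(n_images, n_pixels))
images += rng.normal(size=(n_images, 1))  # shared offset correlates all pixels

X_zca, W = zca_whiten(images)
cov_zca = np.cov(X_zca, rowvar=False)

print(X_zca.shape)                                       # same shape as the input
print(np.allclose(cov_zca, np.eye(n_pixels), atol=1e-3))
```

The final multiplication by `U.T` is what distinguishes ZCA from whitening in the eigenvector basis: it rotates the decorrelated data back into pixel space, so each output row can be reshaped and viewed as an image.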
References
- K. K. Pal and K. S. Sudeep, “Preprocessing for image classification by convolutional neural networks,” 2016.
- A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” 2009 (the CIFAR-10 dataset).
- See also the whitening lecture for how whitening relates to batch normalization, and the Gaussians page for the distribution this section preprocesses.

