This notebook contains an excerpt from the Python Data Science Handbook by Jake VanderPlas. The text is released under the CC-BY-NC-ND license, and code is released under the MIT license.
Introducing Principal Component Analysis
Principal component analysis is a fast and flexible unsupervised method for dimensionality reduction in data. Its behavior is easiest to visualize by looking at a two-dimensional dataset: consider a cloud of 200 points with a clear linear relationship between the two variables. In PCA, this relationship is quantified by finding a list of the principal axes in the data and using those axes to describe the dataset. Using Scikit-Learn's PCA estimator, we can compute this as follows:
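Below is a minimal sketch of that computation; the synthetic 200-point dataset generated here (with a fixed random seed) stands in for the dataset shown in the original figures:

```python
import numpy as np
from sklearn.decomposition import PCA

# Generate 200 correlated two-dimensional points
rng = np.random.RandomState(1)
X = np.dot(rng.rand(2, 2), rng.randn(2, 200)).T

# After fitting, components_ gives the principal axes and
# explained_variance_ the variance of the data along each axis
pca = PCA(n_components=2)
pca.fit(X)
print(pca.components_)
print(pca.explained_variance_)
```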
PCA as Dimensionality Reduction
Using PCA for dimensionality reduction involves zeroing out one or more of the smallest principal components, resulting in a lower-dimensional projection of the data that preserves the maximal data variance. Here is an example of using PCA as a dimensionality reduction transform:
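A brief sketch of such a transform, assuming the array X and the PCA import from the previous snippet:

```python
# Project the 2D data onto its single largest principal component,
# then map it back to 2D to see what information is preserved
pca = PCA(n_components=1)
X_pca = pca.fit_transform(X)
print("original shape:   ", X.shape)      # (200, 2)
print("transformed shape:", X_pca.shape)  # (200, 1)

# inverse_transform reconstructs each point along the first principal axis
X_reconstructed = pca.inverse_transform(X_pca)
```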
PCA for Visualization: Hand-written Digits
The usefulness of the dimensionality reduction may not be entirely apparent in only two dimensions, but becomes much clearer when looking at high-dimensional data. Let's take a quick look at the application of PCA to the digits dataset.
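A sketch of that projection, using scikit-learn's bundled digits dataset (1,797 samples of 8×8 images, i.e. 64 features each); the plotting choices are illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()

# Project the 64-dimensional digit vectors down to 2 dimensions
pca = PCA(n_components=2)
projected = pca.fit_transform(digits.data)
print(digits.data.shape)   # (1797, 64)
print(projected.shape)     # (1797, 2)

# Scatter-plot the projection, colored by the true digit label
plt.scatter(projected[:, 0], projected[:, 1],
            c=digits.target, alpha=0.5, cmap='tab10')
plt.xlabel('component 1')
plt.ylabel('component 2')
plt.colorbar(label='digit label')
plt.show()
```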
Choosing the Number of Components
A vital part of using PCA in practice is the ability to estimate how many components are needed to describe the data. This can be determined by looking at the cumulative explained variance ratio as a function of the number of components:
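For example, a minimal sketch of this curve on the digits data; the 95% variance threshold at the end is an illustrative choice, not a prescribed value:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()

# Fit PCA with all 64 components and plot the cumulative explained
# variance; the shape of this curve suggests how many components to keep
pca = PCA().fit(digits.data)
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')
plt.show()

# Passing a fraction to n_components keeps just enough components
# to reach that share of the total variance
pca_95 = PCA(n_components=0.95).fit(digits.data)
print(pca_95.n_components_)
```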
PCA as Noise Filtering
PCA can also be used as a filtering approach for noisy data. The idea is this: any components with variance much larger than the effect of the noise should be relatively unaffected by the noise. So if you reconstruct the data using just the largest subset of principal components, you should be preferentially keeping the signal and throwing out the noise.
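A sketch of this filtering on the digits data, with Gaussian noise added artificially; the noise level of 4 and the 50% variance threshold are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
rng = np.random.RandomState(42)

# Corrupt the images with Gaussian noise
noisy = digits.data + 4 * rng.normal(size=digits.data.shape)

# Keep just enough components to explain 50% of the variance of the
# noisy data, then reconstruct; the result is largely denoised
pca = PCA(n_components=0.50).fit(noisy)
components = pca.transform(noisy)
filtered = pca.inverse_transform(components)
print(pca.n_components_)   # number of components retained
```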
Example: Eigenfaces
Let's explore using PCA projection as a feature selector for facial recognition. We use the Labeled Faces in the Wild dataset:
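A sketch of the setup, assuming scikit-learn's fetch_lfw_people loader (this downloads the data on first use); the choice of 150 components and the min_faces_per_person filter are illustrative:

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA

# Labeled Faces in the Wild: grayscale face images flattened into vectors
faces = fetch_lfw_people(min_faces_per_person=60)
print(faces.data.shape)

# Randomized SVD quickly approximates the first 150 components ("eigenfaces")
pca = PCA(n_components=150, svd_solver='randomized', random_state=42)
pca.fit(faces.data)

# Each face can now be represented by 150 coefficients instead of thousands
# of pixels; inverse_transform reconstructs an approximate face from them
components = pca.transform(faces.data)
reconstructed = pca.inverse_transform(components)
```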
Summary
In this tutorial we have discussed the use of principal component analysis for:
- Dimensionality reduction: project high-dimensional data to lower dimensions
- Visualization: view high-dimensional data in 2D or 3D
- Noise filtering: reconstruct data using only the largest principal components
- Feature selection: extract the most important directions of variance
Scikit-Learn also includes interesting variants on PCA in the sklearn.decomposition submodule, such as RandomizedPCA and SparsePCA.
Key references: (Anselmi et al., 2013; Bronstein et al., 2016; Sun et al., 2016)
References
- Anselmi, F., Leibo, J., Rosasco, L., Mutch, J., Tacchetti, A., et al. (2013). Unsupervised Learning of Invariant Representations in Hierarchical Architectures.
- Bronstein, M., Bruna, J., LeCun, Y., Szlam, A., Vandergheynst, P. (2016). Geometric deep learning: going beyond Euclidean data.
- Sun, B., Feng, J., Saenko, K. (2016). Correlation Alignment for Unsupervised Domain Adaptation.

