This notebook contains an excerpt from the Python Data Science Handbook by Jake VanderPlas with additional 3D visualization examples. The text is released under the CC-BY-NC-ND license, and code is released under the MIT license.


Open this tutorial in Google Colab to execute the code interactively.
This notebook demonstrates PCA using 3D data, showing how principal component analysis finds the directions of maximum variance and projects data onto lower-dimensional subspaces.

Setup

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import seaborn as sns; sns.set()

Generating 3D Gaussian Data

First, let’s create a 3D Gaussian distribution with a specific covariance structure:
# Define the covariance matrix
covariance_matrix = np.array([
    [4, 2, 1],
    [2, 3, 1.5],
    [1, 1.5, 2]
])

# Mean vector (assuming a mean of 0 for simplicity)
mean = np.zeros(3)

# Generate 1000 samples
samples = np.random.multivariate_normal(mean, covariance_matrix, 1000)
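As a quick sanity check (a minimal sketch using only NumPy), we can confirm that the specified covariance matrix is symmetric and positive definite, and that the empirical covariance of the samples is close to it:
# The covariance must be symmetric positive definite for multivariate_normal;
# its eigenvalues should all be positive, and with 1000 samples the empirical
# covariance should be close to the one we specified
assert np.allclose(covariance_matrix, covariance_matrix.T)
print("Eigenvalues:", np.linalg.eigvalsh(covariance_matrix))
print("Empirical covariance:\n", np.round(np.cov(samples, rowvar=False), 2))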

Visualizing 3D Data

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot for samples
ax.scatter(samples[:, 0], samples[:, 1], samples[:, 2])

ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
ax.set_title('3D plot of Gaussian Samples')
plt.show()

Computing Principal Components via SVD

Because the sample covariance matrix is symmetric and positive semi-definite, its singular value decomposition coincides with its eigendecomposition: the columns of U are the principal axes and the singular values in S are the variances along them. We can therefore find the principal components by performing SVD on the covariance matrix:
# Compute the covariance matrix of the samples
sample_covariance_matrix = np.cov(samples, rowvar=False)

# Perform SVD to find the principal components
U, S, Vt = np.linalg.svd(sample_covariance_matrix)

# Center the samples and project them onto the first two principal components
projected_samples = np.dot(samples - samples.mean(axis=0), U[:, :2])
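Since S holds the variances along the principal axes, we can also check how much of the total variance the two-dimensional projection retains (a small sketch reusing the S computed above):
# Fraction of the total variance captured by the first two principal components
explained_fraction = S[:2].sum() / S.sum()
print("Variance retained by the first two components: {:.1%}".format(explained_fraction))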

2D Projection

plt.figure(figsize=(8, 6))
plt.scatter(projected_samples[:, 0], projected_samples[:, 1])
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('Projection onto the First Two Principal Components')
plt.grid(True)
plt.show()

Verifying Decorrelation

A key property of PCA is that the projected data has uncorrelated components:
# Calculate the correlation matrix of the projected samples
correlation_matrix = np.corrcoef(projected_samples.T)
print("Correlation matrix:\n", correlation_matrix)

# Extract the correlation between the first two components
correlation_between_components = correlation_matrix[0, 1]
print("Correlation between components:", correlation_between_components)
# Should be essentially zero (e.g., 2.97e-17)
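Beyond being uncorrelated, the projected components should carry variances that approximately match the two largest eigenvalues found above, which is worth verifying (a sketch assuming projected_samples and S from the previous cells):
# Variance along each projected component vs. the two largest eigenvalues
print("Projected variances:", np.var(projected_samples, axis=0, ddof=1))
print("Top two eigenvalues:", S[:2])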

Visualizing the Principal Plane in 3D

We can visualize how the 2D projection relates to the original 3D space:
# Mean of the original samples
mean_sample = np.mean(samples, axis=0)

# Map the 2D projections back into the original 3D space
# (reconstruction onto the plane spanned by the first two principal axes)
extended_samples = np.dot(projected_samples, U[:, :2].T) + mean_sample

fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot for the extended samples in the principal component plane
ax.scatter(extended_samples[:, 0], extended_samples[:, 1],
           extended_samples[:, 2], alpha=0.6)

# Plot the principal axes
for i in range(2):
    axis = U[:, i] * max(S)
    ax.plot([mean_sample[0], mean_sample[0] + axis[0]],
            [mean_sample[1], mean_sample[1] + axis[1]],
            [mean_sample[2], mean_sample[2] + axis[2]],
            linewidth=2, color='r')

ax.set_xlabel('X axis')
ax.set_ylabel('Y axis')
ax.set_zlabel('Z axis')
ax.set_title('2D Projections in Original 3D Space')
plt.show()
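Since the principal plane is the best-fitting plane in the least-squares sense, the average squared distance from the original points to their reconstructions should be close to the smallest eigenvalue. A brief check, assuming samples, extended_samples, and S from above:
# Mean squared distance from each sample to its projection on the principal plane;
# this residual variance should be close to the smallest eigenvalue S[2]
residuals = samples - extended_samples
print("Mean squared residual:", np.mean(np.sum(residuals**2, axis=1)))
print("Smallest eigenvalue: ", S[2])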

Interactive 3D Visualization with Plotly

For interactive exploration, we can use Plotly:
import plotly.graph_objects as go

# Create scatter trace for the extended samples
scatter = go.Scatter3d(
    x=extended_samples[:, 0],
    y=extended_samples[:, 1],
    z=extended_samples[:, 2],
    mode='markers',
    marker=dict(size=2)
)

# Create traces for the principal axes
axes = []
for i in range(2):
    axis = U[:, i] * max(S)
    axes.append(go.Scatter3d(
        x=[mean_sample[0], mean_sample[0] + axis[0]],
        y=[mean_sample[1], mean_sample[1] + axis[1]],
        z=[mean_sample[2], mean_sample[2] + axis[2]],
        mode='lines',
        line=dict(color='red', width=4)
    ))

# Create a mesh for the projected plane
corner_points = np.array([
    mean_sample + U[:, 0] * max(S) + U[:, 1] * max(S),
    mean_sample - U[:, 0] * max(S) + U[:, 1] * max(S),
    mean_sample - U[:, 0] * max(S) - U[:, 1] * max(S),
    mean_sample + U[:, 0] * max(S) - U[:, 1] * max(S)
])

plane = go.Mesh3d(
    x=corner_points[:, 0],
    y=corner_points[:, 1],
    z=corner_points[:, 2],
    opacity=0.5,
    color='yellow'
)

# Layout
layout = go.Layout(
    title='2D Projections in Original 3D Space',
    scene=dict(
        xaxis_title='X axis',
        yaxis_title='Y axis',
        zaxis_title='Z axis'
    )
)

fig = go.Figure(data=[scatter, plane] + axes, layout=layout)
fig.show()

Using Scikit-Learn PCA

We can also use scikit-learn’s PCA implementation:
from sklearn.decomposition import PCA

# Create 2D data with correlation
rng = np.random.RandomState(1)
X = np.dot(rng.rand(2, 2), rng.randn(2, 200)).T

# Fit PCA
pca = PCA(n_components=2)
pca.fit(X)

print("Components:\n", pca.components_)
print("Explained variance:", pca.explained_variance_)

Visualizing Principal Axes

# Helper: draw an arrow from point v0 to point v1 on the given axes
def draw_vector(v0, v1, ax=None, color='black'):
    ax = ax or plt.gca()
    arrowprops=dict(arrowstyle='->',
                    linewidth=2,
                    shrinkA=0, shrinkB=0, color=color)
    ax.annotate('', v1, v0, arrowprops=arrowprops)

# Plot data with principal axes
plt.scatter(X[:, 0], X[:, 1], alpha=0.2)
for length, vector in zip(pca.explained_variance_, pca.components_):
    v = vector * 3 * np.sqrt(length)
    draw_vector(pca.mean_, pca.mean_ + v)
plt.axis('equal');

Input vs Principal Components Comparison

rng = np.random.RandomState(1)
X = np.dot(rng.rand(2, 2), rng.randn(2, 200)).T
pca = PCA(n_components=2, whiten=True)
pca.fit(X)

fig, ax = plt.subplots(1, 2, figsize=(16, 6))
fig.subplots_adjust(left=0.0625, right=0.95, wspace=0.1)

# Plot original data
ax[0].scatter(X[:, 0], X[:, 1], alpha=0.2)
for length, vector in zip(pca.explained_variance_, pca.components_):
    v = vector * 3 * np.sqrt(length)
    draw_vector(pca.mean_, pca.mean_ + v, ax=ax[0])
ax[0].axis('equal')
ax[0].set(xlabel='x_1', ylabel='x_2', title='input')

# Plot principal components
X_pca = pca.transform(X)
ax[1].scatter(X_pca[:, 0], X_pca[:, 1], alpha=0.2)
draw_vector([0, 0], [0, 3], ax=ax[1])
draw_vector([0, 0], [3, 0], ax=ax[1])
ax[1].axis('equal')
ax[1].set(xlabel='component 1', ylabel='component 2',
          title='principal components',
          xlim=(-5, 5), ylim=(-3, 3.1))
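The whiten=True option rescales each transformed component to (approximately) unit variance, which is why the two arrows in the right-hand panel have the same length. A minimal check, assuming the X_pca computed above:
# With whiten=True, each principal component has roughly unit standard deviation
print("Component standard deviations:", X_pca.std(axis=0))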

Key Insights

  1. Decorrelation: PCA transforms correlated variables into uncorrelated principal components
  2. Variance Maximization: The first principal component captures the direction of maximum variance
  3. Orthogonality: Principal components are orthogonal to each other
  4. Dimensionality Reduction: We can project high-dimensional data onto a lower-dimensional subspace while preserving maximum variance
Key references: (Zeng et al., 2016; McInnes et al., 2018)

References

  • McInnes, L., Healy, J., Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.
  • Zeng, A., Song, S., Nießner, M., Fisher, M., Xiao, J., et al. (2016). 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions.