Before reading this section, ensure you are familiar with the Linear Algebra Annex. After reading this section, consider doing this assignment.
Large linear operators appear throughout modern machine learning systems. Yet empirical evidence shows that meaningful changes to such systems often occupy only a small number of directions. This tutorial develops that idea from first principles, using Gaussian distributions to isolate the geometry of low-rank change before connecting it to modern architectures like LoRA (Low-Rank Adaptation).

Part I: Covariance and Geometry

Covariance as shape

Let $x \in \mathbb{R}^d$ be a zero-mean random vector with covariance $\Sigma = \mathbb{E}[x x^\top]$. Eigenvectors of $\Sigma$ define orthogonal directions in space, and eigenvalues measure the variance along those directions. Geometrically, $\Sigma$ describes the shape of a probability ellipsoid.
Internalize that eigenvalues are not abstract quantities: they directly correspond to observable spread in the data.
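As a concrete check, here is a minimal numpy sketch (the 2-D covariance values are illustrative, not from the text): the variance of the data projected onto each eigenvector matches the corresponding eigenvalue.

import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])                    # illustrative 2-D covariance
X = rng.multivariate_normal(np.zeros(2), Sigma, size=100_000)

eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))    # empirical covariance of the samples
for lam, v in zip(eigvals, eigvecs.T):
    spread = np.var(X @ v)                        # observed spread along this eigenvector
    print(f"eigenvalue {lam:.3f}  vs  projected variance {spread:.3f}")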

Latent variable construction

Instead of specifying $\Sigma$ directly, we construct it from a lower-dimensional latent variable: $z \sim \mathcal{N}(0, I_k)$, $x = W z + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_d)$. This implies $\Sigma = W W^\top + \sigma^2 I_d$. The matrix $W$ determines the signal subspace; the noise term fills the remaining directions isotropically.
This is the same construction used in classical factor analysis and probabilistic PCA.
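A small sketch of this construction, with illustrative sizes $d = 10$, $k = 3$ and noise level $\sigma = 0.5$ (these values are assumptions, not from the text):

import numpy as np

rng = np.random.default_rng(1)
d, k, sigma = 10, 3, 0.5                          # illustrative sizes, not from the text

W = rng.standard_normal((d, k))
Sigma = W @ W.T + sigma**2 * np.eye(d)            # implied covariance

# Sample x = W z + eps and compare the empirical covariance to Sigma
n = 200_000
Z = rng.standard_normal((n, k))
X = Z @ W.T + sigma * rng.standard_normal((n, d))
print(np.abs(X.T @ X / n - Sigma).max())          # small sampling error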

Rank as a geometric constraint

Because $\mathrm{rank}(W W^\top) \le k$, only $k$ directions can carry variance above the noise level. This explains why many high-dimensional datasets exhibit the following (illustrated in the sketch after this list):
  • Rapidly decaying eigenvalue spectra
  • Effective low dimensionality
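The spectrum of $\Sigma = W W^\top + \sigma^2 I_d$ shows this directly; a sketch, reusing the assumed values $d = 10$, $k = 3$, $\sigma = 0.5$:

import numpy as np

rng = np.random.default_rng(2)
d, k, sigma = 10, 3, 0.5                          # assumed illustrative values

W = rng.standard_normal((d, k))
Sigma = W @ W.T + sigma**2 * np.eye(d)

eigvals = np.linalg.eigvalsh(Sigma)[::-1]         # descending order
print(eigvals)
# Exactly k eigenvalues exceed sigma^2; the remaining d - k sit at the noise floor sigma^2
print((eigvals > sigma**2 + 1e-9).sum())          # -> 3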

Part II: Structured Coefficient Changes

A reference distribution

Fix a reference matrix $W_0$ with covariance $\Sigma_0 = W_0 W_0^\top + \sigma^2 I_d$. This defines a baseline Gaussian distribution. All subsequent changes are measured relative to this geometry.

Low-rank coefficient modifications

We now restrict changes to the form $\Delta W = B A$, where:
  • $B$ is a $d \times r$ matrix
  • $A$ is an $r \times k$ matrix
  • $r \ll \min(d,k)$
Only $r$ new directions can be introduced.
Low-rank structure does not limit how much the matrix changes, but where it can change.
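A minimal sketch of such a factorized change (the dimensions are assumed for illustration):

import numpy as np

rng = np.random.default_rng(3)
d, k, r = 10, 8, 2                                # assumed illustrative sizes, r << min(d, k)

B = rng.standard_normal((d, r))
A = rng.standard_normal((r, k))
Delta_W = B @ A                                   # a full d × k update, but of rank at most r

print(np.linalg.matrix_rank(Delta_W))             # -> 2
# The column space of Delta_W lies inside the r-dimensional span of B's columns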

Diffuse coefficient changes

For comparison, consider dense changes with no preferred directions, scaled to match the Frobenius norm of the low-rank case. This contrast isolates the role of structure from magnitude.
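One way to build such a matched diffuse change (a sketch, with the same assumed sizes as before; the scaling simply equates Frobenius norms):

import numpy as np

rng = np.random.default_rng(4)
d, k, r = 10, 8, 2                                # same assumed sizes as before

Delta_lowrank = rng.standard_normal((d, r)) @ rng.standard_normal((r, k))
Delta_dense = rng.standard_normal((d, k))         # no preferred directions

# Rescale the dense change to match the Frobenius norm of the low-rank change
Delta_dense *= np.linalg.norm(Delta_lowrank) / np.linalg.norm(Delta_dense)
print(np.linalg.norm(Delta_lowrank), np.linalg.norm(Delta_dense))                # equal magnitudes
print(np.linalg.matrix_rank(Delta_lowrank), np.linalg.matrix_rank(Delta_dense))  # 2 vs min(d, k)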

Observable consequences

Empirically, one finds:
  • Low-rank changes alter a small number of eigenvalues dramatically
  • Diffuse changes alter many eigenvalues modestly
This establishes the central principle:
Rank limits the number of variance directions that can be modified.
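A sketch of this comparison (all sizes and the noise level are assumptions; the point is the pattern of eigenvalue shifts, not the exact numbers):

import numpy as np

rng = np.random.default_rng(5)
d, k, r, sigma = 10, 8, 2, 0.5                    # assumed illustrative values

W0 = rng.standard_normal((d, k))

def spectrum(W):
    # eigenvalues of W W^T + sigma^2 I, largest first
    return np.sort(np.linalg.eigvalsh(W @ W.T + sigma**2 * np.eye(d)))[::-1]

Delta_lowrank = rng.standard_normal((d, r)) @ rng.standard_normal((r, k))
Delta_dense = rng.standard_normal((d, k))
Delta_dense *= np.linalg.norm(Delta_lowrank) / np.linalg.norm(Delta_dense)

base = spectrum(W0)
print(np.round(spectrum(W0 + Delta_lowrank) - base, 2))  # shifts concentrated in a few directions
print(np.round(spectrum(W0 + Delta_dense) - base, 2))    # shifts spread across many directions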

Part III: Likelihood and Statistical Geometry

Empirical covariance

Given samples $\{x_i\}_{i=1}^n$, the empirical covariance $S = \frac{1}{n} \sum_{i=1}^n x_i x_i^\top$ is a sufficient statistic for zero-mean Gaussian models.
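In code, with samples stacked as rows of an $n \times d$ array, this is a one-liner (a trivial sketch):

import numpy as np

def empirical_covariance(X):
    # X: n × d array of zero-mean samples, one per row
    n = X.shape[0]
    return X.T @ X / n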

Gaussian likelihood geometry

The Gaussian negative log-likelihood, up to an additive constant, is $\mathcal{L}(\Sigma) = \frac{1}{2} \left[ \log \det \Sigma + \mathrm{tr}\!\left(\Sigma^{-1} S\right) \right]$. Geometric interpretation:
  • $\log \det \Sigma$ penalizes volume mismatch
  • $\mathrm{tr}(\Sigma^{-1} S)$ penalizes directional mismatch
You may recognize this as a Riemannian geometry on the cone of positive definite matrices.
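A direct transcription of this objective (a sketch; the Cholesky factorization for the log-determinant is a standard numerical choice, not something prescribed by the text):

import numpy as np

def gaussian_nll(Sigma, S):
    # 0.5 * [log det Sigma + tr(Sigma^{-1} S)], dropping the additive constant
    L = np.linalg.cholesky(Sigma)
    logdet = 2.0 * np.log(np.diag(L)).sum()
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return 0.5 * (logdet + trace_term)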

Rank-constrained covariance matching

Expanding $\Sigma - \Sigma_0 = W_0 \Delta W^\top + \Delta W W_0^\top + \Delta W \Delta W^\top = W_0 \Delta W^\top + \Delta W (W_0 + \Delta W)^\top$ exhibits the difference as a sum of two matrices of rank at most $r$, so $\mathrm{rank}(\Sigma - \Sigma_0) \le 2r$. Thus a rank-$r$ coefficient change can only modify a limited number of eigen-directions, regardless of dimensionality. This fact explains the likelihood saturation observed in experiments.
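A numerical check of this bound (with the same assumed sizes as the earlier sketches):

import numpy as np

rng = np.random.default_rng(6)
d, k, r, sigma = 10, 8, 2, 0.5                    # assumed illustrative values

W0 = rng.standard_normal((d, k))
Delta_W = rng.standard_normal((d, r)) @ rng.standard_normal((r, k))

Sigma0 = W0 @ W0.T + sigma**2 * np.eye(d)
Sigma = (W0 + Delta_W) @ (W0 + Delta_W).T + sigma**2 * np.eye(d)

print(np.linalg.matrix_rank(Sigma - Sigma0))      # at most 2r = 4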

Part IV: Connection to LoRA in Transformers

LoRA (Low-Rank Adaptation) applies the same structural assumption to large linear operators:
  • A reference matrix is fixed (the pretrained weights)
  • Adaptation is constrained to a low-rank subspace
  • Rank controls expressive capacity
Transformers obscure this geometry with nonlinearities and attention mechanisms. The Gaussian setting exposes it directly.
import numpy as np

# LoRA structure in practice
# Original: y = Wx
# LoRA:     y = Wx + BAx  where B ∈ R^{d×r}, A ∈ R^{r×k}

class LoRALayer:
    def __init__(self, W, rank):
        self.W = W                                   # frozen pretrained weights, d × k
        self.B = np.zeros((W.shape[0], rank))        # zero-initialized so BA = 0 at the start
        self.A = np.random.randn(rank, W.shape[1])   # random low-rank input projection

    def forward(self, x):
        # base output plus the low-rank correction; only A and B are trained
        return self.W @ x + self.B @ (self.A @ x)
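A quick usage check, continuing from the numpy-based sketch above (the sizes are arbitrary):

d, k, r = 8, 6, 2
W = np.random.randn(d, k)                        # stand-in for pretrained weights
layer = LoRALayer(W, rank=r)
x = np.random.randn(k)
print(np.allclose(layer.forward(x), W @ x))      # True: BA = 0 at initialization, so outputs match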

Final Takeaway

Low-rank adaptation is not an optimization trick. It is a geometric assumption about how complex systems change.
  • When that assumption holds, low-rank methods are statistically efficient
  • When it does not, no algorithm can avoid higher-dimensional modification
Key references: (McInnes et al., 2018; Neumann et al., 2017; Dauphin et al., 2014; Pascal et al., 2013; Sun et al., 2016)

References

  • Dauphin, Y., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., et al. (2014). Identifying and attacking the saddle point problem in high-dimensional non-convex optimization.
  • McInnes, L., Healy, J., Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction.
  • Neumann, D., Wiese, T., Utschick, W. (2017). Learning the MMSE Channel Estimator.
  • Pascal, F., Bombrun, L., Tourneret, J., Berthoumieu, Y. (2013). Parameter Estimation For Multivariate Generalized Gaussian Distributions.
  • Sun, B., Feng, J., Saenko, K. (2016). Correlation Alignment for Unsupervised Domain Adaptation.