Before reading this section, ensure you are familiar with the Linear Algebra Annex. After reading this section, consider doing this assignment.
Part I: Covariance and Geometry
Covariance as shape
Let $x \in \mathbb{R}^d$ be a zero-mean random vector with covariance $\Sigma = \mathbb{E}[x x^\top]$. Eigenvectors of $\Sigma$ define orthogonal directions in space, and eigenvalues measure the variance along those directions. Geometrically, $\Sigma$ describes the shape of a probability ellipsoid.

Latent variable construction
Instead of specifying $\Sigma$ directly, we construct it from a lower-dimensional latent variable: $x = W z + \varepsilon$, with $z \sim \mathcal{N}(0, I_r)$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_d)$, and $r \ll d$. This implies $\Sigma = W W^\top + \sigma^2 I$. The matrix $W \in \mathbb{R}^{d \times r}$ determines the signal subspace; the noise term fills the remaining directions isotropically. This is the same construction used in classical factor analysis and probabilistic PCA.
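The construction above can be checked numerically. The following is a minimal numpy sketch (all variable names are illustrative): it samples $x = Wz + \varepsilon$ and confirms that the empirical covariance converges to $WW^\top + \sigma^2 I$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, sigma = 20, 3, 0.5

# Hypothetical loading matrix W (d x r) defining the signal subspace
W = rng.normal(size=(d, r))

# Sample x = W z + eps with z ~ N(0, I_r) and eps ~ N(0, sigma^2 I_d)
n = 200_000
z = rng.normal(size=(n, r))
eps = sigma * rng.normal(size=(n, d))
x = z @ W.T + eps

# The empirical covariance approaches W W^T + sigma^2 I as n grows
Sigma_hat = x.T @ x / n
Sigma = W @ W.T + sigma**2 * np.eye(d)
print(np.max(np.abs(Sigma_hat - Sigma)))  # small, O(1/sqrt(n))
```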
Rank as a geometric constraint
Because $\operatorname{rank}(W W^\top) \le r$, only $r$ directions can carry variance above the noise level. This explains why many high-dimensional datasets exhibit:

- Rapidly decaying eigenvalue spectra
- Effective low dimensionality
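This spectral picture is easy to verify directly. A short sketch (dimensions chosen arbitrarily): the covariance $WW^\top + \sigma^2 I$ has exactly $r$ eigenvalues above the noise floor $\sigma^2$, and the remaining $d - r$ sit at $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, sigma = 50, 4, 0.3

W = rng.normal(size=(d, r))
Sigma = W @ W.T + sigma**2 * np.eye(d)

# Eigenvalues of W W^T + sigma^2 I: r "signal" eigenvalues above the
# noise floor sigma^2, and d - r eigenvalues exactly at sigma^2.
eigvals = np.linalg.eigvalsh(Sigma)[::-1]  # descending order
above_noise = int(np.sum(eigvals > sigma**2 + 1e-8))
print(above_noise)  # 4 (= r)
```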
Part II: Structured Coefficient Changes
A reference distribution
Fix a reference matrix $W_0$ with covariance $\Sigma_0 = W_0 W_0^\top + \sigma^2 I$. This defines a baseline Gaussian distribution. All subsequent changes are measured relative to this geometry.

Low-rank coefficient modifications
We now restrict changes to the form $W = W_0 + B A$, where:

- $B$ is a $d \times r$ matrix
- $A$ is an $r \times k$ matrix
Diffuse coefficient changes
For comparison, consider dense changes with no preferred directions, scaled to match the Frobenius norm of the low-rank case. This contrast isolates the role of structure from that of magnitude.

Observable consequences
Empirically, one finds:

- Low-rank changes alter a small number of eigenvalues dramatically
- Diffuse changes alter many eigenvalues modestly
Rank limits the number of variance directions that can be modified.
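The contrast can be sketched numerically. The snippet below (dimensions and seeds are arbitrary choices) builds a low-rank change $BA$ and a dense change of equal Frobenius norm, then verifies that the covariance change induced by $BA$ has rank at most $2r$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, r, sigma = 40, 40, 2, 0.5

W0 = rng.normal(size=(d, k))

# Low-rank change W = W0 + B A with B (d x r) and A (r x k)
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, k))
Delta = B @ A

# Diffuse change: dense Gaussian matrix rescaled to the same Frobenius norm
G = rng.normal(size=(d, k))
G *= np.linalg.norm(Delta) / np.linalg.norm(G)

def spectrum(W):
    """Eigenvalues of W W^T + sigma^2 I, in descending order."""
    return np.linalg.eigvalsh(W @ W.T + sigma**2 * np.eye(d))[::-1]

base, low, diffuse = spectrum(W0), spectrum(W0 + Delta), spectrum(W0 + G)

# The covariance change W0 (BA)^T + (BA) W0^T + (BA)(BA)^T has rank <= 2r
D = (W0 + Delta) @ (W0 + Delta).T - W0 @ W0.T
svals = np.linalg.svd(D, compute_uv=False)
rank_change = int(np.sum(svals > 1e-6 * svals[0]))
print(rank_change)  # at most 2r = 4
print(np.abs(low - base).max(), np.abs(diffuse - base).max())
```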
Part III: Likelihood and Statistical Geometry
Empirical covariance
Given samples $x_1, \ldots, x_n$, the empirical covariance $\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$ is a sufficient statistic for zero-mean Gaussian models.

Gaussian likelihood geometry
The Gaussian negative log-likelihood (per sample, up to constants) is $\mathcal{L}(\Sigma) = \frac{1}{2}\left(\log\det\Sigma + \operatorname{tr}(\Sigma^{-1}\hat{\Sigma})\right)$. Geometric interpretation:

- $\log\det\Sigma$ penalizes volume mismatch
- $\operatorname{tr}(\Sigma^{-1}\hat{\Sigma})$ penalizes directional mismatch
You may recognize this loss as inducing a Riemannian geometry on the cone of positive definite matrices.
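The loss above is a two-liner in code. A minimal sketch (function name is illustrative); it also checks the standard fact that the loss is minimized, over positive definite matrices, at $\Sigma = \hat{\Sigma}$.

```python
import numpy as np

def gaussian_nll(Sigma, Sigma_hat):
    """Per-sample Gaussian NLL, up to constants:
    0.5 * (log det Sigma + tr(Sigma^{-1} Sigma_hat))."""
    sign, logdet = np.linalg.slogdet(Sigma)
    assert sign > 0, "Sigma must be positive definite"
    return 0.5 * (logdet + np.trace(np.linalg.solve(Sigma, Sigma_hat)))

# The loss is minimized (over PD matrices) at Sigma = Sigma_hat
rng = np.random.default_rng(3)
d = 5
M = rng.normal(size=(d, d))
Sigma_hat = M @ M.T + np.eye(d)

at_truth = gaussian_nll(Sigma_hat, Sigma_hat)
perturbed = gaussian_nll(Sigma_hat + 0.5 * np.eye(d), Sigma_hat)
print(at_truth < perturbed)  # True
```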
Rank-constrained covariance matching
Expanding $W = W_0 + B A$ gives $\Sigma - \Sigma_0 = W_0 (BA)^\top + (BA) W_0^\top + (BA)(BA)^\top$, which has rank at most $2r$. Thus a rank-$r$ coefficient change can only modify a limited number of eigen-directions, regardless of dimensionality. This fact explains the likelihood saturation observed in experiments.

Part IV: Connection to LoRA in Transformers
LoRA (Low-Rank Adaptation) applies the same structural assumption to large linear operators:

- A reference matrix $W_0$ is fixed (the pretrained weights)
- Adaptation is constrained to a low-rank subspace
- Rank controls expressive capacity
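The three points above can be sketched as a tiny adapter layer. This is an illustrative numpy mock-up, not the reference implementation; class and parameter names are hypothetical. Following the LoRA paper's initialization, $B$ starts at zero so the adapted layer initially reproduces the pretrained map.

```python
import numpy as np

class LoRALinear:
    """Minimal sketch of a LoRA-adapted linear map (hypothetical names).
    The pretrained weight W0 stays frozen; only B (d x r) and A (r x k)
    are trained, so the effective weight is W0 + scale * B @ A."""

    def __init__(self, W0, r, scale=1.0, rng=None):
        rng = rng or np.random.default_rng(0)
        d, k = W0.shape
        self.W0 = W0                        # frozen pretrained weights
        self.B = np.zeros((d, r))           # zero init: start exactly at W0
        self.A = rng.normal(size=(r, k)) * 0.01
        self.scale = scale

    def __call__(self, x):
        # x: (batch, k) -> (batch, d)
        return x @ (self.W0 + self.scale * self.B @ self.A).T

W0 = np.random.default_rng(4).normal(size=(8, 16))
layer = LoRALinear(W0, r=2)
x = np.ones((1, 16))
# With B = 0 the adapted layer reproduces the pretrained map exactly
print(np.allclose(layer(x), x @ W0.T))  # True
```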
Final Takeaway
Low-rank adaptation is not an optimization trick. It is a geometric assumption about how complex systems change.

- When that assumption holds, low-rank methods are statistically efficient
- When it does not, no algorithm can avoid higher-dimensional modification
References
- Hu, E.J., et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models
- Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Chapter 12.2 (Probabilistic PCA)

