Before reading this section, ensure you are familiar with the Linear Algebra Annex. After reading this section, consider doing this assignment.
Part I: Covariance and Geometry
Covariance as shape
Let $x \in \mathbb{R}^d$ be a zero-mean random vector with covariance $\Sigma = \mathbb{E}[x x^\top]$. The eigenvectors of $\Sigma$ define orthogonal directions in space, and the eigenvalues measure the variance along those directions. Geometrically, $\Sigma$ describes the shape of a probability ellipsoid: its principal axes point along the eigenvectors, and their lengths scale with the square roots of the eigenvalues.
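As a quick numerical check of this picture, the sketch below (the covariance values are chosen arbitrarily for illustration) eigendecomposes a 2-D covariance matrix and verifies that the variance along each eigenvector equals the corresponding eigenvalue.

```python
import numpy as np

# Illustrative 2-D covariance matrix (values chosen arbitrarily).
Sigma = np.array([[3.0, 1.2],
                  [1.2, 1.0]])

# Eigendecomposition: columns of U are orthogonal principal directions,
# entries of lam are the variances along them.
lam, U = np.linalg.eigh(Sigma)

for i in range(2):
    u = U[:, i]
    # Variance of the projection u^T x is u^T Sigma u, which equals lam[i].
    print(f"direction {i}: u^T Sigma u = {u @ Sigma @ u:.4f}, eigenvalue = {lam[i]:.4f}")

# The probability ellipsoid {x : x^T Sigma^{-1} x = 1} has principal axes
# along U[:, i] with half-lengths sqrt(lam[i]).
print("axis half-lengths:", np.sqrt(lam))
```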
Latent variable construction
Instead of specifying $\Sigma$ directly, we construct it from a lower-dimensional latent variable:
$$x = W z + \varepsilon, \qquad z \sim \mathcal{N}(0, I_k), \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d), \quad W \in \mathbb{R}^{d \times k}, \ k \ll d.$$
This implies
$$\Sigma = W W^\top + \sigma^2 I_d.$$
The matrix $W$ determines the signal subspace; the noise term fills the remaining directions isotropically. This is the same construction used in classical factor analysis and probabilistic PCA.
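A minimal simulation of this construction, with illustrative sizes $d = 50$, $k = 3$, and $\sigma = 0.1$ (these particular numbers are not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma = 50, 3, 0.1          # illustrative dimensions and noise level
W = rng.normal(size=(d, k))       # signal subspace

# Sample x = W z + eps and compare the empirical covariance to W W^T + sigma^2 I.
n = 100_000
z = rng.normal(size=(n, k))
eps = sigma * rng.normal(size=(n, d))
X = z @ W.T + eps                 # rows are samples x_i

Sigma_true = W @ W.T + sigma**2 * np.eye(d)
Sigma_hat = X.T @ X / n           # zero-mean, so no centering needed

print("max entrywise error:", np.abs(Sigma_hat - Sigma_true).max())
```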
Rank as a geometric constraint
Because $\operatorname{rank}(W W^\top) \le k$, only $k$ directions can carry variance above the noise level $\sigma^2$; the remaining $d - k$ eigenvalues of $\Sigma$ equal $\sigma^2$ exactly (the sketch below makes this visible). This explains why many high-dimensional datasets exhibit:
- Rapidly decaying eigenvalue spectra
- Effective low dimensionality
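Continuing the simulation above (same hypothetical $d$, $k$, $\sigma$), the spectrum of $\Sigma = W W^\top + \sigma^2 I$ shows the constraint directly: only $k$ eigenvalues rise above the noise floor.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, sigma = 50, 3, 0.1
W = rng.normal(size=(d, k))
Sigma = W @ W.T + sigma**2 * np.eye(d)

# Eigenvalues in descending order: k of them exceed sigma^2,
# the remaining d - k sit exactly at the noise floor.
lam = np.linalg.eigvalsh(Sigma)[::-1]
print("top k eigenvalues:   ", np.round(lam[:k], 3))
print("next few eigenvalues:", np.round(lam[k:k + 4], 3))
print("noise floor sigma^2 :", sigma**2)
```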
Part II: Structured Coefficient Changes
A reference distribution
Fix a reference matrix $W_0 \in \mathbb{R}^{d \times k}$ with covariance
$$\Sigma_0 = W_0 W_0^\top + \sigma^2 I_d.$$
This defines a baseline Gaussian distribution. All subsequent changes are measured relative to this geometry.
Low-rank coefficient modifications
We now restrict changes to the form
$$W = W_0 + B A,$$
where:
- $B$ is a $d \times r$ matrix
- $A$ is an $r \times k$ matrix

The perturbation $BA$ therefore has rank at most $r$, with $r$ much smaller than $d$ and $k$; a small numerical sketch follows.
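As a sketch (dimensions again hypothetical), the update touches only an $r$-dimensional subspace of coefficient space:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, r = 50, 30, 2               # illustrative sizes; r is the adaptation rank
W0 = rng.normal(size=(d, k))      # fixed reference coefficients
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, k))

W = W0 + B @ A                    # low-rank modification of the reference

# The change itself has rank at most r, no matter how large d and k are.
print("rank of W - W0:", np.linalg.matrix_rank(W - W0))   # -> 2
```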
Diffuse coefficient changes
Diffuse coefficient changes
For comparison, consider a dense change $W = W_0 + \Delta$, where $\Delta$ has no preferred directions (e.g., i.i.d. Gaussian entries) and is scaled so that $\|\Delta\|_F = \|BA\|_F$. This contrast isolates the role of structure from that of magnitude.
Observable consequences
Empirically, one finds:
- Low-rank changes alter a small number of eigenvalues dramatically
- Diffuse changes alter many eigenvalues modestly

Rank limits the number of variance directions that can be modified; the sketch below illustrates the contrast.
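A hedged numerical illustration (all sizes and scales are invented for the example): apply a rank-$2$ change and a Frobenius-norm-matched dense change to the same reference $W_0$, and compare how the covariance spectra move.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, r, sigma = 50, 30, 2, 0.1
W0 = rng.normal(size=(d, k))

# Rank-r change.
B, A = rng.normal(size=(d, r)), rng.normal(size=(r, k))
low_rank = B @ A

# Dense change with no preferred directions, matched in Frobenius norm.
dense = rng.normal(size=(d, k))
dense *= np.linalg.norm(low_rank) / np.linalg.norm(dense)

def spectrum(W):
    """Eigenvalues of W W^T + sigma^2 I in descending order."""
    return np.linalg.eigvalsh(W @ W.T + sigma**2 * np.eye(d))[::-1]

base = spectrum(W0)
shift_low = np.abs(spectrum(W0 + low_rank) - base)
shift_dense = np.abs(spectrum(W0 + dense) - base)

# Low-rank: a few large eigenvalue shifts. Dense: many modest ones.
print("low-rank, largest shifts:", np.round(np.sort(shift_low)[::-1][:5], 2))
print("dense,    largest shifts:", np.round(np.sort(shift_dense)[::-1][:5], 2))
```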
Part III: Likelihood and Statistical Geometry
Empirical covariance
Given samples $x_1, \dots, x_n$, the empirical covariance
$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^\top$$
is a sufficient statistic for zero-mean Gaussian models.
Gaussian likelihood geometry
The Gaussian negative log-likelihood (per sample, up to an additive constant) is
$$\mathcal{L}(\Sigma) = \frac{1}{2}\left(\log\det\Sigma + \operatorname{tr}\!\left(\Sigma^{-1}\hat{\Sigma}\right)\right).$$
Geometric interpretation:
- $\log\det\Sigma$ penalizes volume mismatch
- $\operatorname{tr}(\Sigma^{-1}\hat{\Sigma})$ penalizes directional mismatch
You may recognize this as a Riemannian geometry on the cone of positive definite matrices.
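The two terms are easy to evaluate numerically. In the sketch below, the helper name `gaussian_nll`, the dimensions, and the diagonal ground-truth covariance are all illustrative assumptions, not taken from the text.

```python
import numpy as np

def gaussian_nll(Sigma, Sigma_hat):
    """Per-sample zero-mean Gaussian negative log-likelihood, up to a constant."""
    _, logdet = np.linalg.slogdet(Sigma)                        # volume term
    trace_term = np.trace(np.linalg.solve(Sigma, Sigma_hat))    # directional term
    return 0.5 * (logdet + trace_term)

rng = np.random.default_rng(3)
d, n = 10, 5000
Sigma_true = np.diag(rng.uniform(0.5, 3.0, size=d))   # illustrative ground truth
X = rng.multivariate_normal(np.zeros(d), Sigma_true, size=n)
Sigma_hat = X.T @ X / n

# The empirical covariance itself minimizes this NLL among positive definite candidates.
print("NLL at Sigma_hat :", gaussian_nll(Sigma_hat, Sigma_hat))
print("NLL at identity  :", gaussian_nll(np.eye(d), Sigma_hat))
print("NLL at Sigma_true:", gaussian_nll(Sigma_true, Sigma_hat))
```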
Rank-constrained covariance matching
Expanding $\Sigma = (W_0 + BA)(W_0 + BA)^\top + \sigma^2 I$ implies
$$\Sigma - \Sigma_0 = W_0 (BA)^\top + (BA) W_0^\top + (BA)(BA)^\top,$$
a matrix of rank at most $2r$. Thus a rank-$r$ coefficient change can only modify a limited number of eigen-directions, regardless of dimensionality. This fact explains the likelihood saturation observed in experiments.
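A quick numerical confirmation (sizes hypothetical) that the covariance change lives in a subspace of dimension at most $2r$:

```python
import numpy as np

rng = np.random.default_rng(4)
d, k, r, sigma = 50, 30, 2, 0.1
W0 = rng.normal(size=(d, k))
B, A = rng.normal(size=(d, r)), rng.normal(size=(r, k))

Sigma0 = W0 @ W0.T + sigma**2 * np.eye(d)
Sigma = (W0 + B @ A) @ (W0 + B @ A).T + sigma**2 * np.eye(d)

# Despite d = 50, the covariance change has rank at most 2r = 4.
print("rank of Sigma - Sigma0:", np.linalg.matrix_rank(Sigma - Sigma0))
```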
Part IV: Connection to LoRA in Transformers
LoRA (Low-Rank Adaptation) applies the same structural assumption to large linear operators:
- A reference matrix $W_0$ is fixed (the pretrained weights)
- Adaptation is constrained to a low-rank subspace
- The rank $r$ controls expressive capacity
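A minimal numpy sketch of the idea, not a reference LoRA implementation (in practice this is applied to attention and MLP weight matrices of a transformer, with `W0` frozen and only `A`, `B` trained; `scale` stands in for LoRA's $\alpha / r$ factor):

```python
import numpy as np

class LoRALinear:
    """Frozen reference weights plus a trainable rank-r update (illustrative sketch)."""

    def __init__(self, W0: np.ndarray, r: int, scale: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W0.shape
        self.W0 = W0                                    # pretrained weights, frozen
        self.B = np.zeros((d_out, r))                   # zero init: no change at start
        self.A = rng.normal(scale=0.01, size=(r, d_in)) # small random init
        self.scale = scale

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Effective weight is W0 + scale * B @ A, but it is never materialized:
        # the low-rank path is applied as two cheap matrix products.
        return x @ self.W0.T + self.scale * (x @ self.A.T) @ self.B.T


# Usage: adapt a hypothetical 64 -> 64 layer with rank 4.
layer = LoRALinear(W0=np.random.default_rng(5).normal(size=(64, 64)), r=4)
y = layer(np.ones((8, 64)))        # batch of 8 inputs
print(y.shape)                     # (8, 64)
```

Initializing `B` to zero means the adapted layer starts out identical to the pretrained one, mirroring the zero-initialization of the update used in LoRA.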
Final Takeaway
Low-rank adaptation is not an optimization trick. It is a geometric assumption about how complex systems change.
- When that assumption holds, low-rank methods are statistically efficient
- When it does not, no algorithm can avoid higher-dimensional modification

