From a storm to generative AI

The velocity-fields page showed how a 2D wind pattern transports air through space. The same mathematical object appears in generative modeling:
Weather. $\mathbf{v}(x, y)$ tells air where to move.
Diffusion and flow matching. $\mathbf{v}(x, t)$ tells probability mass, or a generated sample, where to move.
In weather you do not get to choose the wind: the atmosphere imposes it. The leaves, raindrops, and dust that the storm carries are passive passengers, each one sampling the wind field at its current position and drifting accordingly. In generative modeling the roles flip: you design the wind. You choose a velocity field whose particles, released from Gaussian noise, end up distributed like real data. The particles in flow matching play the role of the leaves: each particle is a sample carried by the learned field $v_\theta$, starting from a Gaussian draw and ending at a generated data point.
One subtlety worth pinning down up front: only the starting positions $x_0$ are Gaussian. Once the field begins to advect them, the cloud of particles is no longer Gaussian. At intermediate time $t$ the particles are samples from the deformed distribution $p_t$, and by $t = 1$ they are samples from a distribution that approximates the data. The randomness lives entirely in the initial draw; after that, the trajectory of each particle is determined by the ODE.
That is the setup of flow matching: a generative modeling framework that learns a continuous-time vector field transporting samples from a simple source distribution to a target data distribution. It generalizes diffusion models: where diffusion fixes a stochastic forward process and learns to reverse it, flow matching directly parameterizes the deterministic ODE that connects noise to data, and is trained with a simple regression objective on conditional vector fields. Flow matching has been adopted in several recent production-grade generative systems across images, video, audio, speech, and molecular structures, including Meta’s Movie Gen, Stable Diffusion 3, and Flux.

Definition

A time-dependent velocity field is a function
$$v: \mathbb{R}^d \times [0, 1] \to \mathbb{R}^d, \qquad (x, t) \mapsto v(x, t)$$
that assigns a velocity vector $v(x, t) \in \mathbb{R}^d$ to every point $x$ at every time $t$. A particle whose trajectory $x_t \in \mathbb{R}^d$ is driven by this field obeys the ordinary differential equation
$$\frac{d x_t}{d t} \;=\; v(x_t, t), \qquad x_0 \sim p_0(x).$$
Given an initial sample $x_0$ from a source distribution $p_0$ (typically a standard Gaussian), and provided $v$ is sufficiently regular (for example, Lipschitz in $x$), the ODE produces a unique trajectory $\{x_t\}_{t \in [0, 1]}$. The map $\phi_t(x_0) \;=\; x_t$ is called the flow induced by $v$.
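To make the definition concrete, here is a minimal sketch with a hypothetical field $v(x, t) = -x$, chosen only because its flow has the closed form $\phi_t(x_0) = e^{-t} x_0$. The names `velocity` and `flow` are illustrative and do not come from this page. The check differentiates the trajectory numerically and compares it with the field evaluated along the trajectory.

```python
import numpy as np

def velocity(x, t):
    """Hypothetical field v(x, t) = -x: every point is pulled toward the origin."""
    return -x

def flow(x0, t):
    """Closed-form flow phi_t(x0) = exp(-t) * x0 for the field above."""
    return np.exp(-t) * x0

rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 2))   # five draws from p_0 = N(0, I) in d = 2

t, h = 0.4, 1e-5
# Central finite difference of the trajectory with respect to time.
dxdt = (flow(x0, t + h) - flow(x0, t - h)) / (2 * h)

# The ODE says dx_t/dt must equal the field evaluated at (x_t, t).
residual = np.max(np.abs(dxdt - velocity(flow(x0, t), t)))
print("max ODE residual:", residual)
```

The residual should sit at the level of finite-difference error, which is what it means for `flow` to be the flow induced by `velocity`.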

What the field does to a distribution

The flow does more than move individual particles; it transports the entire source density $p_0$ forward in time. At each $t$, the pushforward
$$p_t \;=\; (\phi_t)_\sharp \, p_0$$
is a probability density on $\mathbb{R}^d$ that describes where the swarm of particles is at time $t$. If the velocity field is chosen well, then at $t = 1$ the pushforward matches the data distribution:
$$p_1 \;\approx\; p_{\text{data}}.$$
The pair $(v, p_t)$ is linked by the continuity equation:
$$\frac{\partial p_t}{\partial t} + \nabla \cdot \bigl( p_t \, v \bigr) = 0.$$
This is the same equation that describes mass conservation in a fluid. Read geometrically: the local rate of change of density equals the negative divergence of the mass flux $p_t v$, so density drops where mass flows out and rises where it flows in. Designing a velocity field is, in effect, designing a fluid flow that morphs noise into data.
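The continuity equation can be checked numerically on a case where everything is known in closed form. The sketch below assumes a hypothetical 1D field $v(x, t) = A x$ with a made-up rate `A`; its flow is $\phi_t(x_0) = e^{A t} x_0$, so the pushforward of $\mathcal{N}(0, 1)$ is $\mathcal{N}(0, e^{2 A t})$. Both terms of the equation are approximated with finite differences on a grid.

```python
import numpy as np

A = 0.5  # hypothetical expansion rate, chosen only for illustration

def velocity(x, t):
    """Hypothetical 1D field v(x, t) = A * x."""
    return A * x

def density(x, t):
    """p_t = N(0, exp(2 A t)): pushforward of N(0, 1) under x0 -> exp(A t) * x0."""
    s = np.exp(A * t)
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

x = np.linspace(-4.0, 4.0, 801)
t, h = 0.3, 1e-4

# Time derivative of the density by central differences.
dp_dt = (density(x, t + h) - density(x, t - h)) / (2 * h)

# Spatial derivative (1D divergence) of the mass flux p_t * v.
flux = density(x, t) * velocity(x, t)
div_flux = np.gradient(flux, x)

# Skip the two boundary points, where np.gradient uses one-sided differences.
residual = np.max(np.abs(dp_dt + div_flux)[1:-1])
print("max continuity residual:", residual)
```

A residual near the discretization error indicates that this density and this field satisfy the continuity equation together.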

Generation as trajectory integration

Once you have a learned velocity field $v_\theta$ that approximates a true transport field, sampling from the model amounts to numerical ODE integration. The simplest scheme is Euler integration with step size $\Delta t$:
$$x_{t + \Delta t} \;=\; x_t + \Delta t \cdot v_\theta(x_t, t).$$
This is the same loop you ran on the storm: at each step, evaluate the field at the current position and take a small step in that direction. Starting from $x_0 \sim \mathcal{N}(0, I)$ and stepping until $t = 1$, you obtain a sample $x_1$ that is approximately distributed according to $p_{\text{data}}$.
Higher-order solvers (Heun, RK4, adaptive Dormand-Prince) integrate the same field and typically reach a given accuracy in fewer steps, because their truncation error shrinks faster as $\Delta t$ decreases. The choice of solver is decoupled from the choice of velocity field, which is a practical advantage of the flow-matching formulation.
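The Euler loop can be sketched in a few lines. Since no trained network is available here, the sketch substitutes a hypothetical closed-form field, `toy_field`, that moves each particle in a straight line from $x_0$ to `MU + SIGMA * x0`; the names `euler_sample`, `toy_field`, `MU`, and `SIGMA` are assumptions made for illustration, not part of any library.

```python
import numpy as np

MU = np.array([2.0, -1.0])   # hypothetical target mean
SIGMA = 0.5                  # hypothetical target standard deviation

def toy_field(x, t):
    """Stand-in for a learned v_theta.

    Each particle follows x_t = (1 - t) * x0 + t * (MU + SIGMA * x0),
    so its velocity is MU + (SIGMA - 1) * x0, rewritten here in terms of x_t.
    """
    return MU + (SIGMA - 1.0) * (x - t * MU) / (1.0 - t + t * SIGMA)

def euler_sample(v, x0, n_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * v(x, t)   # x_{t + dt} = x_t + dt * v(x_t, t)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((1000, 2))        # x_0 ~ N(0, I)
x1 = euler_sample(toy_field, x0, n_steps=50)

print("sample mean:", x1.mean(axis=0))
print("sample std: ", x1.std(axis=0))
```

Because every trajectory of this toy field is a straight line traversed at constant speed, Euler steps land on the exact endpoint `MU + SIGMA * x0` up to rounding. A learned field generally has curved trajectories, and the number of steps then trades accuracy against cost.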

Three views of the same object

A velocity field shows three complementary faces:
  1. Vector field view. At each $(x, t)$, draw the arrow $v(x, t)$. Quiver plots are this view.
  2. Streamline view. Fix $t$ and trace integral curves of $v(\cdot, t)$.
  3. Particle view. Release a cloud of particles at $t = 0$ and follow them as they advect under $v$. Trajectory plots are this view.
The same machinery underlies fluid mechanics, dynamical systems, and the optical-flow problem in computer vision. Flow matching borrows the geometry and applies it to generative modeling: the model learns how probability mass flows through space and time.

What gets learned

The network is not asked to denoise or to predict a discrete sequence of tokens. It is asked to regress a vector-valued function, with one output component per data dimension:
$$v_\theta(x, t) \;\approx\; v^\star(x, t)$$
where $v^\star$ is a target velocity field induced by a chosen probability path between $p_0$ and $p_{\text{data}}$. Constructing that target (picking the path and writing down the corresponding velocity) is the heart of the flow-matching training objective and is covered in the Lipman et al. references below.
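As a rough illustration of what such a regression looks like, the sketch below assumes one common choice of path, the straight-line interpolation $x_t = (1 - t)\, x_0 + t\, x_1$ between a noise draw $x_0$ and a data point $x_1$, whose conditional velocity is $x_1 - x_0$. This page does not specify a path, so treat the path, the helper names `make_batch` and `regression_loss`, and the stand-in "data" as assumptions.

```python
import numpy as np

def make_batch(x0, x1, t):
    """Build regression inputs and targets for the assumed straight-line path."""
    tt = t[:, None]                    # broadcast time over data dimensions
    xt = (1.0 - tt) * x0 + tt * x1     # point on the path at time t
    target = x1 - x0                   # conditional velocity along that path
    return xt, target

def regression_loss(v_theta, xt, t, target):
    """Mean squared error between predicted and target velocities."""
    pred = v_theta(xt, t)
    return np.mean(np.sum((pred - target) ** 2, axis=1))

rng = np.random.default_rng(0)
n, d = 256, 2
x0 = rng.standard_normal((n, d))                               # noise draws
x1 = 0.3 * rng.standard_normal((n, d)) + np.array([2.0, 0.0])  # stand-in "data"
t = rng.uniform(0.0, 1.0, size=n)                              # one time per pair

xt, target = make_batch(x0, x1, t)

# An untrained placeholder model that predicts zero velocity everywhere.
zero_model = lambda x, t: np.zeros_like(x)
loss_zero = regression_loss(zero_model, xt, t, target)
print("loss of the zero field:", loss_zero)
```

In practice `v_theta` would be a neural network and this loss would be minimized with a gradient-based optimizer over many such batches.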

Pointers