Flow matching

From a storm to generative AI

The velocity-fields page showed how a 2D wind pattern transports air through space. The same mathematical object appears in generative modeling:

Weather. $\mathbf{v}(x, y)$ tells air where to move.
Diffusion and flow matching. $\mathbf{v}(x, t)$ tells probability mass, or a generated sample, where to move.

In weather you do not get to choose the wind: the atmosphere imposes it. The leaves, raindrops, and dust that the storm carries are passive passengers, each one sampling the wind field at its current position and drifting accordingly. In generative modeling the roles flip: you design the wind. You choose a velocity field whose particles, released from Gaussian noise, end up distributed like real data. The particles in flow matching play the role of the leaves: each particle is a sample carried by the learned field $v_\theta$, starting from a Gaussian draw and ending at a generated data point.

One subtlety worth pinning down up front: only the starting positions $x_0$ are Gaussian. Once the field begins to advect them, the cloud of particles is no longer Gaussian. At intermediate time $t$ the particles are samples from the deformed distribution $p_t$, and by $t = 1$ they are samples from a distribution that approximates the data. The randomness lives entirely in the initial draw; after that, the trajectory of each particle is determined by the ODE.

That is the setup of flow matching: a generative modeling framework that learns a continuous-time vector field transporting samples from a simple source distribution to a target data distribution. It generalizes diffusion models: where diffusion fixes a stochastic forward process and learns to reverse it, flow matching directly parameterizes the deterministic ODE that connects noise to data, and is trained with a simple regression objective on conditional vector fields. Flow matching has emerged as the production-grade choice in several recent generative systems across images, video, audio, speech, and molecular structures, including Meta’s Movie Gen, Stable Diffusion 3, and Flux.

Definition

A time-dependent velocity field is a function

$$v: \mathbb{R}^d \times [0, 1] \to \mathbb{R}^d, \qquad (x, t) \mapsto v(x, t)$$

that assigns a velocity vector $v(x, t) \in \mathbb{R}^d$ to every point $x$ at every time $t$. A particle whose trajectory $x_t \in \mathbb{R}^d$ is driven by this field obeys the ordinary differential equation

$$\frac{d x_t}{d t} \;=\; v(x_t, t), \qquad x_0 \sim p_0(x).$$

Given an initial sample $x_0$ from a source distribution $p_0$ (typically a standard Gaussian), the ODE produces a unique trajectory $\{x_t\}_{t \in [0, 1]}$, provided $v$ is sufficiently regular (for example, Lipschitz in $x$). The map $\phi_t(x_0) = x_t$ is called the flow induced by $v$.
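As a concrete instance of the definition, consider a hypothetical one-dimensional linear field $v(x, t) = a x + c$. The constants are illustrative choices, not taken from any model. Its flow has the closed form $\phi_t(x_0) = e^{a t} x_0 + \frac{c}{a}\,(e^{a t} - 1)$, and the sketch below checks numerically that this map starts at the identity and satisfies the ODE.

```python
import numpy as np

# Hypothetical 1D linear field v(x, t) = A * x + C; the constants are illustrative.
A, C = -0.7, 2.0

def v(x, t):
    """Velocity at position x and time t (this toy field happens to ignore t)."""
    return A * x + C

def phi(t, x0):
    """Closed-form flow map: position at time t of a particle started at x0."""
    return np.exp(A * t) * x0 + (C / A) * (np.exp(A * t) - 1.0)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(5)          # a few draws from p_0 = N(0, 1)

# phi_0 is the identity: particles start where they were drawn.
start_gap = np.max(np.abs(phi(0.0, x0) - x0))

# A central finite difference of phi in t should match v along the trajectory.
t, h = 0.4, 1e-5
dphi_dt = (phi(t + h, x0) - phi(t - h, x0)) / (2 * h)
ode_residual = np.max(np.abs(dphi_dt - v(phi(t, x0), t)))

print(start_gap, ode_residual)
```

Both printed numbers should be close to zero. For a learned field no closed form is available, so the flow has to be approximated by numerical integration.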

What the field does to a distribution

The flow does more than move individual particles; it transports the entire source density $p_0$ forward in time. At each $t$, the pushforward

$$p_t \;=\; (\phi_t)_\sharp \, p_0$$

is a probability density on $\mathbb{R}^d$ that describes where the swarm of particles is at time $t$. If the velocity field is chosen well, then at $t = 1$ the pushforward matches the data distribution:

$$p_1 \;\approx\; p_{\text{data}}.$$

The pair $(v, p_t)$ is linked by the continuity equation:

$$\frac{\partial p_t}{\partial t} + \nabla \cdot \bigl( p_t \, v \bigr) = 0.$$

This is the same equation that describes mass conservation in a fluid. Read geometrically: the local rate of change of density equals minus the divergence of the mass flux $p_t v$, so density falls where mass flows out and rises where it flows in. Designing a velocity field is, in effect, designing a fluid flow that morphs noise into data.
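The continuity equation can be checked numerically on a case where everything is known in closed form. The sketch below assumes a hypothetical 1D linear field $v(x, t) = a x + c$ and $p_0 = \mathcal{N}(0, 1)$. The flow is then affine, so $p_t$ stays Gaussian with standard deviation $e^{a t}$ and mean $\frac{c}{a}(e^{a t} - 1)$. The constants and grid are illustrative choices.

```python
import numpy as np

A, C = -0.7, 2.0   # hypothetical constants of the toy field v(x, t) = A * x + C

def v(x, t):
    return A * x + C

def p(x, t):
    """Density p_t: pushing N(0, 1) through the affine flow keeps it Gaussian."""
    std = np.exp(A * t)
    mean = (C / A) * (np.exp(A * t) - 1.0)
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def flux(x, t):
    """Mass flux p_t * v."""
    return p(x, t) * v(x, t)

x = np.linspace(-4.0, 4.0, 401)
t, h = 0.5, 1e-4

# Central finite differences for the two terms of the continuity equation.
dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)
dflux_dx = (flux(x + h, t) - flux(x - h, t)) / (2 * h)

residual = np.max(np.abs(dp_dt + dflux_dx))
print(residual, np.max(np.abs(dp_dt)))
```

The residual should be several orders of magnitude smaller than the individual terms, which is what the equation predicts: the two terms cancel pointwise.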

Generation as trajectory integration

Once you have a learned velocity field $v_\theta$ that approximates a true transport field, sampling from the model amounts to numerical ODE integration. The simplest scheme is Euler integration with step size $\Delta t$:

$$x_{t + \Delta t} \;=\; x_t + \Delta t \cdot v_\theta(x_t, t).$$

This is the same loop you ran on the storm: at each step, evaluate the field at the current position and take a small step in that direction. Starting from $x_0 \sim \mathcal{N}(0, I)$ and stepping until $t = 1$, you obtain a sample $x_1$ that is approximately distributed according to $p_{\text{data}}$. Higher-order solvers (Heun, RK4, adaptive Dormand-Prince) integrate the same field with fewer steps and lower truncation error. The choice of solver is decoupled from the choice of velocity field, which is a practical advantage of the flow-matching formulation.
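The sampling loop and the solver decoupling can be sketched as follows. No trained network is available here, so a hand-written linear field stands in for $v_\theta$ (an assumption for illustration): it carries $\mathcal{N}(0, 1)$ at $t = 0$ to $\mathcal{N}(\mu, \sigma^2)$ at $t = 1$, which gives an exact answer to compare against. The function names and constants are hypothetical.

```python
import numpy as np

# Hand-written stand-in for a learned field v_theta (assumption for this sketch):
# this linear field carries N(0, 1) at t = 0 to N(MU, SIGMA^2) at t = 1.
MU, SIGMA = 3.0, 0.5
A = np.log(SIGMA)
C = MU * A / (SIGMA - 1.0)

def v_theta(x, t):
    return A * x + C

def euler_step(field, x, t, dt):
    """First-order update: x_{t+dt} = x_t + dt * v(x_t, t)."""
    return x + dt * field(x, t)

def heun_step(field, x, t, dt):
    """Second-order update: average the slope at the start and at the Euler prediction."""
    k1 = field(x, t)
    k2 = field(x + dt * k1, t + dt)
    return x + 0.5 * dt * (k1 + k2)

def sample(field, x0, n_steps, step=euler_step):
    """Integrate from t = 0 to t = 1; the solver is a plug-in argument."""
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        x = step(field, x, k * dt, dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(10_000)      # x_0 ~ N(0, 1)
exact = SIGMA * x0 + MU               # closed-form flow of this toy field at t = 1

x1_euler = sample(v_theta, x0, n_steps=10)
x1_heun = sample(v_theta, x0, n_steps=10, step=heun_step)

err_euler = np.max(np.abs(x1_euler - exact))
err_heun = np.max(np.abs(x1_heun - exact))
print(x1_heun.mean(), x1_heun.std(), err_euler, err_heun)
```

With the same ten steps, the Heun error comes out well below the Euler error, at the cost of two field evaluations per step instead of one. The field function is untouched when the solver changes, which is the decoupling described above.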

Three views of the same object

A velocity field shows three complementary faces:

Vector field view. At each $(x, t)$, draw the arrow $v(x, t)$. Quiver plots are this view.
Streamline view. Fix $t$ and trace integral curves of $v(\cdot, t)$.
Particle view. Release a cloud of particles at $t = 0$ and follow them as they advect under $v$. Trajectory plots are this view.
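When the field depends on time, the streamline and particle views give different curves, because a streamline freezes $t$ while a particle experiences the field as it changes. The sketch below computes the data behind all three views for a hypothetical 2D field $v((x, y), t) = (1, t)$, chosen only because its paths are easy to verify by hand. No plotting library is assumed.

```python
import numpy as np

def v(x, t):
    """Hypothetical 2D field: constant drift to the right, upward speed equal to t."""
    ones = np.ones_like(x[..., 0])
    return np.stack([ones, t * ones], axis=-1)

# Vector field view: one arrow per grid point at a fixed time (quiver-plot data).
gx, gy = np.meshgrid(np.linspace(-1.0, 1.0, 5), np.linspace(-1.0, 1.0, 5))
grid = np.stack([gx, gy], axis=-1)        # shape (5, 5, 2)
arrows = v(grid, 0.5)                     # shape (5, 5, 2)

def trace(x0, n_steps, frozen_t=None):
    """Euler-trace a path over unit time.

    frozen_t set: the field is held at that time (streamline view).
    frozen_t None: time advances with the particle (particle view).
    """
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    path = [x]
    for k in range(n_steps):
        t = k * dt if frozen_t is None else frozen_t
        x = x + dt * v(x, t)
        path.append(x)
    return np.stack(path)

streamline = trace([0.0, 0.0], 100, frozen_t=1.0)   # straight line ending near (1, 1)
particle = trace([0.0, 0.0], 100)                   # curved path ending near (1, 0.5)
print(streamline[-1], particle[-1])
```

For a field that does not depend on $t$, the two traces would coincide. `trace` also accepts an array of shape `(N, 2)` as `x0`, which advects a whole cloud at once.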

The same machinery underlies fluid mechanics, dynamical systems, and the optical-flow problem in computer vision. Flow matching borrows the geometry and applies it to generative modeling: the model learns how probability mass flows through space and time.

What gets learned

The network is not asked to denoise or to predict a discrete sequence of tokens. It is asked to regress a vector-valued function, with one scalar component per output dimension:

$$v_\theta(x, t) \;\approx\; v^\star(x, t)$$

where $v^\star$ is a target velocity field induced by a chosen probability path between $p_0$ and $p_{\text{data}}$. Constructing that target, that is, picking the path and writing down the corresponding velocity, is the heart of the flow-matching training objective and is covered in the Lipman et al. references below.
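To make the regression target concrete, here is a minimal sketch of one widely used construction in the flow-matching literature, assumed here for illustration since this page does not fix a path. Each noise draw $x_0$ is paired with a data point $x_1$, the pair is joined by the straight line $x_t = (1 - t)\,x_0 + t\,x_1$, and the velocity along that line, $x_1 - x_0$, is the per-sample target. The function name `linear_path_targets` and the placeholder data are hypothetical.

```python
import numpy as np

def linear_path_targets(x0, x1, t):
    """Straight-line conditional path between paired samples.

    x_t = (1 - t) * x0 + t * x1, whose time derivative is the constant x1 - x0.
    """
    t = t[:, None]                     # broadcast one time per sample over feature dims
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0
    return x_t, target

rng = np.random.default_rng(0)
n, d = 4, 2
x0 = rng.standard_normal((n, d))                  # source draws, p_0 = N(0, I)
x1 = 3.0 + 0.5 * rng.standard_normal((n, d))      # placeholder "data" (assumed, not real data)
t = rng.uniform(0.0, 1.0, size=n)                 # one random time per pair

x_t, target = linear_path_targets(x0, x1, t)
# A network v_theta(x_t, t) would then be fit by minimizing mean((v_theta - target) ** 2).
print(x_t.shape, target.shape)
```

Each individual target depends on the sampled pair, but the regression averages over all pairs that pass through the same $(x_t, t)$, so the fitted network approximates a single field of $x$ and $t$.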

Pointers

Paper: Lipman et al. (2024). Flow Matching Guide and Code. Comprehensive, self-contained review covering mathematical foundations, design choices, and extensions, with a PyTorch reference implementation; Sections 2-3 cover velocity fields and the continuity equation in detail.
Foundational paper: Lipman et al. (2023). Flow Matching for Generative Modeling.
Code: facebookresearch/flow_matching, the companion library with examples for image and text generation.

PyTorch reference

| PyTorch class | Description |
| --- | --- |
| `nn.Linear` | Applies an affine linear transformation to the incoming data: $y = xA^T + b$. |
| `nn.SiLU` | Applies the Sigmoid Linear Unit (SiLU) function, element-wise. |
| `nn.Sequential` | A sequential container. |
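The three classes in the table are enough to assemble a small velocity network. The sketch below is one hypothetical arrangement, not this page's reference implementation: the class name `VelocityNet`, the layer sizes, the straight-line path, and the toy Gaussian "data" are all assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

class VelocityNet(nn.Module):
    """Hypothetical small MLP for v_theta(x, t); sizes are illustrative."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden),   # input is x concatenated with t
            nn.SiLU(),
            nn.Linear(hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim),       # one velocity component per dimension
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

dim = 2
model = VelocityNet(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for _ in range(200):
    x0 = torch.randn(256, dim)                    # noise draws
    x1 = 3.0 + 0.5 * torch.randn(256, dim)        # placeholder "data" (assumed)
    t = torch.rand(256, 1)                        # one time per sample
    x_t = (1 - t) * x0 + t * x1                   # straight-line path (assumed choice)
    loss = ((model(x_t, t) - (x1 - x0)) ** 2).mean()   # plain regression loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The time $t$ is fed to the network as one extra input feature, which is the simplest option; larger models often use a richer time embedding.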
