Flow Matching Theorem¶
[toc]
Note
In this article, we establish the theroy foundation of the flow matching method
Please read the section introduction for common knowledges of the flow matching and standard notations
1. Formula of flow matching¶
Suppose \(v(x,t)\) is the velocity field from push the probability \(p\) to \(q\). Our aim is to estimate the delocity field
Flow Matching Loss
The flow matching loss is defined as
$$ \begin{equation} L_{F M}=E_{t,X_t\sim p_t} ||v(x,t) - v_\theta(x,t)||^2 \end{equation} $$
Here \(||\) could be any distance metric besides the normal mean square error.
For simplicity, we denote the \(\pi_{0,1}(X_0,X_1)\) be the joint distribution of the data coupling
Although, the common used distirbution of \(X_0\) is the Gaussian noise, we just consider the general distribution in this section.
1.1 Conditional probability path¶
Let \(p_{t|1}(x|x_1)\) be the consitional probability path.
Then the marginal probability path (responde to the joint distribution \(\pi_{0,1}\)) \(p_t\)
and \(p_t\) satisfied the boundary condition
which required the conditional probability path satisfy
where \(\delta_{x_1}\) is the delta measure centered at \(x_1\).
If we consider the independent data coupling, then
The constrains becomes \(p_{0|1}(x|x_1) = p(x)\).
The second condition could also be written as
as \(t\rightarrow 1\) for any continuous function \(f\) since \(\delta\) has no density function.
1.2 Conditional Velocity Field¶
Let \(v_t(\cdot|x_1)\) generates \(p_{t|1}(\cdot | x_1)\)
Thus we have
This can be viewed as a weighted average of the conditional velocities or it can be regared as the conditional expectation
Expectation of conditional velocity
Given any Random Variable \(Z\), \(p_Z\) has bounded support,
\(\(p_{t|Z}(x|z)\in C^1([0,1]\times R^d)\)\)
\(\(v_t(x|z) \in C^1([0,1]\times R^d,R^d), p_t(x)>0 \; \forall x \in R^d, t\in[0,1)\)\)
If \(v_t(x|z)\) is conditional integrable and generates the conditional probability path \(p_t(\cdot|z)\), then the marginal velocity field \(v_t\) generates the marginal probability path \(p_t\) for all \(t\in [0,1)\)
Expectation of conditional velocity
Step 1: Differentiating $ p_t(x) $ By the law of total probability:
$$ p_t(x) = \int p_{t|Z}(x|z) p_Z(z) dz. $$
Differentiating both sides:
$$ \frac{d}{dt} p_t(x) = \int \frac{d}{dt} p_{t|Z}(x|z) p_Z(z) dz. $$
Since $ v_t(x|z) $ generates $ p_{t|Z}(x|z) $, we have:
$$ \frac{d}{dt} p_{t|Z}(x|z) = -\nabla_x \cdot [ v_t(x|z) p_{t|Z}(x|z) ]. $$
Thus,
$$ \frac{d}{dt} p_t(x) = -\int \nabla_x \cdot\left[ v_t(x|z) p_{t|Z}(x|z) \right] p_Z(z) dz. $$
Using the Leibniz rule to move differentiation outside the integral:
$$ \frac{d}{dt} p_t(x) = -\nabla_x \cdot \int v_t(x|z) p_{t|Z}(x|z) p_Z(z) dz. $$
Step 2: Expressing $ v_t(x) $ Using Bayes' Formula From Bayes' rule:
$$ p_{t|Z}(x|z) p_Z(z) = p_{Z|t}(z|x) p_t(x), $$
we substitute:
$$ \frac{d}{dt} p_t(x) = -\nabla_x \cdot \int v_t(x|z) p_{Z|t}(z|x) p_t(x) dz. $$
Since the definition of $ v_t(x) $ is:
$$ v_t(x) = \int v_t(x|z) p_{Z|t}(z|x) dz, $$
Contitional Flow Matching
Define the conditional flow matching loss as $$ L_{CF M}(\theta) = E_{t,z,x_t\sim p_{t|Z}(\cdot | Z)} || v_t(x_t|Z)-v_\theta(x_t,t)||^2 $$
Equivalent of gradient of FM and CFM loss
The gradients of the Flow Matching loss and the Conditional Flow Matching loss coincide. In particular, the minimizer of the Conditional Flow Matching loss is the marginal velocity.
$$ \begin{equation} \nabla_\theta L_{F M}(\theta) = \nabla_\theta L_{CF M}(\theta) \end{equation} $$
Proof
To prove that the gradients of the Flow Matching (FM) loss and the Conditional Flow Matching (CFM) loss coincide, i.e.,
Take the gradient
$$ \begin{equation} \nabla L_{F M} = \nabla E_{t, X_t}[||v(x,t) - v_\theta(x,t)||^2] = E_{t, X_t} (v_\theta - v)\nabla v_\theta \end{equation} $$
Use the expectation formula of conditional velocity field
$$ \begin{equation} \nabla L_{F M} =E_{t,X_t} (v_\theta - E_{Z\sim p_{Z|t}(\cdot|X_t)}[v_t(X_t|Z)])\nabla v_\theta = E_{t,X_t,Z\sim p_{Z|t}(\cdot|X_t) } (v_\theta - v_t(x_t|Z))\nabla v_\theta \end{equation} $$
By chain rule,
$$ \nabla L_{F M} = = E_{t,X_t,Z\sim p_{Z|t}(\cdot|X_t) }[ \nabla (||v_t(x_t|Z) - v_\theta||^2)] $$
Change the order of expectation and gradient we hvae
$$ \nabla L_{F M} = \nabla E_{t,X_t,Z\sim p_{Z|t}(\cdot|X_t) }[||v_t(x_t|Z) - v_\theta||^2] $$
By the Bayes's rule,
$$p_{Z|t}(Z|X_t) p(X_t) = p_{t|Z}(X_t|z) $$
we have
$$ E_{X_t,Z\sim p_{Z|t}(\cdot|X_t) } \cdots = \int p(X_t)p_{Z|t}(\cdot|X_t) \cdots dx_t dz $$
which equals
$$ E_{X,x_t\sim p_{t|Z}(X_t|z) } \cdots = \int p(z)p_{t|Z}(x_t|z) \cdots dx_t dz $$
Hence
$$ \begin{equation} \nabla L_{F M} = \nabla E_{t,z,x_t\sim p_{t|Z}(x_t|z)} [ ||v_t(x_t|Z) - v_\theta||^2] \end{equation} $$
Different conditioning choices Z exist but are essentially all equivalent.
Main options include fixing target samples \(Z = X_1\), source samples \(Z = X_0\), or two-sided \(Z = (X_0, X_1)\).
2. Summary¶
Given the source distributin \(p\) and target distribution \(q\), we can build a probability path from \(p\) to \(q\) with the probability path and velocity field path
probability path
$$ p_t(x) = \begin{cases} p(x), & when\; t=0\ q(x), & when\; t=1 \end{cases} $$
\(\(X_t\sim p_t\)\)
\(\(X_t = \psi_t(x_0), x_0\sim X_0\)\)
\(\(\frac{\mathrm{d} \psi_t(x_0)}{\mathrm{d} t} = \dot\psi_t(x_0) = v(\psi(x_0),t),\quad x_0\sim p, \quad \psi_0(x_0) = x_0\)\)
$$v(x,t) = \dot\psi_t(x_0,t) = \dot\psi_t(\psi^{-1}_t(x)) $$
\(\(p_t(x) = \bigl[{\psi_t}_{\#}p \bigr](x)\)\)
\(v(x,t)\) is the velocity field generates the probablity path. Note that this velocity field is not unique by the solution of the continuity equation. Under certain condition, the velocity field could be unique like the probability flow ODE or the optimal transit.
But in practice, we dont know the exact formula of the probability path and the velocity field. So need to find method to calculate.
We can can consider the condition flow. Take the notation by under the condition \(X_1\) = \(x_1\).
Conditional Probability Path
$$\psi_t(x|x_1) = \begin{cases} x, & t = 0,\ x_1, & t=1. \end{cases} $$
\(\(v(x_t|x_1,t) = \dot\psi_t (x_0|x_1) = \dot\psi_t ( \psi_t^{-1} (x|x_1)|x_1)\)\)
\(\(\mathcal{L} _{C F M} = E_{t, x_0,x_1}||\dot\psi_t(x_0|x_1) - v_\theta(x,t)||^2)\)\)
💬 Comments Share your thoughts!