All you need to know about Gaussian distribution
DDPMs - Part 2
Show the code
import os
import scipy
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from scipy.stats import norm, multivariate_normal
import plotly
import plotly.graph_objects as go
from plotly.subplots import make_subplots
seed = 1234
np.random.seed(seed)
plt.style.use("seaborn")

SAVE_PLOT_DIR = "../plots"
if not os.path.exists(SAVE_PLOT_DIR):
    os.mkdir(SAVE_PLOT_DIR)
Today we will talk about the Gaussian Distribution and its importance in understanding Denoising Diffusion Models, or DDPMs for short.
We will cover the maths behind these concepts and will code them along the way. IMO the maths looks much scarier when presented in equations, but it becomes simple once you look at the code implementation. So, don’t be scared by the Greek symbols used in the equations. Try to understand the code; it will help you understand the concepts better. Also, most of the visualizations presented in this notebook are interactive. Without further ado, let’s start!
1. Normal Distribution
Also known as the Gaussian Distribution, the normal distribution is one of the most widely used continuous probability distributions. The notation \(\mathcal{N}(\mu, \sigma^{2})\) is commonly used to refer to a normal distribution. The parameter \(\mu\) represents the mean or expected value of the distribution, while the parameter \(\sigma\) is its standard deviation. The variance of the distribution is \(\sigma^{2}\).
For a real-valued random variable x, the probability density function is defined as: \[
p(x; \mu, \sigma^{2}) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left({\frac{-1}{2}\left(\frac {x - \mu}{\sigma}\right)^{2}}\right) \tag {1}
\]
2. Univariate Normal Distribution
When we deal with only one random variable, e.g. x in the above equation, we call the distribution a univariate normal distribution. Let’s code up the above equation and take an example of a univariate normal distribution.
Show the code
def get_univariate_normal(x, mu, sigma):
"""Generates a PDF for a Univariate Normal Distribution.
Args:
x: Vector of values sorted in ascending order
mu: Mean of the Gaussian
sigma: Standard Deviation of the Gaussian
Returns:
A PDF
"""
return 1 / (sigma * np.sqrt(2 * np.pi)) * (np.exp(-(x - mu)**2 / (2 * sigma**2)))
# Mean of the distribution
mu = 0.0

# Standard deviation (SD for short)
sigma = 1.0

# Generate a linspace for a random variable x
num_samples = 100
x = np.linspace(-3*sigma + mu, 3*sigma + mu, num=num_samples)

# Plot the value against the PDF
fig = go.Figure(
    data=go.Scatter(
        x=x,
        y=get_univariate_normal(x, mu=mu, sigma=sigma),
        line=dict(width=3, color="black"),
        fill="tonexty",
        fillcolor="skyblue",
    )
)
fig.add_annotation(x=mu, y=-0.001, text="Mean", showarrow=True, arrowhead=2)
fig.add_vline(x=mu, line_width=3, line_dash="dash", line_color="green")
fig.update_layout(
    {
        "title": {
            'text': f"Univariate Gaussian Distribution <br> μ: {mu}, σ\u00b2: {sigma**2}",
            'y': 0.95,
            'x': 0.5,
            'xanchor': 'center',
            'yanchor': 'top',
            'font': dict(size=14)
        },
        "xaxis": {"title": "X"},
        "yaxis": {"title": "Probability Density"},
        "margin": dict(l=0, r=0, b=0, t=50)
    }
)
fig.write_html(os.path.join(SAVE_PLOT_DIR, "univariate_normal_example.html"))
fig.write_image(os.path.join(SAVE_PLOT_DIR, "univariate_normal_example.png"))
fig.show()
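As a quick sanity check (this snippet is my own addition, not part of the original plotting code), our manual implementation should agree with scipy.stats.norm, which parameterizes the distribution by loc (mean) and scale (standard deviation):

# Compare the hand-written PDF with scipy's implementation
print(np.allclose(
    get_univariate_normal(x, mu=mu, sigma=sigma),
    norm(loc=mu, scale=sigma).pdf(x)
))  # expected: True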
Moving the mean shifts the distribution along the x-axis, while changing sigma affects the shape of the distribution: as the value of sigma increases, the curve becomes flatter. Let’s see that in action.
Show the code
# Random variable x
x = np.linspace(-10.0, 10.0, 100)

# Combinations of mu and sigma. The first value
# of any tuple represents mu while the second value
# represents sigma here.
mu_sigma_combos = [
    [(0, 1), (-3, 1), (3, 1)],
    [(0, 1), (0, 2), (0, 4)],
]

# Line colors and widths to be used
# for different combinations
colors = ["red", "black", "blue"]
widths = [1, 2, 3]
subtitles = ["Varying μ", "Varying σ"]

# Plot
_, ax = plt.subplots(1, 2, sharex=False, sharey=False, figsize=(12, 5), tight_layout=True)
for i, elem in enumerate(mu_sigma_combos):
    legend = []
    mus = set()
    for j, comb in enumerate(elem):
        mu, sigma = comb
        mus.add(mu)
        legend.append(f"μ: {mu}, σ\u00b2: {sigma**2}")
        ax[i].plot(x, get_univariate_normal(x, mu, sigma), linewidth=widths[j], c=colors[j])
    ax[i].tick_params(labelbottom=True)
    ax[i].set_title(subtitles[i])
    ax[i].legend(legend, loc="upper right")
    for mu in mus:
        ax[i].axvline(x=mu, color="green", linestyle="--")

plt.suptitle("Univariate Normal Distribution")
plt.savefig(
    os.path.join(SAVE_PLOT_DIR, "univariate_with_dff_mu_sigma.png"),
    bbox_inches='tight'
)
plt.show()
3. Multivariate Normal Distribution
Multivariate normal distribution is the extension of univariate normal distribution to the case where we deal with a real-valued vector input instead of a single real-valued random variable. Like in the univariate case, the multivariate normal distribution has associated parameters \(\mu\) representing the mean vector and \(\Sigma\) representing the covariance matrix.
This is the probability density function for multivariate case: \[ p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} \ \vert{\Sigma}\vert^{1/2}} \ exp \ (\frac{-1}{2}(x - \mu)^T \Sigma^{-1}(x - \mu)) \tag{2} \]
- Random variable \(X = [x_{1}, \ x_{2}, \ x_{3}...]\) is a D-dimensional vector
- Mean \(\mu = [\mu_{1}, \ \mu_{2}, \ \mu_{3}...]\) is a D-dimensional vector
- Covariance matrix \(\Sigma\) is a D x D matrix
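As a quick sanity check (a snippet of my own with arbitrary example values), we can evaluate eq. (2) directly with NumPy and compare it against scipy.stats.multivariate_normal:

# Arbitrary mean vector, covariance matrix, and query point
mu_vec = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
x_vec = np.array([0.5, -0.5])
d = len(mu_vec)

# Normalizing constant and quadratic form from eq. (2)
diff = x_vec - mu_vec
norm_const = 1.0 / (np.power(2 * np.pi, d / 2) * np.sqrt(np.linalg.det(Sigma)))
pdf_manual = norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

pdf_scipy = multivariate_normal(mean=mu_vec, cov=Sigma).pdf(x_vec)
print(np.isclose(pdf_manual, pdf_scipy))  # expected: True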
If the vector X is 2-dimensional, then the distribution is also known as a Bivariate Normal Distribution.
Note: We need two dimensions to visualize a univariate Gaussian, and three dimensions to visualize a bivariate Gaussian. Hence, for visualization purposes, we will show everything with a bivariate Gaussian, which can then be extended to the multivariate case. Let’s visualize a bivariate Gaussian. We will use scipy.stats.multivariate_normal to generate the probability density.
Show the code
def get_multivariate_normal(
    mu,
    cov,
    sample=True,
    sample_size=None,
    seed=None,
    gen_pdf=False,
    pos=None
):
    """Builds a multivariate Gaussian Distribution.

    Given the mean vector and the covariance matrix,
    this function builds a multivariate Gaussian
    distribution. You can sample from this distribution,
    and generate probability density for given positions.

    Args:
        mu: Mean vector representing the mean values for
            the random variables
        cov: Covariance Matrix
        sample (bool): If sampling is required
        sample_size: Only applicable if sampling is required.
            Number of samples to extract from the distribution
        seed: Random seed to be passed for distribution, and sampling
        gen_pdf (bool): Whether to generate probability density
        pos: Applicable only if density is generated. Values for which
            density is generated
    Returns:
        1. A Multivariate distribution
        2. Sampled data if `sample` is set to True else `None`
        3. Probability Density if `gen_pdf` is set to True else `None`
    """
    # 1. Multivariate distribution from given mu and cov
    dist = multivariate_normal(mean=mu, cov=cov, seed=seed)

    # 2. If sampling is required
    if sample:
        samples = dist.rvs(size=sample_size, random_state=seed)
    else:
        samples = None

    # 3. If density is required
    if gen_pdf:
        if pos is None:
            raise ValueError("`pos` is required for generating density!")
        else:
            pdf = dist.pdf(pos)
    else:
        pdf = None

    return dist, samples, pdf
# Mean of the random variables X1 and X2
mu_x1, mu_x2 = 0, 0

# Standard deviation for the random variables X1 and X2
sigma_x1, sigma_x2 = 1, 1

# Positions for which probability density is
# to be generated
x1, x2 = np.mgrid[
    (-3.0 * sigma_x1 + mu_x1):(3.0 * sigma_x1 + mu_x1): 0.1,
    (-3.0 * sigma_x2 + mu_x2):(3.0 * sigma_x2 + mu_x2): 0.1
]
pos = np.dstack((x1, x2))

# Mean vector
mu = [mu_x1, mu_x2]

# Covariance between the two random variables
cov_x1x2 = 0

# Covariance Matrix for our bivariate Gaussian distribution
cov = [[sigma_x1**2, cov_x1x2], [cov_x1x2, sigma_x2**2]]

# Build distribution and generate density
sample = get_multivariate_normal(
    mu=mu,
    cov=cov,
    sample=False,
    seed=seed,
    gen_pdf=True,
    pos=pos
)

# Plot the bivariate normal density
fig = go.Figure(
    go.Surface(
        z=sample[2],
        x=x1,
        y=x2,
        colorscale='Viridis',
        showscale=False
    )
)
fig.update_layout(
    {
        "title": dict(
            text="Bivariate Distribution",
            y=0.95,
            x=0.5,
            xanchor="center",
            yanchor="top",
            font=dict(size=12)
        ),
        "scene": dict(
            xaxis=dict(title='x1'),
            yaxis=dict(title='x2'),
            zaxis=dict(title='Probability density')
        ),
        "xaxis": dict(title="Values"),
        "yaxis": dict(title="Probability Density"),
    }
)
fig.write_html(os.path.join(SAVE_PLOT_DIR, "bivariate_normal_example.html"))
fig.write_image(os.path.join(SAVE_PLOT_DIR, "bivariate_normal_example.png"))
That’s an interactive plot. You can zoom in/out and rotate the surface to see what the Gaussian looks like. When you hover over the density, you will see black lines tracing out two univariate normals, one for x1 and another for x2.
4. Covariance
In the last section, we introduced the term covariance matrix. Before we move on to the next section, it is important to understand the concept of covariance. Let’s say X and Y are two random variables. The covariance of X and Y is defined as:
\[ Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E[XY] - E[X]E[Y] \tag{3} \]
i.e. the covariance is the expected value of the product of their deviations from their individual expected values.
So, what does covariance tell us? A lot…
- Covariance gives a sense of how much two random variables are linearly related to each other, as well as the scale of these variables
- Covariance captures only linear dependence and gives no information about other kinds of relationships
- If the sign of the covariance is positive, then both variables tend to take on relatively high values simultaneously. If the sign of the covariance is negative, then one variable tends to take on a relatively high value when the other takes on a relatively low value, and vice versa.
- If two variables X and Y are independent, then \(Cov(X, Y)= 0\), but the reverse isn’t true. Why? Because covariance doesn’t account for non-linear relationships
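Here is a small sketch (synthetic data of my own choosing, not from the plots below) that estimates eq. (3) empirically and shows the sign behaviour described above:

# Two linearly related variables and one unrelated variable
rng = np.random.default_rng(seed)
x_s = rng.normal(size=10_000)
y_s = 2.0 * x_s + rng.normal(size=10_000)    # positively related to x
z_s = rng.normal(size=10_000)                # unrelated to x

# Cov(X, Y) = E[XY] - E[X]E[Y]
cov_xy = np.mean(x_s * y_s) - np.mean(x_s) * np.mean(y_s)
cov_xz = np.mean(x_s * z_s) - np.mean(x_s) * np.mean(z_s)
print(f"Cov(X, Y) ≈ {cov_xy:.3f} (positive), Cov(X, Z) ≈ {cov_xz:.3f} (near zero)")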
5. Covariance Matrix
If X is an N-dimensional vector, i.e. \(X = [x_{1}, \ x_{2}, \ x_{3}...x_{n}]\), the covariance matrix is an N x N matrix defined as:
\[ \begin{bmatrix} Cov(x_{1}x_{1}) & \cdots & Cov(x_{1}x_{n}) \\ \vdots & \ddots & \vdots \\ Cov(x_{n}x_{1}) & \cdots & Cov(x_{n}x_{n}) \end{bmatrix} \]
Each entry \((i, j)\) in this matrix defines the covariance of two random variables of the vector. Also:
\[ Cov(x_{i}x_{j}) = Var(x_{i}) \ \ \ \{ i=j\} \]
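A minimal check of the relation above (again my own snippet, using random synthetic data): the diagonal of the matrix returned by np.cov holds the variances of the individual variables.

# Two variables, 5000 samples each; np.cov expects variables along rows
data = np.random.default_rng(seed).normal(size=(2, 5_000))
cov_matrix = np.cov(data)                                   # shape: (2, 2)
print(np.allclose(np.diag(cov_matrix), np.var(data, axis=1, ddof=1)))  # expected: True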
Let’s take an example to showcase zero covariance, positive covariance, and negative covariance between two random variables x1 and x2.
Show the code
# Mean of the random variables X1 and X2
mu_x1, mu_x2 = 0, 0

# Standard deviation of the random variables X1 and X2
sigma_x1, sigma_x2 = 1, 1

# Number of samples to extract
sample_size = 2000

# Positions for which probability density is
# to be generated
x1, x2 = np.mgrid[
    (-3.0 * sigma_x1 + mu_x1):(3.0 * sigma_x1 + mu_x1): 0.1,
    (-3.0 * sigma_x2 + mu_x2):(3.0 * sigma_x2 + mu_x2): 0.1
]
pos = np.dstack((x1, x2))

# Mean vector
mu = [mu_x1, mu_x2]

# Case 1: Zero Covariance
cov_x1x2 = 0
zero_cov = [[sigma_x1**2, cov_x1x2], [cov_x1x2, sigma_x2**2]]
# Build distribution, sample and generate density
zero_cov_res = get_multivariate_normal(
    mu=mu,
    cov=zero_cov,
    sample=True,
    sample_size=sample_size,
    seed=seed,
    gen_pdf=True,
    pos=pos
)

# Case 2: Positive Covariance
cov_x1x2 = 0.9
pos_cov = [[sigma_x1**2, cov_x1x2], [cov_x1x2, sigma_x2**2]]
# Build distribution, sample and generate density
pos_cov_res = get_multivariate_normal(
    mu=mu,
    cov=pos_cov,
    sample=True,
    sample_size=sample_size,
    seed=seed,
    gen_pdf=True,
    pos=pos
)

# Case 3: Negative Covariance
cov_x1x2 = -0.9
neg_cov = [[sigma_x1**2, cov_x1x2], [cov_x1x2, sigma_x2**2]]
# Build distribution, sample and generate density
neg_cov_res = get_multivariate_normal(
    mu=mu,
    cov=neg_cov,
    sample=True,
    sample_size=sample_size,
    seed=seed,
    gen_pdf=True,
    pos=pos
)

# Plot the covariances
_, ax = plt.subplots(1, 3, figsize=(10, 4), sharex=True, sharey=True)
samples = [neg_cov_res[1], zero_cov_res[1], pos_cov_res[1]]
titles = ["Negative Covariance", "Zero Covariance", "Positive Covariance"]

for i in range(3):
    ax[i].scatter(samples[i][:, 0], samples[i][:, 1], c="green")
    ax[i].set_xlabel("X1")
    ax[i].set_ylabel("X2")
    ax[i].set_title(titles[i])
    ax[i].tick_params(labelleft=True)
    ax[i].axvline(x=mu[0], color="blue", linestyle="--")
    ax[i].axhline(y=mu[1], color="red", linestyle="--")

plt.tight_layout()
plt.savefig(os.path.join(SAVE_PLOT_DIR, "covariance_pair_plot.png"))
plt.show()
The blue and red lines in the above plot represent the mean values of x1 and x2 respectively. Let’s visualize how the probability density surface changes with the covariance.
Show the code
fig = go.Figure(
    go.Surface(
        z=neg_cov_res[2],
        x=x1,
        y=x2,
        colorscale='Hot',
        showscale=False
    )
)
fig.update_layout(
    {
        "title": dict(
            text=f"Bivariate Distribution<br>cov_x1x2: {neg_cov[0][1]:.2f}",
            y=0.95,
            x=0.5,
            xanchor="center",
            yanchor="top",
            font=dict(size=12)
        ),
        "scene": dict(
            xaxis=dict(title='X1'),
            yaxis=dict(title='X2'),
            zaxis=dict(title='Probability density')
        ),
        "xaxis": dict(title="Values"),
        "yaxis": dict(title="Probability Density"),
    }
)
fig.write_html(os.path.join(SAVE_PLOT_DIR, "bivariate_negative_covariance_density.html"))
fig.write_image(os.path.join(SAVE_PLOT_DIR, "bivariate_negative_covariance_density.png"))
fig.show()
Show the code
fig = go.Figure(
    go.Surface(
        z=zero_cov_res[2],
        x=x1,
        y=x2,
        colorscale='Viridis',
        showscale=False
    )
)
fig.update_layout(
    {
        "title": dict(
            text=f"Bivariate Distribution<br>cov_x1x2: {zero_cov[0][1]:.2f}",
            y=0.95,
            x=0.5,
            xanchor="center",
            yanchor="top",
            font=dict(size=12)
        ),
        "scene": dict(
            xaxis=dict(title='X1'),
            yaxis=dict(title='X2'),
            zaxis=dict(title='Probability density')
        ),
        "xaxis": dict(title="Values"),
        "yaxis": dict(title="Probability Density"),
    }
)
fig.write_html(os.path.join(SAVE_PLOT_DIR, "bivariate_zero_covariance_density.html"))
fig.write_image(os.path.join(SAVE_PLOT_DIR, "bivariate_zero_covariance_density.png"))
fig.show()
Show the code
fig = go.Figure(
    go.Surface(
        z=pos_cov_res[2],
        x=x1,
        y=x2,
        showscale=False
    )
)
fig.update_layout(
    {
        "title": dict(
            text=f"Bivariate Distribution<br>cov_x1x2: {pos_cov[0][1]:.2f}",
            y=0.95,
            x=0.5,
            xanchor="center",
            yanchor="top",
            font=dict(size=12)
        ),
        "scene": dict(
            xaxis=dict(title='X1'),
            yaxis=dict(title='X2'),
            zaxis=dict(title='Probability density')
        ),
        "xaxis": dict(title="Values"),
        "yaxis": dict(title="Probability Density"),
    }
)
fig.write_html(os.path.join(SAVE_PLOT_DIR, "bivariate_positive_covariance_density.html"))
fig.write_image(os.path.join(SAVE_PLOT_DIR, "bivariate_positive_covariance_density.png"))
fig.show()
A few things to note in the above surface plots:
- The plot with zero covariance is circularly symmetric: it looks the same in every direction.
- The plots with positive and negative covariances are stretched along a diagonal: along the 45-degree line for positive covariance, and roughly perpendicular to that line for negative covariance.
6. Isotropic Gaussian
An isotropic Gaussian is one where the covariance matrix \(\Sigma\) can be represented in this form: \[ \Sigma = \sigma^2 I \tag{4} \]
- \(I\) is the Identity matrix
- \(\sigma^2\) is the scalar variance
Q: Why do we want to represent the covariance matrix in this form?
A: As the dimensionality of a multivariate Gaussian grows, the number of parameters in the mean \(\mu\) grows linearly, while the number of parameters in the covariance matrix \(\Sigma\) grows quadratically. This quadratic growth isn’t computation friendly. A diagonal covariance matrix makes things much easier.
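To put rough numbers on that growth (a back-of-the-envelope sketch of my own, not from the original post): a full symmetric covariance matrix has d(d+1)/2 free parameters, while an isotropic Gaussian needs a single scalar variance.

# Parameter counts for the mean, a full covariance matrix, and an isotropic covariance
for d in [2, 16, 256, 1024]:
    print(f"d={d:>5}: mean params = {d:>5}, full covariance params = {d * (d + 1) // 2:>8}, isotropic = 1")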
A few things to note about isotropic Gaussian:
- Eq. (4) represents a diagonal matrix multiplied by a scalar variance. This means that the variance along each dimension is equal, so an isotropic multivariate Gaussian is circular (or spherical in higher dimensions).
- We discussed that Cov(x1, x2) = 0 doesn’t mean x1 and x2 are independent, but if the distribution is multivariate normal and Cov(x1, x2) = 0, it does imply that x1 and x2 are independent.
- If the multivariate distribution is isotropic, that means: (i) the covariance matrix is diagonal, and (ii) the distribution can be represented as a product of univariate Gaussians, i.e. P(X) = P(x1)P(x2)... (a quick numerical check follows below).
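Here is that quick numerical check of the factorization (my own snippet, with arbitrary values): for a diagonal (isotropic) covariance, the joint density equals the product of the univariate densities.

# Isotropic bivariate Gaussian vs. product of two univariate Gaussians
iso_sigma = 2.0
joint = multivariate_normal(mean=[0.0, 0.0], cov=iso_sigma**2 * np.eye(2))
point = np.array([0.5, -1.2])

p_joint = joint.pdf(point)
p_factored = norm(0.0, iso_sigma).pdf(point[0]) * norm(0.0, iso_sigma).pdf(point[1])
print(np.isclose(p_joint, p_factored))  # expected: True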
Let’s take an example of an isotropic normal.
Show the code
# Mean of the random variables X1 and X2
mu_x1, mu_x2 = 0, 0

# Standard deviation of the random variables X1 and X2
# Remember the std. is going to be the same
# along all the dimensions.
sigma_x1 = sigma_x2 = 2

# Number of samples to extract
sample_size = 5000

# Positions for which probability density is
# to be generated
x1, x2 = np.mgrid[
    (-3.0 * sigma_x1 + mu_x1):(3.0 * sigma_x1 + mu_x1): 0.1,
    (-3.0 * sigma_x2 + mu_x2):(3.0 * sigma_x2 + mu_x2): 0.1
]
pos = np.dstack((x1, x2))

# Mean vector
mu = [mu_x1, mu_x2]

# Because the covariance matrix of a multivariate isotropic
# Gaussian is a diagonal matrix, the covariance between
# the dimensions will be zero.
cov_x1x2 = 0
cov = [[sigma_x1**2, cov_x1x2], [cov_x1x2, sigma_x2**2]]

isotropic_gaussian = get_multivariate_normal(
    mu=mu,
    cov=cov,
    sample=True,
    sample_size=sample_size,
    seed=seed,
    gen_pdf=True,
    pos=pos
)

fig = make_subplots(
    rows=1, cols=2,
    shared_yaxes=False,
    shared_xaxes=False,
    specs=[[{'type': 'scatter'}, {'type': 'surface'}]],
    subplot_titles=(
        "Covariance x1_x2 = 0.0",
        f"mu_x1: {mu_x1} sigma_x1: {sigma_x1**2} <br>mu_x2: {mu_x2} sigma_x2: {sigma_x2**2}"
    )
)
fig.add_trace(
    go.Scatter(
        x=isotropic_gaussian[1][:, 0],
        y=isotropic_gaussian[1][:, 1],
        mode='markers',
        marker=dict(size=5, color="green"),
    ),
    row=1, col=1
)
fig.add_trace(
    go.Surface(
        z=isotropic_gaussian[2],
        x=x1,
        y=x2,
        colorscale='RdBu',
        showscale=False
    ),
    row=1, col=2
)
fig.update_layout(
    {
        "scene": dict(
            xaxis=dict(title='X1'),
            yaxis=dict(title='X2'),
            zaxis=dict(title='Probability density')
        ),
        "xaxis": {"title": "X1"},
        "yaxis": {"title": "X2"},
        "title": {"text": "Isotropic Gaussian", "x": 0.5, "font": dict(size=20)}
    }
)
fig.write_html(os.path.join(SAVE_PLOT_DIR, "isotropic_gaussian.html"))
fig.write_image(os.path.join(SAVE_PLOT_DIR, "isotropic_gaussian.png"))
7. Conditional Distribution
Let’s say we have a multivariate distribution over x where \[
x = \begin{bmatrix}x_{1}\\x_{2}\end{bmatrix} \\
\mu = \begin{bmatrix}\mu_{1}\\\mu_{2}\end{bmatrix} \\
\Sigma = \begin{bmatrix}\Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22}\end{bmatrix}
\]
Then the distribution of x1 conditional on x2 = a is multivariate normal:
\[ p(x1 | x2 = a) \sim N(\bar\mu, \bar\Sigma) \ \ \ \text {where:} \]
\[ \bar\mu = \mu_{1} + \ \Sigma_{12}\Sigma_{22}^{-1}(a - \mu_{2}) \tag{5} \] \[ \bar\Sigma = \Sigma_{11} - \ \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \tag{6} \]
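To make equations (5) and (6) concrete, here is a minimal sketch for the bivariate case (all numbers are arbitrary and my own; for scalars the matrix inverse reduces to a simple division):

# Partitioned parameters of a bivariate Gaussian: Σ11, Σ12 (= Σ21), Σ22
mu1, mu2 = 0.0, 1.0
s11, s12, s22 = 1.0, 0.6, 2.0
a = 2.5                                        # observed value of x2

cond_mu = mu1 + s12 * (1.0 / s22) * (a - mu2)  # eq. (5)
cond_var = s11 - s12 * (1.0 / s22) * s12       # eq. (6)
print(f"p(x1 | x2 = {a}) = N({cond_mu:.3f}, {cond_var:.3f})")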
But why are we discussing the conditional distribution? Do you need to remember these equations?
Well, first and foremost, you don’t have to mug up any of these equations; you can find them easily on Wikipedia. But there is a reason why we are discussing conditional distributions here. Let’s say you have a process that takes the form of a Markov Chain; the forward process in Diffusion Models is one example of such a sequence. Let’s write down the equation for it. \[ q(x_{1:T}\vert x_{0}) := \prod_{t=1}^{T}q(x_{t}\vert x_{t-1}) :=\prod_{t=1}^{T}\mathcal{N}(x_{t};\sqrt{1-\beta_{t}} x_{t-1},\ \beta_{t}\bf I) \tag{7} \]
This equation tells us a few things:
- The forward process is a Markov Chain where the sample at the present step depends only on the sample at the previous timestep.
- The covariance matrix is diagonal
- At each timestep in this sequence, we gradually add Gaussian noise. But it isn’t very clear from the equation where this addition is taking place, right?
- The term \(\beta_{t}\) represents the variance at a particular timestep t, such that \(0 < \beta_{1} < \beta_{2} < \beta_{3} < ... < \beta_{T} < 1\)
To understand the last point, let’s make some assumptions to simplify eq. (7):
1. We will ignore the variance schedule \(\beta\) for now. We can rewrite the above equation as: \[q(x_{1:T}\vert x_{0}) := \prod_{t=1}^{T}q(x_{t}\vert x_{t-1}) :=\prod_{t=1}^{T}\mathcal{N}(x_{t}; x_{t-1},\ \bf I) \tag{8}\]
2. We will consider x as a univariate normal. This assumption is made only to give readers a better understanding of the conditional Gaussian case. The same concept extends to the multivariate normal.
With the above assumptions in mind, combined with the Law of total probability, we can write the equation as:
\[p(x_{t}) = \int p(x_{t} \vert \ x_{t-1})p(x_{t-1})dx_{t-1} \tag {9}\]
Using (8), we can rewrite this as: \[p(x_{t}) = \int \mathcal{N}(x_{t}; \ x_{t-1}, 1) p(x_{t-1})dx_{t-1} \tag {10}\]
Because we are dealing with univariate normal, the identity matrix is nothing but a scalar value of 1. Moving forward, we can shift the terms in the above equation like this: \[p(x_{t}) = \int \mathcal{N}(x_{t} - x_{t-1}; \ 0, 1) p(x_{t-1})dx_{t-1} \tag {11}\]
Notice two things that we got by this shift:
1. The mean and the variance of the distribution on the right-hand side are 0 and 1, respectively.
2. If you look closely at the term on the RHS of the above equation, you will notice that it is the definition of a convolution.
Combining the above two facts, we can rewrite (11) as: \[p(x_{t}) = \mathcal{N}(0, 1) \ * \ p(x_{t-1}) \tag {12}\]
Property: Convolving the individual distributions of two or more independent random variables gives the distribution of the sum of those random variables.
\[ \Longrightarrow x_{t} = \mathcal{N}(0, 1) \ + \ x_{t-1} \tag {13}\]
We will prove the above property in the next section, but this is one of the things that you should remember. Also, we hope that it is now clear why the conditional distributions in the forward process of Diffusion Models are equivalent to “adding Gaussian noise” to the previous timestep. Don’t worry about the diffusion-model-specific equations and terms like variance schedule; we will talk about them in detail in the upcoming notebooks.
8. Convolution of probability distributions
In the last section we stated that convolving the individual distributions of two or more independent random variables gives the distribution of their sum. To prove this, we will take an example of two normally distributed random variables X and Y. If X and Y are two normally distributed independent random variables such that
\[ X \sim \mathcal{N}(\mu_{1}, \sigma_{1}^2) \\ Y \sim \mathcal{N}(\mu_{2}, \sigma_{2}^2) \\ \]
\(\text{if} \ Z = X + Y \Longrightarrow Z \sim \mathcal{N}(\mu_{1} + \mu_{2}, \sigma_{1}^2 +\sigma_{2}^2) \tag{14}\)
Our goal is to prove that if we convolve the PDFs of X and Y, the resulting distribution is identical to the distribution of Z. Let’s write down the code for it.
Show the code
# Mean and Standard deviation of X
mu_x = 0.0
sigma_x = 2.0

# Mean and Standard deviation of Y
mu_y = 2.0
sigma_y = 4.0

# Mean and Standard deviation of Z
mu_z = mu_x + mu_y
sigma_z = np.sqrt(sigma_x**2 + sigma_y**2)

# Get the distributions
dist_x = norm(loc=mu_x, scale=sigma_x)
dist_y = norm(loc=mu_y, scale=sigma_y)
dist_z = norm(loc=mu_z, scale=sigma_z)

# Generate the PDFs
step_size = 1e-4
points = np.arange(-30, 30, step_size)

pdf_x = dist_x.pdf(points)
pdf_y = dist_y.pdf(points)
pdf_z = dist_z.pdf(points)

# NOTE: We cannot convolve over continuous functions using `numpy.convolve(...)`
# Hence we will discretize our PDFs into PMFs using the step size we defined above
pmf_x = pdf_x * step_size
pmf_y = pdf_y * step_size

# Convolve the two PMFs
conv_pmf = np.convolve(pmf_x, pmf_y, mode="same")
conv_pdf = conv_pmf / step_size

# Let's plot the distributions now and check if we have gotten
# the same distribution as Z.
# NOTE: As we have approximated a PMF from the PDF, there will be
# errors in the approximation. So, the final result may not
# look 100% identical.
_, ax = plt.subplots(1, 2, sharex=False, sharey=False, figsize=(12, 5))

ax[0].plot(points, pdf_x)
ax[0].plot(points, pdf_y)
ax[0].plot(points, pdf_z)
ax[0].set_title("Z = X + Y")
ax[0].legend(["PDF-X", "PDF-Y", "PDF-Z"])

ax[1].plot(points, pdf_x)
ax[1].plot(points, pdf_y)
ax[1].plot(points, conv_pdf)
ax[1].set_title("Convolution of PDFs of X and Y")
ax[1].legend(["PDF-X", "PDF-Y", "Convolved"])

plt.savefig(os.path.join(SAVE_PLOT_DIR, "convolution_of_pdfs.png"))
plt.show()
9. The Forward Process
Though we will talk about Diffusion Models in detail in future posts, I will implement the forward process here to give you an idea that things that might look complicated in symbols aren’t that complicated in code. Let’s rewrite the forward process equation again:
\[ q(x_{1:T}\vert x_{0}) := \prod_{t=1}^{T}q(x_{t}\vert x_{t-1}) :=\prod_{t=1}^{T}\mathcal{N}(x_{t};\sqrt{1-\beta_{t}} x_{t-1},\ \beta_{t}\bf I) \]
To implement the above equation:
- We have to define the total number of timesteps T.
- We have to generate \(\beta_{t}\) using a schedule. We can use any schedule, including but not limited to linear, quadratic, etc. The only thing that we need to ensure is that \(\beta_{1} < \beta_{2} < ...\)
- Sample a new image at timestep t from a conditional Gaussian whose parameters are \(\mu_{t} = \sqrt{1-\beta_{t}} x_{t-1}\) and \(\sigma_{t}^2 = \beta_{t}\).
- For the last point, we can use the property (eq. (13)) we studied in the conditional distribution section. Hence we can write this as:
\[x_{t} \sim (\mathcal{N}(\sqrt{1-\beta_{t}} x_{t-1},\ \beta_{t}) + \mathcal{N}(0, 1))\]
\[ \Rightarrow x_{t} = \sqrt{1-\beta_{t}} x_{t-1} + \sqrt{\beta_{t}}\epsilon \ \ ; \ \text{where} \ \epsilon \sim \mathcal{N}(0, 1) \tag{15} \]
Now that we have broken down the equation into much simpler parts, let’s code it!
Show the code
def forward_process_ddpms(img_t_minus_1, beta, t):
    """Implements the forward process of a DDPM model.

    Args:
        img_t_minus_1: Image at the previous timestep (t - 1)
        beta: Scheduled Variance
        t: Current timestep
    Returns:
        Image obtained at current timestep
    """
    # 1. Obtain beta_t. Reshape it to have the same number of
    # dimensions as our image array
    beta_t = beta[t].reshape(-1, 1, 1)

    # 2. Calculate mean and standard deviation
    mu = np.sqrt(1.0 - beta_t) * img_t_minus_1
    sigma = np.sqrt(beta_t)

    # 3. Obtain image at timestep t using equation (15)
    img_t = mu + sigma * np.random.randn(*img_t_minus_1.shape)
    return img_t
# Let's check if our forward process function is
# doing what it is supposed to do on a sample image

# 1. Load image using PIL (or any other library that you prefer)
img = Image.open("../images/cat.jpg")

# 2. Resize the image to desired dimensions
IMG_SIZE = (128, 128)
img = img.resize(size=IMG_SIZE)

# 3. Define number of timesteps
timesteps = 100

# 4. Generate beta (variance schedule)
beta_start = 0.0001
beta_end = 0.05
beta = np.linspace(beta_start, beta_end, num=timesteps, dtype=np.float32)

processed_images = []
img_t = np.asarray(img.copy(), dtype=np.float32) / 255.

# 5. Run the forward process to obtain img after t timesteps
for t in range(timesteps):
    img_t = forward_process_ddpms(img_t_minus_1=img_t, beta=beta, t=t)
    if t % 20 == 0 or t == timesteps - 1:
        sample = (img_t.clip(0, 1) * 255.0).astype(np.uint8)
        processed_images.append(sample)

# 6. Plot and see samples at different timesteps
_, ax = plt.subplots(1, len(processed_images), figsize=(15, 6))

for i, sample in enumerate(processed_images):
    ax[i].imshow(sample)
    ax[i].set_title(f"Timestep: {i*20}")
    ax[i].axis("off")
    ax[i].grid(False)

plt.suptitle("Forward process in DDPMs", y=0.75)
plt.savefig(os.path.join(SAVE_PLOT_DIR, "forward_process.png"))
plt.show()
plt.close()
One thing to note here is that as you increase the number of timesteps T, \(\beta_{t} \to 1\). At that point:
\[q(x_{T}) \approx \mathcal{N}(x_{T};\ 0, I)\]
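You can get a rough empirical feel for this with the img_t left over from the loop above (a check of my own, and only approximate here since the schedule in this example is short): its pixel statistics should drift towards zero mean and unit standard deviation as more and more noise is added.

# Statistics of the fully noised image
print(f"mean: {img_t.mean():.3f}, std: {img_t.std():.3f}")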
That’s it for now. We will talk about the reverse process and other things related to DDPMs in future notebooks. I hope this notebook was enough to give you a solid understanding of the Gaussian distribution and its usage in the context of DDPMs.
References
- Wiki: Normal Distribution
- Wiki: Multivariate Normal
- Applied Multivariate Statistical Analysis
- The Gaussian Distribution
- Numpy docs on multivariate normal
- Scipy statistics tutorial
- Probability and Information Theory
- Gaussian Distribution and their properties
- The Anisotropic Multivariate Normal Distribution
- Primer on Gaussian
- Gaussians
- Isotropic Gaussian
- Isotropic Gaussian
- Wiki: Convolution of PDFs
- Convolving PDFs in Python
- Assembly AI: Intro to Diffusion Models
- Ari Seff’s tutorial
- Denoising Diffusion Probabilistic Models
Citation
@misc{nain2022,
author = {Nain, Aakash},
title = {All You Need to Know about Gaussian Distribution},
date = {2022-08-03},
url = {https://magic-with-latents.github.io/latent/posts/ddpms/part2/},
langid = {en}
}