Fresher on Random Variables

DDPMs - Part 1 (Optional)

linear algebra

diffusion models

Author

Rishabh Anand

Published

August 1, 2022

This is the first (optional) chapter in a 3-part series on DDPMs (Denoising Diffusion Probabilistic Models), and it covers the basics of Random Variables. We keep it brief, since plenty of material on the topic is already available online. Random Variables are a core component of probability and statistics. This reading emphasizes understanding the theory rather than implementation specifics.

1. Random Variables

The study of probability (and most of statistics) revolves around “containers of information” that hold different contents depending on which event occurs. These containers are called Random Variables (RVs), and they can take on different values (exactly one value at a time, never a combination). Each event \(E_i\) (and hence, each value) also has an associated probability \(p_i\) of occurring.

For instance, suppose we roll a fair die once, and call the result of the roll \(X\). Here, \(X\) is a RV with 6 associated events, one for each face the die can land on, and hence 6 possible values, depending on which event takes place. After the roll, we are left with one face pointing upward, showing a number, and \(X\) is assigned that value. Since the die is fair, the probability of each event is \(\frac{1}{6}\): there is a 1 in 6 chance that any given number faces upward once the die has landed.
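As a minimal sketch, assuming we represent a discrete RV in Python as a plain dictionary that maps each possible value to its probability (the name `die` is illustrative, not from the original), the fair die looks like this. Rolling it corresponds to drawing one value, weighted by those probabilities.

```python
import random
from fractions import Fraction

# Each possible value of X (a die face) mapped to its probability.
# Fractions keep the probabilities exact (no floating-point rounding).
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible values must sum to exactly 1.
total = sum(die.values())
print(total)  # 1

# "Rolling" the die: draw one face, weighted by its probability.
rng = random.Random(42)  # fixed seed so the run is reproducible
x = rng.choices(list(die), weights=[float(p) for p in die.values()])[0]
print(x in die)  # True
```

Using exact fractions makes the "probabilities sum to 1" check an equality rather than a floating-point approximation.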

1.1 Continuous Random Variables

Continuous RVs can take any real number (\(\mathbb{R}\)) as a value, within whatever range the context allows. When filling a bottle, it might hold 30.6ml, 56.89ml, or 350.47ml of water; the daily temperature might be 35ºC, 29.8ºC, or 37.2ºC. The value fluctuates and can take any intermediate value. Some continuous RVs are unbounded, while others are bounded (a bottle cannot hold a negative amount of water, nor more than its capacity).

1.2 Discrete Random Variables

Discrete RVs take values from a countable set, often integers (\(\mathbb{Z}\)), which may or may not be restricted to a specific range depending on the context. Examples include the face value of a die when rolled, a card drawn from a deck, or letter grades on an exam like A, B, C, D, F. What these have in common is that the values are drawn from a distinct, countable collection (finite, in all of these examples). There is no “in-between” value (e.g., you cannot roll a 5.5 on a die).

2. Independence

Suppose there are two events. If one does not affect the other, the two events are independent of each other. But what does “affect” mean in this context? It means that knowing whether one event occurred does not change the probability of the other occurring. Formally, events \(A\) and \(B\) are independent when \(P(A \cap B) = P(A) \cdot P(B)\). For instance, if I flip a fair coin 3 times, the second flip does not depend on the first, nor does the third depend on the second; in essence, past outcomes do not change the probabilities of future outcomes. While events in the real world may be linked by some causal chain, in theory we treat independent events in isolation from each other, with no other factors involved.
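The coin-flip example can be checked exactly by enumerating every equally likely outcome of 3 fair flips. This is a small sketch under that assumption (the helper `prob` and the event functions are hypothetical names chosen for illustration); it confirms that "first flip is heads" and "second flip is heads" satisfy \(P(A \cap B) = P(A) \cdot P(B)\).

```python
from fractions import Fraction
from itertools import product

# Sample space of 3 fair coin flips: all 2**3 = 8 sequences, each equally likely.
outcomes = list(product("HT", repeat=3))
p_each = Fraction(1, len(outcomes))

def prob(event):
    """Exact probability of an event, given as a predicate on an outcome."""
    return sum(p_each for o in outcomes if event(o))

def first_heads(o):
    return o[0] == "H"

def second_heads(o):
    return o[1] == "H"

p_a = prob(first_heads)
p_b = prob(second_heads)
p_ab = prob(lambda o: first_heads(o) and second_heads(o))

print(p_a, p_b, p_ab)     # 1/2 1/2 1/4
print(p_ab == p_a * p_b)  # True: the two flips are independent
```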

On the other hand, dependent events do affect each other's probability of occurring. For example, if you drive recklessly, you are more likely to be stopped by the authorities than if you drive carefully. The occurrence of one event changes the probability of the other event occurring.
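Dependence is easy to see with a different, illustrative example that is not from the original: drawing two cards from a standard 52-card deck without replacement. The sketch below assumes a simplified deck of 4 aces and 48 other cards and enumerates every ordered pair of draws. Because the first draw changes what is left in the deck, the probability of the second card being an ace depends on the first.

```python
from fractions import Fraction
from itertools import permutations

# A simplified 52-card deck: 4 aces ("A") and 48 other cards ("x").
deck = ["A"] * 4 + ["x"] * 48

# All ordered ways to draw two cards WITHOUT replacement (index pairs i != j),
# each equally likely.
draws = list(permutations(range(len(deck)), 2))
p_each = Fraction(1, len(draws))

def prob(event):
    """Exact probability of an event, given as a predicate on a pair of draws."""
    return sum(p_each for d in draws if event(d))

def first_ace(d):
    return deck[d[0]] == "A"

def second_ace(d):
    return deck[d[1]] == "A"

p_second = prob(second_ace)  # P(second card is an ace), ignoring the first
p_both = prob(lambda d: first_ace(d) and second_ace(d))
p_second_given_first = p_both / prob(first_ace)  # P(second ace | first ace)

print(p_second, p_second_given_first)  # 1/13 1/17
```

Seeing an ace first lowers the chance of a second ace from \(\frac{4}{52}\) to \(\frac{3}{51}\), so the two events are dependent.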

3. Expectation

When we roll a die many times (say, 1000 times), we might want to know the average score we get in the end. The Expectation of a RV is a weighted sum of all its possible values, each weighted by its probability of occurring. For a fair die, each face value, 1 to 6, has probability \(p_i = \frac{1}{6}\) of occurring. In general, the expectation (or expected value) of a RV is,

\[ \begin{align} \mathbb{E}(X) &= \sum_{i} p_i \cdot V_i \end{align} \]

where \(p_i\) is the probability of event \(E_i\) taking place, and \(V_i\) is the value of said event. For a fair die, this gives \(\mathbb{E}(X) = \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) = \frac{21}{6} = 3.5\). You may be wondering why the expectation is a decimal number that is not itself a possible value. Since it is an average, not the most frequently occurring value, it need not be a value the die can actually show. The expectation tells us the average face value we get after rolling the die many times.
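A short sketch of the formula \(\mathbb{E}(X) = \sum_i p_i \cdot V_i\) for a fair die, computed exactly, followed by an empirical check that assumes Python's standard `random` module as the source of simulated rolls. The average of many rolls should land close to the exact expectation.

```python
import random
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # fair die: every face is equally likely

# Exact expectation: sum of probability * value over all faces.
expected = sum(p * v for v in faces)
print(expected)  # 7/2

# Empirical check: the average of many simulated rolls approaches E(X).
rng = random.Random(0)  # fixed seed for reproducibility
n_rolls = 100_000
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
average = sum(rolls) / n_rolls
print(round(average, 2))  # close to 3.5; exact digits depend on the seed
```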

4. Variance

As you roll the fair die, you probably won't get the same value again and again; you will notice some deviation. The expected value of a fair die is 3.5. The Variance measures how far the values of a RV spread out on either side of its expectation: it is the expected squared deviation from the mean. It is always non-negative (zero only when the RV always takes the same value), and a larger variance means the values are more spread out.

We have been using the term “deviation” a lot here, and fittingly, the variance is the square of the standard deviation \(\sigma\) of a RV, i.e., \(\text{Var}(X) = \sigma^2\). The variance of a random variable \(X\) can be computed as follows:

\[ \begin{align} \text{Var}(X) &= \mathbb{E}\left[(X - \mathbb{E}(X))^2\right] \\ &= \mathbb{E}(X^2) - (\mathbb{E}(X))^2 \end{align} \]
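For a fair die, both forms of the variance formula can be evaluated exactly. This sketch uses Python's `fractions` module so that the definition \(\mathbb{E}\left[(X - \mathbb{E}(X))^2\right]\) and the shortcut \(\mathbb{E}(X^2) - (\mathbb{E}(X))^2\) can be compared with exact equality; the standard deviation \(\sigma\) is then the square root of the variance.

```python
import math
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)

mean = sum(p * v for v in faces)  # E(X) = 7/2

# Definition: expected squared deviation from the mean.
var_def = sum(p * (v - mean) ** 2 for v in faces)

# Shortcut: E(X^2) - (E(X))^2.
var_shortcut = sum(p * v**2 for v in faces) - mean**2

print(var_def, var_shortcut)  # 35/12 35/12

sigma = math.sqrt(var_def)  # standard deviation, roughly 1.708
```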

5. Distributions

Random Variables have associated probabilities that dictate the chances of an event occurring. For a fair die, the probability of each face is \(\frac{1}{6}\), and for a fair coin, the probability of each side is \(\frac{1}{2}\). But what characteristic of a random variable determines these probabilities?

This is given by the Distribution, a function that assigns probabilities to the values a RV can take (for continuous RVs, it assigns a probability density, and probabilities come from ranges of values). In some distributions all events have the same probability, while others weight certain events more heavily than others, making those events more likely. There are many distributions that describe both synthetic and real-world systems. Here are a few examples:

5.1 Uniform Distribution

For starters, let’s look at the Uniform Distribution. To represent a RV from a Uniform Distribution, we write \(X \sim U(a, b)\). This distribution has two parameters, \(a\) and \(b\), which are the bounds of the values the RV can take. Within \([a, b]\), every value is equally likely. For example, if \(X \sim U(0, 1)\), values inside that range (like 0.1, 0.2, 0.03, 0.023452) are all equally likely. Strictly speaking, since \(X\) is continuous, the probability of hitting any single exact value is zero; what is equal everywhere in \([a, b]\) is the probability density, \(\frac{1}{b - a}\), so any two sub-intervals of the same width have the same probability.
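Here is a minimal simulation sketch of \(X \sim U(0, 1)\), assuming Python's `random.uniform` as the sampler (the helper `fraction_in` is a hypothetical name for illustration). Intervals of equal width should receive roughly equal shares of the samples, which is what a constant density means in practice.

```python
import random

a, b = 0.0, 1.0
rng = random.Random(0)  # fixed seed for reproducibility
samples = [rng.uniform(a, b) for _ in range(100_000)]

def fraction_in(lo, hi):
    """Share of the samples that fall in the interval [lo, hi)."""
    return sum(lo <= x < hi for x in samples) / len(samples)

# Two intervals of the same width (0.1) get roughly the same share,
# matching a constant density of 1 / (b - a) on [a, b].
print(round(fraction_in(0.0, 0.1), 3), round(fraction_in(0.5, 0.6), 3))

# The sample mean should sit near the midpoint (a + b) / 2.
print(round(sum(samples) / len(samples), 3))
```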

5.2 Binomial Distribution

In many real-life applications, events are associated with a notion of success or failure. A single trial can succeed with some probability \(p\) (and fail with probability \(1 - p\)). The Binomial Distribution scales this single PASS/FAIL trial up to many trials at once: it describes the number of successes among \(n\) independent trials that share the same \(p\) (when sampling objects from a population, this corresponds to sampling with replacement). To represent a RV from a Binomial Distribution, we write \(X \sim \text{Bin}(p, n)\). Again, we supply two parameters: \(p\) is the probability of success and \(n\) is the number of trials (samples) we are considering, each of which is either a success or a failure.

For instance, at a factory, a certain machine part is manufactured without defects with probability 0.75. To test a batch for quality assurance, the factory can collect a sample of 100 parts. Using this, we can answer questions like “What is the probability of exactly 90 parts passing the defect test?” or “What is the probability of more than 10 parts failing the defect test?” and adjust the process accordingly. Here, we would say \(X \sim \text{Bin}(0.75, 100)\), where \(X\) is the number of parts that pass.
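Both factory questions can be answered with the standard Binomial probability mass function, \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\). The sketch below is one way to compute it with Python's `math.comb`; `binomial_pmf` is a hypothetical helper name. Note that "more than 10 parts failing" out of 100 is the same event as "at most 89 parts passing".

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(p, n): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 100, 0.75  # 100 sampled parts, each defect-free with probability 0.75

# "What is the probability of exactly 90 parts passing?"
p_90_pass = binomial_pmf(90, n, p)

# "What is the probability of more than 10 parts failing?"
# More than 10 failures means at most 89 passes (k = 0, 1, ..., 89).
p_more_than_10_fail = sum(binomial_pmf(k, n, p) for k in range(0, 90))

print(f"{p_90_pass:.6f}")
print(f"{p_more_than_10_fail:.6f}")
```

With an expected 75 passing parts, seeing 90 or more pass is rare, so the second probability comes out very close to 1.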

5.3 Normal/Gaussian Distribution

This distribution is central to understanding Diffusion Models. In the real world, not everything is as clear-cut as PASS/FAIL, and not all events have the same probability of occurring. Some events occur more often than others, making them statistically more probable. For example, in a sunny country like Singapore, the chances of a sunny day are much higher than the chances of a rainy or cloudy day, ceteris paribus. The Normal Distribution captures this idea for continuous quantities: values near a central value are the most likely, and values become less likely the further they are from it. To represent a RV from a Normal Distribution, we write \(X \sim N(\mu, \sigma^2)\). There are two parameters: \(\mu\) is the mean of the RV (for the Normal Distribution, the mean, median, and mode coincide), and \(\sigma^2\) is its variance, where \(\sigma\) is the standard deviation (i.e., how spread out the values are around the mean).
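To make the roles of \(\mu\) and \(\sigma\) concrete, here is a small sketch using Python's standard library; the values \(\mu = 30\) and \(\sigma = 2\) are illustrative assumptions, not from the original. It draws standard normal samples \(z \sim N(0, 1)\) and shifts and scales them as \(x = \mu + \sigma z\), which gives \(x \sim N(\mu, \sigma^2)\). Note that `statistics.NormalDist` takes the standard deviation \(\sigma\), not the variance \(\sigma^2\).

```python
import random
import statistics
from statistics import NormalDist

mu, sigma = 30.0, 2.0  # illustrative mean and standard deviation (sigma, not sigma**2)

# Draw z ~ N(0, 1), then shift and scale: x = mu + sigma * z gives x ~ N(mu, sigma**2).
rng = random.Random(0)  # fixed seed for reproducibility
samples = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(100_000)]

print(round(statistics.fmean(samples), 2))   # close to mu
print(round(statistics.pstdev(samples), 2))  # close to sigma

# About 68% of the probability mass lies within one standard deviation of mu.
dist = NormalDist(mu, sigma)
within_one_sigma = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
print(round(within_one_sigma, 4))  # 0.6827
```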

In the next chapter, we cover the technical and implementation-specific details of the Normal Distribution and how it’s used in Diffusion Models.

Citation

BibTeX citation:

@misc{anand2022,
  author = {Anand, Rishabh},
  title = {Random Variables},
  date = {2022-08-01},
  url = {https://magic-with-latents.github.io/latent/posts/ddpms/part1/},
  langid = {en}
}

For attribution, please cite this work as:

Anand, Rishabh. 2022. “Random Variables.” The Latent (blog). August 1, 2022. https://magic-with-latents.github.io/latent/posts/ddpms/part1/.