Random Variables and Distributions

Introduction

Definition of Random Variable

To reduce complex problems about random events to operations on functions, and to study them in a unified way, we introduce the random variable.

Definition : Random Variable

A random variable (or r.v. for short) is a real-valued function defined on the sample space $\Omega$ (i.e., $X: \Omega \to \mathbb{R}$), typically denoted by capital letters like $X$, $Y$, $Z$.

Example

On Valentine’s Day a restaurant offers a Lucky Lovers discount. When the waiter brings the check, he’ll also bring the four aces from a deck of cards. He’ll shuffle them and lay them out face down on the table. The couple will then get to turn one card over.

If it’s a black ace, they’ll owe the full amount,
but if it’s the ace of hearts, the waiter will give them a $20 Lucky Lovers discount.
If they first turn over the ace of diamonds, they’ll then get to turn over one of the remaining cards, earning a $10 discount for finding the ace of hearts this time.

How can the Lucky Lovers discount be defined as a random variable?

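Solution. Let H, D, C, S represent the aces of hearts, diamonds, clubs, and spades, respectively. The sample space of the Lucky Lovers game is
$$\Omega = \{H,\ C,\ S,\ DH,\ DC,\ DS\},$$
where, for example, $DH$ means that the ace of diamonds is turned over first and the ace of hearts second. The Lucky Lovers discount can be defined as
$$X(\omega) = \begin{cases} 20, & \omega = H, \\ 10, & \omega = DH, \\ 0, & \omega \in \{C, S, DC, DS\}. \end{cases}$$
It is a mapping from the outcomes in the sample space to numbers on the real line.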
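The mapping from outcomes to discounts can be sketched in a few lines of Python. This is only an illustration: the outcome labels such as "DH" (diamonds first, then hearts) and the names `outcome_prob` and `discount` are assumptions made here, and the probabilities assume the waiter's shuffle makes every card equally likely.

```python
from fractions import Fraction

# Probability of each outcome: the first card is uniform over the 4 aces;
# after the ace of diamonds, the second card is uniform over the remaining 3.
outcome_prob = {
    "H": Fraction(1, 4),
    "C": Fraction(1, 4),
    "S": Fraction(1, 4),
    "DH": Fraction(1, 4) * Fraction(1, 3),
    "DC": Fraction(1, 4) * Fraction(1, 3),
    "DS": Fraction(1, 4) * Fraction(1, 3),
}


def discount(outcome: str) -> int:
    """The random variable: maps an outcome to the discount in dollars."""
    if outcome == "H":
        return 20
    if outcome == "DH":
        return 10
    return 0


# Distribution of the discount: add up the probabilities of all outcomes
# that are mapped to the same value.
pmf = {}
for outcome, prob in outcome_prob.items():
    value = discount(outcome)
    pmf[value] = pmf.get(value, Fraction(0)) + prob

for value in sorted(pmf):
    print(value, pmf[value])
```

Exact fractions are used instead of floats so that the probabilities can be compared with a hand calculation.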

We can observe that, once r.v.s are defined, the study of random events reduces to the study of r.v.s. Studying a r.v. essentially means examining all the values it can take and the probability associated with each value; together these are known as its probability distribution. With the distribution in hand, we can grasp the overall behavior of the random phenomenon, which provides a foundation for further study of its underlying regularity.

Based on the values they can take, r.v.s can be classified into:

A discrete random variable takes a finite or countably infinite number of values.
A continuous random variable takes values in a continuum, such as an interval of the real line.

The probability distributions of these two types of r.v.s are described in ways that differ in some respects and agree in others:

A discrete r.v. can be described using a probability mass function (PMF).
A continuous r.v. can be described using a probability density function (PDF).
Both types of r.v.s can be described using a cumulative distribution function (CDF).

Definition : Probability Mass Function

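Let $X$ be a discrete r.v. and let $\mathcal{X}$ be the support of $X$, i.e., the set of values that $X$ can take. Then the probability mass function (or PMF for short) of $X$ is defined as
$$p_X(x) = P(X = x), \quad x \in \mathcal{X}.$$
The function $p_X$ satisfies:

Non-negativity: $p_X(x) \ge 0$ for all $x \in \mathcal{X}$.
Normalization: $\sum_{x \in \mathcal{X}} p_X(x) = 1$.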

Example

Suppose that the support of a r.v. is and the PMF of is given by , where is some positive constant. Express the value of in terms of .

Solution. Using the normalization property of PMF, we have This implies that

Description of Probability Distribution

Definition : Continuous Random Variable and Probability Density Function

A r.v. $X$ is called a continuous random variable if there exists a non-negative function $f_X(x)$, defined for all $x \in \mathbb{R}$, such that
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
for any real numbers $a \le b$. The function $f_X$ is called the probability density function (or PDF for short) of $X$.

Similar to the PMF, the function $f_X$ satisfies:

Non-negativity: $f_X(x) \ge 0$ for all $x \in \mathbb{R}$.
Normalization: $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

It follows that for any $a \in \mathbb{R}$, we have $P(X = a) = \int_a^a f_X(x)\,dx = 0$. Thus, unlike the PMF, $f_X(a)$ does not reflect the probability of $X$ taking the value $a$. Instead, $f_X(a)$ reflects the degree to which the probability is concentrated around $a$.

Intuitively, the larger $f_X(a)$ is, the greater the probability that $X$ takes a value near $a$.

Definition : Cumulative Distribution Function

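For a r.v. $X$, the cumulative distribution function (or CDF for short) is defined as
$$F_X(x) = P(X \le x), \quad x \in \mathbb{R}.$$

For a discrete r.v., the CDF is a step function $F_X(x) = \sum_{t \le x} p_X(t)$, where $p_X$ is the PMF of $X$.
For a continuous r.v., the CDF is a continuous function $F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt$, where $f_X$ is the PDF of $X$. Consequently, $f_X(x) = F_X'(x)$ wherever the derivative exists.
The CDF is non-decreasing and right-continuous.
The upper limit of the CDF is $\lim_{x \to +\infty} F_X(x) = 1$.
The lower limit of the CDF is $\lim_{x \to -\infty} F_X(x) = 0$.
For any real numbers $a < b$, $P(a < X \le b) = F_X(b) - F_X(a)$.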

Example

Suppose that the lifespan in years of a certain household appliance is a r.v. with PDF given by What is the probability that an appliance will function between 1 and 2 years?

Solution. By the normalization property of the PDF, we have Let be the r.v. representing the lifespan of a computer, then

Expectation and Variance

Sometimes we need a simple, clear, and distinctive description of a r.v. Two of the most commonly used numerical characteristics are the mathematical expectation and the variance.

Definition : Mathematical Expectation

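The mathematical expectation (also known as the mean, expectation, or expected value) of a r.v. $X$ is denoted as $E(X)$ and is defined as follows:

If $X$ is a discrete r.v. with PMF $p_X(x)$, given $\sum_{x} |x|\,p_X(x) < \infty$, then
$$E(X) = \sum_{x} x\,p_X(x).$$

If $X$ is a continuous r.v. with PDF $f_X(x)$, given $\int_{-\infty}^{\infty} |x|\,f_X(x)\,dx < \infty$, then
$$E(X) = \int_{-\infty}^{\infty} x\,f_X(x)\,dx.$$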

Simply put, the expectation is the weighted average of all possible values of a r.v. Expectation represents the average result or long-term value that we can anticipate from a series of random events.

Definition : Variance and Standard Deviation

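If a r.v. $X$ satisfies that $E\big[(X - E(X))^2\big] < \infty$, then $\mathrm{Var}(X) = E\big[(X - E(X))^2\big]$ is called the variance of $X$, and $\sigma(X) = \sqrt{\mathrm{Var}(X)}$ is called the standard deviation (or SD for short) of $X$.

Specifically:

If $X$ is discrete with PMF $p_X(x)$, then $\mathrm{Var}(X) = \sum_{x} (x - E(X))^2\,p_X(x)$.
If $X$ is continuous with PDF $f_X(x)$, then $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - E(X))^2\,f_X(x)\,dx$.

More generally, the expectation of any function $g(X)$ of $X$ is:
$$E[g(X)] = \sum_{x} g(x)\,p_X(x) \ \text{(discrete)}, \qquad E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx \ \text{(continuous)}.$$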

There are some basic properties of the expectation and variance:

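For any constants $a, b$, $E(aX + b) = aE(X) + b$.
For any constants $a, b$, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$, and thus $\sigma(aX + b) = |a|\,\sigma(X)$.
$E[g(X) + h(X)] = E[g(X)] + E[h(X)]$, for any functions $g$ and $h$.

Proof of property 3. We show the case of a discrete r.v. $X$ with PMF $p_X(x)$; the continuous case is analogous.

Let $r(x) = g(x) + h(x)$, then
$$E[r(X)] = \sum_{x} r(x)\,p_X(x) = \sum_{x} g(x)\,p_X(x) + \sum_{x} h(x)\,p_X(x) = E[g(X)] + E[h(X)].$$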
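The definitions and the first two properties above can be checked numerically on a small discrete distribution. The sketch below is an illustration only: the helper names `expectation` and `variance` are assumptions, the PMF is that of the Lucky Lovers discount under a fair shuffle (0 with probability 2/3, 10 with probability 1/12, 20 with probability 1/4), and the constants `a` and `b` are arbitrary.

```python
from fractions import Fraction


def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete r.v. given as a {value: probability} dict."""
    return sum(g(x) * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - E(X))^2]."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: (x - mean) ** 2)


# PMF of the Lucky Lovers discount, assuming a fair shuffle.
pmf = {0: Fraction(2, 3), 10: Fraction(1, 12), 20: Fraction(1, 4)}
mean = expectation(pmf)
var = variance(pmf)

# Distribution of a*X + b for arbitrary illustrative constants a and b.
a, b = 3, 7
shifted = {a * x + b: p for x, p in pmf.items()}

print("E(X) =", mean, " Var(X) =", var)
print("E(aX+b) =", expectation(shifted), " Var(aX+b) =", variance(shifted))
```

Because the arithmetic is exact, the printed values for $aX + b$ can be compared directly with $aE(X) + b$ and $a^2\,\mathrm{Var}(X)$.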

Common Discrete Distributions

Bernoulli Distribution

Definition: Bernoulli Trial and Bernoulli Distribution

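If a random experiment has only two possible outcomes $A$ and $\bar{A}$, then the experiment is called a Bernoulli trial.

If a r.v. $X$ only takes values $0$ and $1$, with $P(X = 1) = p$ and $P(X = 0) = 1 - p$, then $X$ is said to follow a Bernoulli distribution with parameter $p$, denoted as $X \sim \mathrm{Bernoulli}(p)$. The PMF of $X$ can be expressed as
$$p_X(x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}.$$
The expectation and variance of $X$ are
$$E(X) = p, \qquad \mathrm{Var}(X) = p(1-p).$$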

The Bernoulli distribution is the foundation of many classical probability distributions, such as the binomial distribution, the geometric distribution, etc.

Binomial Distribution

Definition: $n$-Fold Bernoulli Trial and Binomial Distribution

An $n$-fold Bernoulli trial is an experiment that repeats a Bernoulli trial $n$ times independently. Note:

"Independently" means that the results of the individual Bernoulli trials do not affect each other.
"Repeats" means that the probability of event $A$ in each Bernoulli trial, i.e., $P(A) = p$, remains the same.

Let $X$ be a r.v. that records the number of times event $A$ happens in an $n$-fold Bernoulli trial. Then $X$ is said to follow a binomial distribution with parameters $n$ and $p$, denoted as $X \sim B(n, p)$. The PMF of $X$ can be derived as
$$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n.$$

The expectation and variance of $X$ can be derived to be
$$E(X) = np, \qquad \mathrm{Var}(X) = np(1-p).$$

Proof. By the definition of expectation, we have:
$$E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} = np\,\big[p + (1-p)\big]^{n-1} = np.$$
The variance can be derived similarly. More concise derivations are available once the independence between random variables is introduced.
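As a sanity check on the binomial formulas $E(X) = np$ and $\mathrm{Var}(X) = np(1-p)$, the sums can be evaluated directly from the PMF. This is a minimal sketch; the function name `binomial_pmf` and the values of `n` and `p` are illustrative assumptions, not taken from the notes.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # illustrative values only
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                             # normalization
mean = sum(k * pk for k, pk in enumerate(pmf))               # E(X)
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # Var(X)

print(total, mean, var)
print(n * p, n * p * (1 - p))
```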

Example

A factory has 80 pieces of the same type of equipment, each operating independently, with a failure probability of . A single maintainer can only repair one piece of equipment at a time. The factory is considering two strategies for allocating maintainers:

Allocate 4 maintainers, with each responsible for maintaining 20 pieces of equipment.
Allocate 3 maintainers, with them jointly responsible for maintaining all 80 pieces of equipment.

Please compare these two strategies in terms of the probability that a piece of equipment cannot be repaired in time when a failure occurs.

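Solution. Let $p$ denote the failure probability of a single piece of equipment.

For the first strategy, let $X_1, X_2, X_3, X_4$ be the numbers of pieces of equipment that fail at the same time among the 20 pieces maintained by each of the four maintainers, respectively. Then $X_i \sim B(20, p)$ for $i = 1, \dots, 4$, and a failure cannot be repaired in time if some group has two or more simultaneous failures. Thus, the probability that a piece of equipment cannot be repaired in time is
$$P\Big(\bigcup_{i=1}^{4} \{X_i \ge 2\}\Big) = 1 - \prod_{i=1}^{4} P(X_i \le 1) = 1 - \big[(1-p)^{20} + 20p(1-p)^{19}\big]^4.$$

For the second strategy, let $Y$ be the number of pieces of equipment that fail at the same time among the 80 pieces. Then $Y \sim B(80, p)$, and a failure cannot be repaired in time if four or more pieces fail simultaneously. Thus, the probability that a piece of equipment cannot be repaired in time is
$$P(Y \ge 4) = 1 - \sum_{k=0}^{3} \binom{80}{k} p^k (1-p)^{80-k}.$$

For the given failure probability, the second value is smaller, so the second strategy is better.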
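The comparison can be reproduced numerically. In the sketch below, strategy 1 fails when any of the four groups of 20 has two or more simultaneous failures, and strategy 2 fails when four or more of the 80 pieces fail at once. The failure probability `p = 0.01` is an assumed value used purely for illustration, and `binom_cdf` is a helper name introduced here.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


p = 0.01  # ASSUMED failure probability, for illustration only

# Strategy 1: four independent groups of 20, one maintainer each.
strategy_1 = 1 - binom_cdf(1, 20, p) ** 4

# Strategy 2: three maintainers jointly cover all 80 pieces.
strategy_2 = 1 - binom_cdf(3, 80, p)

print("strategy 1:", strategy_1)
print("strategy 2:", strategy_2)
```

With this assumed value, pooling the maintainers gives a much smaller probability of an unattended failure even though fewer people are employed.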

Geometric Distribution

Definition: Geometric Distribution

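Suppose that a Bernoulli trial is repeated independently until $A$ occurs, where $P(A) = p$, and let $X$ be a r.v. that records the number of trials required. Then $X$ is said to follow a geometric distribution with parameter $p$, denoted as $X \sim \mathrm{Geo}(p)$. The PMF of $X$ can be derived as
$$p_X(k) = (1-p)^{k-1} p, \quad k = 1, 2, \dots$$
The expectation and variance of $X$ are
$$E(X) = \frac{1}{p}, \qquad \mathrm{Var}(X) = \frac{1-p}{p^2}.$$
The geometric distribution is the only discrete distribution which has the memoryless property, i.e., for $X \sim \mathrm{Geo}(p)$ and any positive integers $m, n$, we have
$$P(X > m + n \mid X > m) = P(X > n).$$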

Poisson Distribution

Definition: Poisson Distribution

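Let $X$ be a discrete r.v. with support $\{0, 1, 2, \dots\}$. If its PMF is
$$p_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0, 1, 2, \dots,$$
where $\lambda > 0$ is a constant, then $X$ is said to follow a Poisson distribution with parameter $\lambda$, denoted by $X \sim \mathrm{Poisson}(\lambda)$.

The Poisson distribution is used to describe the number of events occurring in a fixed interval of time or space when the events occur independently and at a constant rate.

The expectation and variance of $X$ can be derived to be
$$E(X) = \lambda, \qquad \mathrm{Var}(X) = \lambda.$$

Proof. By the definition of expectation, we have:
$$E(X) = \sum_{k=0}^{\infty} k\,\frac{\lambda^k}{k!} e^{-\lambda} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$
The variance can be derived similarly. The parameter $\lambda$ is also called the intensity of the Poisson distribution.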

Example

A council is considering whether to base a recovery vehicle on a stretch of road to help clear incidents as quickly as possible. Records show that, on average, the number of incidents during the morning rush hour is 5. The council won’t base a recovery vehicle on the road if the probability of having more than 5 incidents during the morning rush hour is less than . Based on this information, should the council provide a vehicle?

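Solution. Let $X$ be the number of incidents during the morning rush hour of a random day, then $X \sim \mathrm{Poisson}(5)$. The goal is to compute $P(X > 5)$. Since
$$P(X \le 5) = \sum_{k=0}^{5} \frac{5^k}{k!} e^{-5} \approx 0.616,$$
we have
$$P(X > 5) = 1 - P(X \le 5) \approx 0.384.$$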
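The tail probability in this example is easy to evaluate directly from the Poisson PMF with mean 5. The sketch below uses only the standard library; the function name `poisson_pmf` is an assumption made here.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson r.v. with mean lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 5  # average number of incidents in the morning rush hour
p_at_most_5 = sum(poisson_pmf(k, lam) for k in range(6))
p_more_than_5 = 1 - p_at_most_5

print(round(p_at_most_5, 4), round(p_more_than_5, 4))
```

The resulting tail probability is what the council would compare against its threshold.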

Tip

The Poisson distribution can be derived as the limit of the binomial distribution.

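Proof. Consider the number of events within a unit time interval $[0, 1)$ and divide it into $n$ subintervals:
$$\Big[0, \tfrac{1}{n}\Big),\ \Big[\tfrac{1}{n}, \tfrac{2}{n}\Big),\ \dots,\ \Big[\tfrac{n-1}{n}, 1\Big).$$
Several assumptions are made:

Let $n$ be large so that each subinterval is very short, making it impossible for two or more events to occur within the same subinterval.
The probability of an event occurring is proportional to the length of the subinterval, i.e., $p_n = \lambda / n$.
Whether an event occurs in a subinterval is independent of the others.

Let $X$ be the number of events within $[0, 1)$. By the assumptions above, we have $X \sim B(n, \lambda / n)$. So
$$P(X = k) = \binom{n}{k} \Big(\frac{\lambda}{n}\Big)^k \Big(1 - \frac{\lambda}{n}\Big)^{n-k}.$$
Letting $n \to \infty$, it is not difficult to get
$$\lim_{n \to \infty} P(X = k) = \lim_{n \to \infty} \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \frac{\lambda^k}{k!} \Big(1 - \frac{\lambda}{n}\Big)^{n} \Big(1 - \frac{\lambda}{n}\Big)^{-k} = \frac{\lambda^k}{k!} e^{-\lambda}.$$
This is known as the Poisson theorem.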

Tip

The theorem suggests that for an $n$-fold Bernoulli trial with large $n$ and small $p$, the binomial distribution $B(n, p)$ can be approximated by the Poisson distribution $\mathrm{Poisson}(np)$.

A rule of thumb is that when and , the Poisson distribution provides a good approximation to the binomial distribution.

Example

An insurance company has launched a life insurance policy where each participant is required to pay a premium, and if a participant dies within the year, the beneficiaries receive \$2,000 from the insurance company.

Suppose 2,500 people participate in this insurance, and the probability of death for each person within the year is 0.002. What is the probability that the insurance company’s profit from this life insurance policy is no less than \$20,000?

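Solution. For the insurance company to make a profit of no less than \$20,000, the number of deaths $X$ among the 2,500 participants must satisfy $X \le 5$. Since $X \sim B(2500, 0.002)$ with $\lambda = np = 5$, we have
$$P(X \le 5) = \sum_{k=0}^{5} \binom{2500}{k}\, 0.002^k\, 0.998^{2500-k} \approx \sum_{k=0}^{5} \frac{5^k}{k!} e^{-5} \approx 0.616.$$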
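To see how close the Poisson approximation is in this example, the exact binomial sum and the Poisson sum over $k = 0, \dots, 5$ can be computed side by side. This is a minimal sketch with illustrative variable names; the values 2500 and 0.002 come from the problem statement.

```python
from math import comb, exp, factorial

n, p = 2500, 0.002
lam = n * p  # Poisson parameter np = 5

# Exact binomial probability of at most 5 deaths.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(6))

# Poisson approximation of the same probability.
approx = sum(lam**k * exp(-lam) / factorial(k) for k in range(6))

print("binomial:", exact)
print("poisson :", approx)
```

The two values differ only slightly, which is consistent with the Poisson theorem for large $n$ and small $p$.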

In summary, we introduced the following discrete distributions:

Summary

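Distribution	PMF	Expectation	Variance
$\mathrm{Bernoulli}(p)$	$p^x (1-p)^{1-x}$, $x \in \{0, 1\}$	$p$	$p(1-p)$
$B(n, p)$	$\binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$	$np$	$np(1-p)$
$\mathrm{Geo}(p)$	$(1-p)^{k-1} p$, $k = 1, 2, \dots$	$1/p$	$(1-p)/p^2$
$\mathrm{Poisson}(\lambda)$	$\frac{\lambda^k}{k!} e^{-\lambda}$, $k = 0, 1, 2, \dots$	$\lambda$	$\lambda$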

Common Continuous Distributions

Uniform Distribution

Definition: Uniform Distribution

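If the PDF of a r.v. $X$ is
$$f_X(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b, \\ 0, & \text{otherwise}, \end{cases}$$
then $X$ is said to follow a uniform distribution on $[a, b]$, denoted as $X \sim \mathrm{Uniform}(a, b)$ or simply $X \sim U(a, b)$.

The CDF of $X$ is
$$F_X(x) = \begin{cases} 0, & x < a, \\ \dfrac{x-a}{b-a}, & a \le x < b, \\ 1, & x \ge b. \end{cases}$$

The uniform distribution has an important property: for any $c$ and $l > 0$ with $a \le c < c + l \le b$,
$$P(c < X \le c + l) = \int_c^{c+l} \frac{1}{b-a}\,dx = \frac{l}{b-a}.$$
This depends only on the length of the interval and not on its position. This is known as the equal likelihood property.

The expectation and variance of $X$ can be derived to be
$$E(X) = \frac{a+b}{2}, \qquad \mathrm{Var}(X) = \frac{(b-a)^2}{12}.$$

Proof:
$$E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{a+b}{2}, \qquad E(X^2) = \int_a^b \frac{x^2}{b-a}\,dx = \frac{a^2 + ab + b^2}{3},$$
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \frac{a^2 + ab + b^2}{3} - \frac{(a+b)^2}{4} = \frac{(b-a)^2}{12}.$$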

Exponential Distribution

Definition: Exponential Distribution

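If the PDF of a r.v. $X$ is
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0, \end{cases}$$
then $X$ is said to follow an exponential distribution with parameter $\lambda$ ($\lambda > 0$), denoted as $X \sim \mathrm{Exp}(\lambda)$.

The CDF of $X$ is
$$F_X(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$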

The exponential distribution can be used to describe the distribution of the time intervals between events in a Poisson process:

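A Poisson process can be simply understood as a process where random events occur independently and at a constant rate along the time axis. If the number of events occurring within a unit time interval follows $\mathrm{Poisson}(\lambda)$, then the number of events $N(t)$ occurring in $[0, t]$ follows $\mathrm{Poisson}(\lambda t)$.

Let $T$ be the time until the first event occurs, then
$$P(T > t) = P(N(t) = 0) = e^{-\lambda t}, \quad t \ge 0,$$
which means that $T \sim \mathrm{Exp}(\lambda)$. Similarly, we can show that the time intervals between events independently follow $\mathrm{Exp}(\lambda)$.

The expectation and variance of $X \sim \mathrm{Exp}(\lambda)$ can be derived to be
$$E(X) = \frac{1}{\lambda}, \qquad \mathrm{Var}(X) = \frac{1}{\lambda^2}.$$

Proof: Integrating by parts,
$$E(X) = \int_0^{\infty} x\,\lambda e^{-\lambda x}\,dx = \big[-x e^{-\lambda x}\big]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = \frac{1}{\lambda},$$
$$E(X^2) = \int_0^{\infty} x^2\,\lambda e^{-\lambda x}\,dx = \big[-x^2 e^{-\lambda x}\big]_0^{\infty} + 2\int_0^{\infty} x e^{-\lambda x}\,dx = \frac{2}{\lambda^2},$$
so $\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \dfrac{1}{\lambda^2}$.

The exponential distribution is the only continuous distribution with the memoryless property, i.e., if $X \sim \mathrm{Exp}(\lambda)$, then for any $s, t > 0$ we have
$$P(X > s + t \mid X > s) = P(X > t).$$

Proof: For any $x > 0$, since $F_X(x) = 1 - e^{-\lambda x}$, it follows that $P(X > x) = e^{-\lambda x}$. Then, by the definition of conditional probability:
$$P(X > s + t \mid X > s) = \frac{P(X > s + t)}{P(X > s)} = \frac{e^{-\lambda (s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P(X > t).$$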

Tip

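The ceiling of an exponential r.v. follows a geometric distribution: if $X \sim \mathrm{Exp}(\lambda)$, then $\lceil X \rceil$ follows a geometric distribution with parameter $1 - e^{-\lambda}$.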
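The link between the two memoryless distributions can be checked from the exponential CDF alone, since $P(\lceil X \rceil = k) = P(k - 1 < X \le k)$. The sketch below compares this with a geometric PMF whose success probability is $1 - e^{-\lambda}$; the rate `lam = 0.7` and the helper name `exp_cdf` are illustrative assumptions.

```python
from math import exp

lam = 0.7          # illustrative rate, not taken from the notes
p = 1 - exp(-lam)  # success probability of the matching geometric distribution


def exp_cdf(x, lam):
    """CDF of an exponential r.v. with rate lam, for x >= 0."""
    return 1 - exp(-lam * x)


# P(ceil(X) = k) = F(k) - F(k - 1) for k = 1, 2, ...
ceil_probs = [exp_cdf(k, lam) - exp_cdf(k - 1, lam) for k in range(1, 11)]

# Geometric PMF (1 - p)^(k - 1) * p for the same k.
geo_probs = [(1 - p) ** (k - 1) * p for k in range(1, 11)]

for k, (c, g) in enumerate(zip(ceil_probs, geo_probs), start=1):
    print(k, c, g)
```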

Normal Distribution

Definition: Normal Distribution

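If the PDF of a random variable $X$ is
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big(-\frac{(x-\mu)^2}{2\sigma^2}\Big), \quad x \in \mathbb{R},$$
then $X$ is said to follow a normal distribution (also known as the Gaussian distribution) with parameters $\mu$ and $\sigma^2$ ($\sigma > 0$), denoted as $X \sim N(\mu, \sigma^2)$.

Specifically, $N(0, 1)$ is the standard normal distribution, with PDF
$$\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$
The CDF of $N(0, 1)$ has no explicit expression; however, it is used very often and is thus denoted by $\Phi(x)$:
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt.$$

The PDF of $N(\mu, \sigma^2)$ is an elegant bell-shaped curve, symmetric about the parameter $\mu$.

The larger $\mu$ is, the further to the right the PDF is located.
The larger $\sigma$ is, the flatter the PDF is.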

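If $X \sim N(\mu, \sigma^2)$ and we define the r.v. $Z = \dfrac{X - \mu}{\sigma}$, then $Z \sim N(0, 1)$. We call this the standardization.

Proof: Consider the CDF of $Z$:
$$F_Z(z) = P\Big(\frac{X - \mu}{\sigma} \le z\Big) = P(X \le \mu + \sigma z) = F_X(\mu + \sigma z).$$
By differentiation, the PDF of $Z$ is given by
$$f_Z(z) = \sigma f_X(\mu + \sigma z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2},$$
which shows that $Z \sim N(0, 1)$.

The expectation and variance of $X \sim N(\mu, \sigma^2)$ can be derived to be
$$E(X) = \mu, \qquad \mathrm{Var}(X) = \sigma^2.$$

Proof: Let $Z = \dfrac{X - \mu}{\sigma}$. It suffices to prove $E(Z) = 0$ and $\mathrm{Var}(Z) = 1$, since then $E(X) = E(\sigma Z + \mu) = \mu$ and $\mathrm{Var}(X) = \mathrm{Var}(\sigma Z + \mu) = \sigma^2$. Indeed,
$$E(Z) = \int_{-\infty}^{\infty} z\,\varphi(z)\,dz = 0$$
because the integrand is an odd function, and integrating by parts,
$$\mathrm{Var}(Z) = E(Z^2) = \int_{-\infty}^{\infty} z^2 \varphi(z)\,dz = \big[-z\,\varphi(z)\big]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \varphi(z)\,dz = 1.$$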

Example

A bus manufacturer is designing a bus. When determining the door height, they must ensure that it is not too high but also allows 99% of male passengers to pass through without bending. Assuming the height of all males follows a normal distribution , what should be the minimum door height to meet this requirement?

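Solution. Let $X$ denote the height of a randomly selected male and $h$ be the door height of the bus, and let $\mu$ and $\sigma$ denote the mean and standard deviation of the given normal distribution. Then the requirement can be expressed as $P(X \le h) \ge 0.99$. Checking the probability table of the normal distribution, we find that for $Z \sim N(0, 1)$, $P(Z \le 2.33) \approx 0.99$, so
$$P(X \le h) = P\Big(Z \le \frac{h - \mu}{\sigma}\Big) \ge 0.99 \quad \Longleftrightarrow \quad \frac{h - \mu}{\sigma} \ge 2.33.$$
So
$$h \ge \mu + 2.33\,\sigma.$$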
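Instead of a printed table, the 99% quantile of the standard normal distribution can be obtained from Python's standard library and then rescaled by standardization. In the sketch below, the mean and standard deviation of the height distribution are assumed values chosen only to make the example runnable; they are not the parameters of the original problem.

```python
from statistics import NormalDist

# 99th percentile of the standard normal distribution N(0, 1).
z_99 = NormalDist().inv_cdf(0.99)

# ASSUMED height parameters in centimetres, for illustration only.
mu, sigma = 170.0, 6.0

# Minimum door height via standardization: h = mu + sigma * z.
min_door_height = mu + sigma * z_99

# The same quantile computed directly on N(mu, sigma^2).
direct = NormalDist(mu, sigma).inv_cdf(0.99)

print(z_99, min_door_height, direct)
```

The quantile returned by `inv_cdf` is slightly more precise than the table value 2.33, so the two answers may differ in the second decimal place.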

Summary

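Distribution	PDF	Expectation	Variance
$U(a, b)$	$\frac{1}{b-a}$, $a \le x \le b$	$\frac{a+b}{2}$	$\frac{(b-a)^2}{12}$
$\mathrm{Exp}(\lambda)$	$\lambda e^{-\lambda x}$, $x \ge 0$	$\frac{1}{\lambda}$	$\frac{1}{\lambda^2}$
$N(\mu, \sigma^2)$	$\frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x-\mu)^2 / (2\sigma^2)}$, $x \in \mathbb{R}$	$\mu$	$\sigma^2$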

Transformation of Random Variables

Transformation of R.V.

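For a discrete r.v. $X$ with PMF $p_X(x)$, it is not difficult to determine the PMF of $Y = g(X)$:
$$p_Y(y) = P(g(X) = y) = \sum_{x:\, g(x) = y} p_X(x),$$
which covers both the case where $g$ is bijective and the case where it is not.

For a continuous r.v. $X$ with PDF $f_X(x)$, if $Y = g(X)$ is a discrete r.v., then the PMF of $Y$ is
$$p_Y(y) = P(g(X) = y) = \int_{\{x:\, g(x) = y\}} f_X(x)\,dx.$$
Otherwise, if $Y$ is also a continuous r.v., $g$ is a strictly monotonic function on the support of $X$, and it has a continuously differentiable inverse function $h = g^{-1}$, then the PDF of $Y$ is
$$f_Y(y) = f_X(h(y))\,|h'(y)|.$$

Proof for the continuous-to-continuous transformation. Consider the CDF of $Y$: $F_Y(y) = P(g(X) \le y)$.

If $g$ is a strictly increasing function, then $h'(y) > 0$ and:
$$F_Y(y) = P(X \le h(y)) = F_X(h(y)), \qquad f_Y(y) = f_X(h(y))\,h'(y).$$
If $g$ is a strictly decreasing function, then $h'(y) < 0$ and:
$$F_Y(y) = P(X \ge h(y)) = 1 - F_X(h(y)), \qquad f_Y(y) = -f_X(h(y))\,h'(y).$$
In both cases, $f_Y(y) = f_X(h(y))\,|h'(y)|$.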

Example

Consider the time it takes to transfer a file over a network depends on the network speed , which vary due to traffic and other conditions and . Let denote the time required to transfer a 100Mb file, please derive the PDF of .

Solution. We have and Since is a strictly decreasing function on and its inverse function is continuously differentiable. So Also, since , then . Then we have

A famous application of r.v. transformation is based on the following results:

Tip

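Suppose the CDF of a continuous r.v. $X$ is $F_X$ and its inverse function $F_X^{-1}$ exists. Define a r.v. $U = F_X(X)$, then $U \sim U(0, 1)$.

On the other hand, if $F$ is the CDF of some r.v. and its inverse function $F^{-1}$ exists, let $X = F^{-1}(U)$, then for $U \sim U(0, 1)$ we have $P(X \le x) = F(x)$, i.e., the CDF of $X$ is $F$.

Proof. Consider the CDF of $U$:
$$F_U(u) = P(U \le u) = P(F_X(X) \le u).$$
Since $F_X$ is a non-decreasing function and $F_X^{-1}$ exists, then for any $u \in (0, 1)$:
$$F_U(u) = P\big(X \le F_X^{-1}(u)\big) = F_X\big(F_X^{-1}(u)\big) = u.$$
This suggests that $U \sim U(0, 1)$. The proof for the second claim is similar.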

The second result can be used in the inverse transform sampling, which is a widely used technique for generating random samples from a complicated distribution.
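A minimal sketch of inverse transform sampling follows, using the exponential distribution because its CDF $F(x) = 1 - e^{-\lambda x}$ inverts in closed form to $F^{-1}(u) = -\ln(1 - u)/\lambda$. The target distribution, the rate `lam`, the seed, and the function name `sample_exponential` are all illustrative choices, not part of the notes.

```python
import math
import random


def sample_exponential(lam, rng):
    """Draw one sample with rate lam via X = F^{-1}(U), U uniform on [0, 1)."""
    u = rng.random()
    # 1 - u lies in (0, 1], so the logarithm is always defined.
    return -math.log(1.0 - u) / lam


rng = random.Random(0)  # fixed seed so that the run is reproducible
lam = 2.0               # illustrative rate
samples = [sample_exponential(lam, rng) for _ in range(100_000)]
sample_mean = sum(samples) / len(samples)

print("sample mean:", sample_mean, " theoretical mean:", 1 / lam)
```

The same pattern applies to any distribution whose CDF can be inverted, either analytically or numerically.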

If $g$ is not a strictly monotonic function on the support of $X$, how can we derive the PDF of $Y = g(X)$?

Example

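Assume that the r.v. $X \sim N(0, 1)$. What is the PDF of $Y = X^2$?

Solution. Since $X \sim N(0, 1)$, the PDF of $X$ is
$$\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.$$
Although $g(x) = x^2$ is not a monotonic function, it is strictly increasing on $(0, \infty)$ and strictly decreasing on $(-\infty, 0)$. For any $y > 0$, we have
$$F_Y(y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y}) = \Phi(\sqrt{y}) - \Phi(-\sqrt{y}),$$
where $\Phi$ is the CDF of $N(0, 1)$, and therefore
$$f_Y(y) = \frac{1}{2\sqrt{y}} \big[\varphi(\sqrt{y}) + \varphi(-\sqrt{y})\big] = \frac{1}{\sqrt{2\pi y}} e^{-y/2}, \quad y > 0.$$

The r.v. $Y$ follows a distribution known as the chi-squared distribution (with one degree of freedom).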
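A quick numerical check of this kind of derivation is to differentiate the CDF of the transformed variable and compare with the closed-form density. The sketch below assumes $X$ is standard normal and $Y = X^2$, the case that yields the chi-squared distribution with one degree of freedom; the function names `cdf_y` and `pdf_y` and the step size `h` are choices made here.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

std_normal = NormalDist()


def cdf_y(y):
    """F_Y(y) = P(X^2 <= y) = Phi(sqrt(y)) - Phi(-sqrt(y)) for y > 0."""
    r = sqrt(y)
    return std_normal.cdf(r) - std_normal.cdf(-r)


def pdf_y(y):
    """Closed-form density of Y = X^2: exp(-y/2) / sqrt(2*pi*y) for y > 0."""
    return exp(-y / 2) / sqrt(2 * pi * y)


h = 1e-5  # step size for the central difference
checks = []
for y in (0.5, 1.0, 2.0, 4.0):
    numeric = (cdf_y(y + h) - cdf_y(y - h)) / (2 * h)
    checks.append((y, numeric, pdf_y(y)))

for y, numeric, exact in checks:
    print(y, numeric, exact)
```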

Example

Solution. By the definition of expectation, we have

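We can also calculate the expectation of $g(X)$ using the PMF/PDF of $X$ directly:

Tip

If $X$ is a discrete r.v. with PMF $p_X(x)$, given $\sum_{x} |g(x)|\,p_X(x) < \infty$, then
$$E[g(X)] = \sum_{x} g(x)\,p_X(x).$$
If $X$ is a continuous r.v. with PDF $f_X(x)$, given $\int_{-\infty}^{\infty} |g(x)|\,f_X(x)\,dx < \infty$, then
$$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx.$$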

Alternative solution.

We generalize this kind of comparison to the relationship between $E[g(X)]$ and $g(E(X))$. To describe such a rule, we need a definition first.

Definition: Convex and Concave Function

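A function $g$ is said to be a convex function if, for any $x_1, x_2$ and $\lambda \in [0, 1]$, we have
$$g(\lambda x_1 + (1 - \lambda) x_2) \le \lambda g(x_1) + (1 - \lambda) g(x_2).$$
A function $g$ is said to be a concave function if, for any $x_1, x_2$ and $\lambda \in [0, 1]$, we have
$$g(\lambda x_1 + (1 - \lambda) x_2) \ge \lambda g(x_1) + (1 - \lambda) g(x_2).$$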

Jensen’s Inequality

Let $X$ be a random variable, then

for any convex function $g$, $E[g(X)] \ge g(E(X))$;

for any concave function $g$, $E[g(X)] \le g(E(X))$.
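Jensen's inequality states that $E[g(X)] \ge g(E(X))$ for convex $g$, with the inequality reversed for concave $g$. The sketch below checks both directions on a small discrete distribution; the PMF and the two test functions ($x^2$ as a convex example, $\log x$ as a concave one) are illustrative assumptions.

```python
from math import log

# Illustrative discrete distribution on positive values.
pmf = {1: 0.2, 2: 0.5, 4: 0.3}


def expectation(g):
    """E[g(X)] under the PMF above."""
    return sum(g(x) * p for x, p in pmf.items())


mean = expectation(lambda x: x)

# Convex case: g(x) = x^2, so E[X^2] should be at least (E[X])^2.
convex_lhs, convex_rhs = expectation(lambda x: x**2), mean**2

# Concave case: g(x) = log(x), so E[log X] should be at most log(E[X]).
concave_lhs, concave_rhs = expectation(log), log(mean)

print(convex_lhs, ">=", convex_rhs)
print(concave_lhs, "<=", concave_rhs)
```

The gap in the convex case, $E(X^2) - [E(X)]^2$, is exactly the variance of $X$.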

By the Jensen’s inequality, we have

Example

One of the applications of Jensen’s inequality is related to the Kullback-Leibler divergence. KL divergence is called the information gain in the context of decision trees and also called the relative entropy.

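Simply put, if we have two probability distributions with PDFs $p$ and $q$, the KL divergence measures the difference/distance between them:
$$D_{\mathrm{KL}}(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)}\,dx.$$
The KL divergence has the property that $D_{\mathrm{KL}}(p \,\|\, q) \ge 0$, and $D_{\mathrm{KL}}(p \,\|\, q) = 0$ iff $p = q$ almost everywhere. Try to prove the non-negativity of the KL divergence by Jensen’s inequality.

Proof. Let $X$ be a r.v. with PDF $p$, and define another r.v. $Y = \dfrac{q(X)}{p(X)}$. Let the function $g(y) = -\log y$, then $g$ is a convex function, so that by Jensen’s inequality:
$$E[g(Y)] \ge g(E(Y)).$$
Since
$$E[g(Y)] = \int p(x) \Big(-\log \frac{q(x)}{p(x)}\Big)\,dx = \int p(x) \log \frac{p(x)}{q(x)}\,dx = D_{\mathrm{KL}}(p \,\|\, q)$$
and
$$g(E(Y)) = -\log \int p(x)\,\frac{q(x)}{p(x)}\,dx = -\log \int q(x)\,dx = -\log 1 = 0,$$
we conclude that
$$D_{\mathrm{KL}}(p \,\|\, q) \ge 0.$$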
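The non-negativity can also be observed numerically. The sketch below uses the discrete analogue of the KL divergence, $\sum_i p_i \log(p_i / q_i)$, rather than the integral form, and the two probability vectors are arbitrary illustrative choices.

```python
from math import log


def kl_divergence(p, q):
    """Discrete KL divergence sum_i p_i * log(p_i / q_i).

    Assumes q_i > 0 wherever p_i > 0; terms with p_i = 0 contribute nothing.
    """
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Two illustrative distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
kl_pp = kl_divergence(p, p)

print(kl_pq, kl_qp, kl_pp)
```

The two directions generally give different values, which is why the KL divergence is not a true distance even though it is often described as one.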

Probability and Statistics for Engineering Lecture 3-5 Notes
