A population refers to the entire set of individuals,
objects, events, or measurements that share a common characteristic and
are of interest in a particular study. Any numerical characteristic of a
population is a parameter. A sample is a subset of the
population, which is used to make inferences about the population.
When the form of the population distribution is known but contains
unknown parameters, inference about these parameters is referred to as
parametric statistical inference.
Generally, we are interested in some quantitative index $X$ of the population. The value of $X$ for each individual in the population may be different, and these values collectively form a probability distribution. So the population is usually expressed as a random variable $X$ following a specific distribution, i.e., $X \sim F(x)$ or $X \sim f(x)$. In statistics, the distribution of $X$ is often unknown, but we can typically assume a specific form based on context, leaving only certain parameters of the distribution unknown. So the population can be expressed as $X \sim F(x; \theta_1, \ldots, \theta_k)$ or $X \sim f(x; \theta_1, \ldots, \theta_k)$.
If $n$ individuals are selected from the population, their quantitative indices are recorded as $X_1, X_2, \ldots, X_n$, which is called a sample of size $n$, and $n$ is the sample size. The process of selecting individuals from the population is called sampling. The goal is to make inference about the whole population based on a sample, and this requires special attention during sampling to ensure that the samples drawn are sufficiently representative.
The simplest method to ensure representativeness is simple random
sampling. As its name suggests, the core of simple random sampling
is to select samples in a completely random manner, ensuring that every
individual in the population has an equal probability of being
chosen. Under this sampling method, $X_1, X_2, \ldots, X_n$ are independent and follow the same distribution as the population $X$. This is a fundamental assumption of many statistical methods.
Before sampling, we have no idea who will be sampled; after sampling, the observed values $x_1, x_2, \ldots, x_n$ of $X_1, X_2, \ldots, X_n$ are obtained, and those values are called the sample observed values.
Definition: Statistic
If $X_1, X_2, \ldots, X_n$ is a sample from population $X$ and $g(x_1, x_2, \ldots, x_n)$ is an $n$-ary function, define the random variable
$$T = g(X_1, X_2, \ldots, X_n);$$
then $T$ is called a statistic if $g$ does not involve any unknown parameter.
Common Statistics
Sample Mean
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Sample Variance and Sample Standard Deviation
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad S = \sqrt{S^2}$$
Order Statistics
$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)},$$
where $X_{(1)} = \min\{X_1, \ldots, X_n\}$ and $X_{(n)} = \max\{X_1, \ldots, X_n\}$.
Sample $p$-quantile
A number $m_p$ that exceeds at most $100p\%$ of the sample and is exceeded by at most $100(1-p)\%$ of the sample. Specifically, $m_{0.5}$ is the sample median, $m_{0.25}$ ($m_{0.75}$) is the sample lower (upper) quartile or the sample first (third) quartile, also written as $Q_1$ ($Q_3$). $Q_3 - Q_1$ is called the sample interquartile range, or sample IQR for short.
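For a quick numerical illustration, these common statistics can be computed with NumPy; the sample values below are made up for demonstration.

```python
import numpy as np

# A hypothetical sample of size n = 8 (made-up values, for illustration only)
x = np.array([4.2, 3.7, 5.1, 4.8, 3.9, 4.4, 5.3, 4.0])
n = len(x)

x_bar = x.mean()                     # sample mean
s2 = x.var(ddof=1)                   # sample variance (divides by n - 1)
s = np.sqrt(s2)                      # sample standard deviation
order_stats = np.sort(x)             # order statistics X_(1) <= ... <= X_(n)
q1, med, q3 = np.quantile(x, [0.25, 0.5, 0.75])   # sample quartiles and median
iqr = q3 - q1                        # sample interquartile range

print(x_bar, s2, s, order_stats, med, iqr)
```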
Each statistic is a random variable because it is computed from random
data. After sampling, the observed value of a statistic can be
obtained.
The distribution of a statistic is called the sampling
distribution, which is required to construct confidence intervals
and perform hypothesis testing later.
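To make the idea of a sampling distribution concrete, the sketch below repeatedly draws samples of size $n$ from an assumed $N(\mu, \sigma^2)$ population and looks at the resulting values of $\bar{X}$; the parameters are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, n_reps = 10.0, 2.0, 25, 5000   # assumed population parameters

# Each row is one sample of size n; each sample yields one observed value of X-bar
sample_means = rng.normal(mu, sigma, size=(n_reps, n)).mean(axis=1)

# The sampling distribution of X-bar is centered at mu with standard deviation sigma/sqrt(n)
print(sample_means.mean(), sample_means.std(ddof=1), sigma / np.sqrt(n))
```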
Parameter Estimation: Point Estimation
Properties of Point Estimation
Definition: Point Estimation
Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the population $X$, and $\theta_1, \ldots, \theta_k$ are the unknown parameters. If $\hat{\theta}_i = \hat{\theta}_i(X_1, \ldots, X_n)$ is a statistic used to estimate $\theta_i$, then $\hat{\theta}_i(X_1, \ldots, X_n)$ is called an estimator of $\theta_i$.
Plugging in the sample observed values, $\hat{\theta}_i(x_1, \ldots, x_n)$ is called an estimate or estimated value of $\theta_i$. Both $\hat{\theta}_i(X_1, \ldots, X_n)$ and $\hat{\theta}_i(x_1, \ldots, x_n)$ can be abbreviated as $\hat{\theta}_i$.
Before introducing the methods of point estimation, we first
introduce some properties used to compare multiple point estimators.
Since $\hat{\theta}$ is a statistic, i.e., a random variable, a natural criterion is to see whether it underestimates or overestimates $\theta$ on average.
Definition: Unbiasedness
Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the population $X$, and $\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n)$ is an estimator of $\theta$. $\Theta$ is the parameter space of $\theta$, i.e., the set of values that $\theta$ can take. If for all $\theta \in \Theta$, we have
$$E(\hat{\theta}) = \theta,$$
then $\hat{\theta}$ is called an unbiased estimator of $\theta$. Otherwise, it is called a biased estimator of $\theta$. $E(\hat{\theta}) - \theta$ is called the bias of the estimator. If the bias is not $0$, but converges to $0$ as $n \to \infty$, then $\hat{\theta}$ is called an asymptotically unbiased estimator.
Tip
No matter what the population distribution is, if the population mean $\mu$ and population variance $\sigma^2$ exist, then $\bar{X}$ and $S^2$ are unbiased estimators of $\mu$ and $\sigma^2$, respectively.
Proof. Since $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$,
$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu, \qquad \operatorname{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^2}{n}.$$
So
$$E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right] = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\sigma^2.$$
Hence
$$E(S^2) = \frac{1}{n-1} E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma^2.$$
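A quick simulation check of this tip (with an arbitrarily chosen normal population): averaging $\bar{X}$ and $S^2$ over many simulated samples should come close to $\mu$ and $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, n_reps = 3.0, 4.0, 10, 20000   # assumed population parameters

samples = rng.normal(mu, np.sqrt(sigma2), size=(n_reps, n))
x_bar = samples.mean(axis=1)             # sample mean of each replicate
s2 = samples.var(axis=1, ddof=1)         # sample variance with the n - 1 denominator

# Averages over replicates approximate E(X-bar) and E(S^2)
print(x_bar.mean(), s2.mean())           # close to 3.0 and 4.0, respectively
```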
Unbiasedness suggests that an estimator fluctuates around the true parameter. Considering the stability of estimation, we would like the magnitude of the fluctuation to be as small as possible.
Definition: Relative Efficiency
Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the population $X$, and $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of $\theta$. $\Theta$ is the parameter space. If for all $\theta \in \Theta$, we have
$$\operatorname{Var}(\hat{\theta}_1) \le \operatorname{Var}(\hat{\theta}_2),$$
and
$$\operatorname{Var}(\hat{\theta}_1) < \operatorname{Var}(\hat{\theta}_2)$$
for at least one $\theta \in \Theta$, then $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$.
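For intuition, here is an illustrative comparison under an assumed normal population: both $\bar{X}$ and the single observation $X_1$ are unbiased estimators of $\mu$, but $\bar{X}$ has variance $\sigma^2/n$ while $X_1$ has variance $\sigma^2$, so $\bar{X}$ is more efficient.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, n_reps = 0.0, 1.0, 20, 50000   # assumed population parameters

samples = rng.normal(mu, sigma, size=(n_reps, n))
est1 = samples.mean(axis=1)   # estimator 1: the sample mean
est2 = samples[:, 0]          # estimator 2: just the first observation

# Both are centered at mu, but the sample mean fluctuates far less
print(est1.mean(), est2.mean())             # both close to 0.0
print(est1.var(ddof=1), est2.var(ddof=1))   # about sigma^2/n vs. sigma^2
```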
The last property is about the convergence of an estimator as the sample size $n$ approaches infinity.
Definition: Consistency
Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the population $X$, and $\hat{\theta}_n = \hat{\theta}(X_1, \ldots, X_n)$ is an estimator of $\theta$. $\Theta$ is the parameter space. If for all $\theta \in \Theta$ and all $\varepsilon > 0$, we have
$$\lim_{n \to \infty} P\big(|\hat{\theta}_n - \theta| \ge \varepsilon\big) = 0,$$
i.e., $\hat{\theta}_n \xrightarrow{P} \theta$ as $n \to \infty$, then $\hat{\theta}_n$ is called a consistent estimator of $\theta$.
Tip
An (asymptotically) unbiased estimator may not be a consistent estimator.
A consistent estimator may not be an (asymptotically) unbiased estimator.
Tip
If $\hat{\theta}_n$ is an asymptotically unbiased estimator of $\theta$ and $\operatorname{Var}(\hat{\theta}_n) \to 0$ as $n \to \infty$, then $\hat{\theta}_n$ is a consistent estimator of $\theta$.
Proof.
Chebyshev's Inequality
Let $Y$ be a random variable whose mean $E(Y)$ and variance $\operatorname{Var}(Y)$ both exist. Then for any $\varepsilon > 0$,
$$P\big(|Y - E(Y)| \ge \varepsilon\big) \le \frac{\operatorname{Var}(Y)}{\varepsilon^2}.$$
Therefore, by Chebyshev's inequality, for any $\varepsilon > 0$,
$$P\big(|\hat{\theta}_n - E(\hat{\theta}_n)| \ge \varepsilon/2\big) \le \frac{4\operatorname{Var}(\hat{\theta}_n)}{\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$
Since the bias $E(\hat{\theta}_n) - \theta \to 0$, for $n$ large enough $|E(\hat{\theta}_n) - \theta| < \varepsilon/2$, so
$$P\big(|\hat{\theta}_n - \theta| \ge \varepsilon\big) \le P\big(|\hat{\theta}_n - E(\hat{\theta}_n)| \ge \varepsilon/2\big) \to 0.$$
Tip
No matter what the population distribution is, if the population mean $\mu$ and population variance $\sigma^2$ exist, then $\bar{X}$ and $S^2$ are consistent estimators of $\mu$ and $\sigma^2$, respectively.
Proof. By the weak LLN, we have $\bar{X} \xrightarrow{P} \mu$ and $\frac{1}{n}\sum_{i=1}^{n} X_i^2 \xrightarrow{P} E(X^2)$, so
$$S^2 = \frac{n}{n-1}\left(\frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2\right) \xrightarrow{P} E(X^2) - \mu^2 = \sigma^2.$$
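A small simulation of this tip (with an arbitrarily chosen normal population): as $n$ grows, the observed $\bar{X}$ and $S^2$ stabilize near $\mu$ and $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma2 = 5.0, 9.0                  # assumed population parameters

# As the sample size grows, x-bar and s^2 settle near mu and sigma^2
for n in [10, 100, 1000, 100000]:
    x = rng.normal(mu, np.sqrt(sigma2), size=n)
    print(n, x.mean(), x.var(ddof=1))
```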
Method of Moments
Definition: Moments
Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the population $X$. The $k$-th population moment and the $k$-th sample moment are defined as
$$\mu_k = E(X^k), \qquad A_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k.$$
The $k$-th population central moment and the $k$-th sample central moment are defined as
$$\nu_k = E\big[(X - E(X))^k\big], \qquad B_k = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^k.$$
Then we can choose the values of the unknown parameters $\theta_1, \ldots, \theta_m$ s.t. the population moments match the sample moments.
The population moments can be expressed as functions of the parameters $\theta_1, \ldots, \theta_m$, and we set the population moments equal to the sample moments:
$$\mu_k(\theta_1, \ldots, \theta_m) = A_k, \qquad k = 1, 2, \ldots, m.$$
The solution is denoted $\hat{\theta}_1, \ldots, \hat{\theta}_m$. For $j = 1, \ldots, m$, $\hat{\theta}_j$ is called the moment estimator of $\theta_j$.
Tip
No matter what the population distribution is, if the population mean $\mu$ and population variance $\sigma^2$ exist, then the 1st sample moment $A_1 = \bar{X}$ and the 2nd sample central moment $B_2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$ are the moment estimators of $\mu$ and $\sigma^2$, respectively.
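As a small worked sketch of the moment method (an illustrative example with an assumed Uniform$(0, \theta)$ population, for which $E(X) = \theta/2$, so matching the first moment gives $\hat{\theta} = 2\bar{X}$):

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true = 6.0                                  # assumed true parameter
x = rng.uniform(0, theta_true, size=500)          # simulated sample

# Match E(X) = theta/2 with the 1st sample moment A_1 = x.mean()
theta_mom = 2 * x.mean()

# Moment estimators of mu and sigma^2 for any population: A_1 and B_2
mu_mom = x.mean()
sigma2_mom = x.var(ddof=0)                        # 2nd sample central moment (divides by n)

print(theta_mom, mu_mom, sigma2_mom)
```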
Method of Maximum Likelihood
The central idea of the method of maximum likelihood is to find the
parameter values that maximize the likelihood of observing the given
data.
For a simple random sample $X_1, X_2, \ldots, X_n$ from the population $X$, the sample observed values are $x_1, x_2, \ldots, x_n$. With different values of the parameters, the likelihood of observing $x_1, x_2, \ldots, x_n$ is different. We would estimate the parameters with the values that maximize the likelihood of observing $x_1, x_2, \ldots, x_n$.
Definition: Likelihood Function and Maximum Likelihood Estimator
Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the population $X$ and the sample observed values are $x_1, x_2, \ldots, x_n$. $\Theta$ is the parameter space. The likelihood function
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta), \qquad \theta \in \Theta,$$
is a function of $\theta$, measuring the likelihood of observing $x_1, x_2, \ldots, x_n$; i.e., it is the joint PMF/PDF of $X_1, X_2, \ldots, X_n$ evaluated at the observed values. If there exists $\hat{\theta} = \hat{\theta}(x_1, \ldots, x_n)$ s.t.
$$L(\hat{\theta}) = \max_{\theta \in \Theta} L(\theta),$$
then $\hat{\theta}(x_1, \ldots, x_n)$ is the maximum likelihood estimate of $\theta$, and the corresponding estimator $\hat{\theta}(X_1, \ldots, X_n)$ is the maximum likelihood estimator of $\theta$. Maximizing $L(\theta)$ is equivalent to maximizing the log-likelihood function
$$\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta).$$
Example
$X_1, X_2, \ldots, X_n$ is a simple random sample from the population $X \sim N(\mu, \sigma^2)$. Derive the maximum likelihood estimators of the unknown parameters $\mu$ and $\sigma^2$.
Solution. The likelihood function is
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right).$$
The log-likelihood function is
$$\ell(\mu, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$
Then
$$\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0, \qquad \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^{n}(x_i - \mu)^2 = 0.$$
The solution is
$$\hat{\mu} = \bar{x}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
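As a numerical cross-check of this example (an illustrative sketch using SciPy's general-purpose optimizer on simulated data), maximizing the log-likelihood numerically should reproduce the closed-form solution $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i}(x_i - \bar{x})^2$.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=1.5, size=200)   # simulated data from an assumed N(2, 1.5^2)

def neg_log_lik(params):
    mu, log_sigma = params                     # optimize log(sigma) so that sigma stays positive
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_lik, x0=[0.0, 0.0])
mu_hat, sigma2_hat = res.x[0], np.exp(2 * res.x[1])

# Compare with the closed-form MLEs: x-bar and the 1/n sample variance
print(mu_hat, x.mean())
print(sigma2_hat, x.var(ddof=0))
```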
Tip
Under mild regularity conditions, the moment estimators and maximum likelihood estimators are consistent and asymptotically unbiased estimators. For large samples, a maximum likelihood estimator has an approximately normal distribution. This property is known as asymptotic normality, and it helps us construct interval estimates for a parameter.
Parameter Estimation: Confidence Interval
Given the sample observed values, a point estimate provides a concrete estimated value of the parameter. However, the accuracy of this estimate is not conveyed by the point estimate itself. To address this issue, we introduce interval estimation.
The general idea of interval estimation is to find two statistics $\hat{\theta}_L = \hat{\theta}_L(X_1, \ldots, X_n)$ and $\hat{\theta}_U = \hat{\theta}_U(X_1, \ldots, X_n)$ with $\hat{\theta}_L < \hat{\theta}_U$, and use the random interval $(\hat{\theta}_L, \hat{\theta}_U)$ to estimate the range of $\theta$.
Since $(\hat{\theta}_L, \hat{\theta}_U)$ is a random interval while $\theta$ is a fixed value, $(\hat{\theta}_L, \hat{\theta}_U)$ may not cover the true value of $\theta$. If the width of the interval is large, then it has a higher probability of covering the true value of $\theta$, but its precision is relatively low.
Therefore, we need to find a balance between the coverage probability and the precision. The confidence interval is a widely used form of interval estimation.
Definition: Confidence Interval
Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the population $X$. For all $\theta \in \Theta$, if there exist two statistics $\hat{\theta}_L = \hat{\theta}_L(X_1, \ldots, X_n)$ and $\hat{\theta}_U = \hat{\theta}_U(X_1, \ldots, X_n)$ s.t.
$$P\big(\hat{\theta}_L < \theta < \hat{\theta}_U\big) \ge 1 - \alpha,$$
then $(\hat{\theta}_L, \hat{\theta}_U)$ is called a confidence interval of $\theta$ with confidence level $1 - \alpha$, or simply a $1 - \alpha$ confidence interval. $\hat{\theta}_L$ and $\hat{\theta}_U$ are called the confidence lower limit and confidence upper limit, respectively.
Example
$X_1, X_2, \ldots, X_n$ is a simple random sample from $N(\mu, \sigma^2)$ with $\sigma^2$ known; suppose that the value of $\mu$ is unknown. Derive the confidence interval of the unknown parameter $\mu$.
Solution. A natural idea to construct a confidence interval is to start from a point estimator and form an interval by adding and subtracting a quantity around the point estimate. For this example, a good point estimator of $\mu$ is $\bar{X}$: it can be obtained by both the method of moments and the method of maximum likelihood, and it is an unbiased estimator of $\mu$.
By the definition of CI, we would like to determine the value of $\delta$, s.t.
$$P\big(\bar{X} - \delta < \mu < \bar{X} + \delta\big) \ge 1 - \alpha,$$
with an equivalent expression
$$P\big(|\bar{X} - \mu| < \delta\big) \ge 1 - \alpha.$$
Obviously, the value of $\delta$ needs to be determined by referring to the distribution of $\bar{X}$, which is
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right), \quad \text{i.e.,} \quad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
So $\delta$ can be determined by
$$P\!\left(|Z| < \frac{\delta}{\sigma/\sqrt{n}}\right) \ge 1 - \alpha,$$
where $Z \sim N(0, 1)$. The $\delta$ satisfying the equation above and with the minimum width of the interval is
$$\delta = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},$$
where $z_{\alpha/2}$ is the upper $\alpha/2$-quantile of the standard normal distribution, whose value can be found in the table. Therefore, the CI of $\mu$ is
$$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right).$$
With the value of $\alpha$ and the observed values of the sample, the confidence interval can be calculated.
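A short sketch of this calculation (with made-up observations and an assumed known $\sigma$):

```python
import numpy as np
from scipy.stats import norm

x = np.array([10.2, 9.8, 10.5, 10.1, 9.6, 10.3, 9.9, 10.4])  # hypothetical observed values
sigma = 0.5                       # assumed known population standard deviation
alpha = 0.05                      # 95% confidence level

n = len(x)
x_bar = x.mean()
z = norm.ppf(1 - alpha / 2)       # upper alpha/2 quantile of N(0, 1)
delta = z * sigma / np.sqrt(n)

print(x_bar - delta, x_bar + delta)   # CI for mu when sigma is known
```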
Example
Following the above example, derive the CI of the unknown parameter $\mu$ if $\sigma^2$ is unknown.
Solution. Following the same rationale as the case when $\sigma^2$ is known, we still start from $\bar{X}$ and try to find $\delta$ s.t.
$$P\big(|\bar{X} - \mu| < \delta\big) \ge 1 - \alpha,$$
where the unknown $\sigma$ is replaced by $S$, the consistent estimator of $\sigma$. Then the value of $\delta$ can be determined based on the distribution of
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}.$$
Unlike $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, $T$ no longer follows a standard normal distribution. The exact distribution of $T$ is defined to be the Student's $t$-distribution. And $T$ approximately follows the standard normal distribution when the sample size is large.
Therefore, with $\delta = z_{\alpha/2}\,S/\sqrt{n}$ determined based on $N(0, 1)$, the corresponding CI is called the large sample confidence interval, which is
$$\left(\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right).$$
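A companion sketch for the unknown-$\sigma^2$ case (same made-up observations as before): it computes both the exact interval based on the Student's $t$-distribution and the large sample interval that replaces the $t$ critical value with the normal one.

```python
import numpy as np
from scipy.stats import norm, t

x = np.array([10.2, 9.8, 10.5, 10.1, 9.6, 10.3, 9.9, 10.4])  # hypothetical observed values
alpha = 0.05

n = len(x)
x_bar, s = x.mean(), x.std(ddof=1)

# Exact CI based on the Student's t-distribution with n - 1 degrees of freedom
t_crit = t.ppf(1 - alpha / 2, df=n - 1)
print(x_bar - t_crit * s / np.sqrt(n), x_bar + t_crit * s / np.sqrt(n))

# Large sample CI: use the standard normal critical value instead
z = norm.ppf(1 - alpha / 2)
print(x_bar - z * s / np.sqrt(n), x_bar + z * s / np.sqrt(n))
```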
Construction of Confidence Interval: A General Method
Let $X_1, X_2, \ldots, X_n$ be a simple random sample from the population $X$. $\hat{\theta}$ is an unbiased estimator of $\theta$ and the standard deviation of $\hat{\theta}$ is $\sigma_{\hat{\theta}}$. This is known as the standard error of $\hat{\theta}$. If $\hat{\theta}$ exactly follows a normal distribution, i.e., $\hat{\theta} \sim N(\theta, \sigma_{\hat{\theta}}^2)$, and $\sigma_{\hat{\theta}}$ does not depend on any unknown parameter, then an exact $1 - \alpha$ CI of $\theta$ is
$$\left(\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}},\ \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}\right).$$
If $\hat{\theta}$ only approximately follows a normal distribution or $\sigma_{\hat{\theta}}$ depends on unknown parameters, and $\hat{\sigma}_{\hat{\theta}}$ is a consistent estimator of $\sigma_{\hat{\theta}}$, then a large sample $1 - \alpha$ CI of $\theta$ is
$$\left(\hat{\theta} - z_{\alpha/2}\,\hat{\sigma}_{\hat{\theta}},\ \hat{\theta} + z_{\alpha/2}\,\hat{\sigma}_{\hat{\theta}}\right).$$
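As one concrete instance of this general recipe (an illustrative example): for a Bernoulli($p$) population, $\hat{p} = \bar{X}$ is unbiased with standard error $\sqrt{p(1-p)/n}$, which depends on the unknown $p$, so a large sample CI plugs in a consistent estimate of the standard error.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
x = rng.binomial(1, 0.3, size=400)   # simulated Bernoulli(0.3) sample, for illustration
alpha = 0.05

n = len(x)
p_hat = x.mean()                              # unbiased estimator of p
se_hat = np.sqrt(p_hat * (1 - p_hat) / n)     # consistent estimate of the standard error
z = norm.ppf(1 - alpha / 2)

print(p_hat - z * se_hat, p_hat + z * se_hat)   # large sample CI for p
```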
Title: Probability and Statistics for Engineering Lecture 12-14 Notes