The Sample Mean / IID setting

Reading

Practice Problems

4.6.1 (Page 203)
1, 2, 3, 4, 5
4.6.4 (Page 214)
33, 34, 35, 36

Notes

The IID setting

The IID setting is somewhat analogous to the binomial setting in the case where the values in question are scalar. Here are its characteristics:

  1. Fixed number of trials. As in the binomial setting, we are repeating something a fixed number of times. We will again denote that by \(n\). It is often called the sample size, since in most applications it is exactly that: the number of individuals in our sample.
  2. Trials have numerical values as outcomes. As such we can describe them as random variables. For instance, we might be selecting students at random and looking at their GPA. We will denote by \(X_1\), \(X_2\), \(X_3\), \(\ldots\), \(X_n\) the random variables for each of the trials.
  3. The distributions of the different trials are all identical. In other words, the “tables” for \(X_1\), \(X_2\), and so on are all identical. We often use \(X\) to denote that common distribution. In the GPA example, this means that the kinds of values we can get when we look at the GPA of the first student we pick are the same as the kinds of values we can get when we look at the second student we pick, and so on. In a sampling situation, this basically means that all values are drawn from the same population.
  4. The trials are independent of each other. In our example, this would mean that the GPAs we get for, say, the first 5 students have no effect on the GPA we might get for the 6th student. The checks we need to perform for this independence are the same checks we performed to determine whether the trials in a binomial setting are independent. In particular, when we have an actual population and are removing people from it to form the sample, we need to know that the population is at least 20 times the sample size.

We can summarize this by saying:

In the IID setting we have a fixed number of Independent, Identically Distributed trials

You should see a lot of similarities with the binomial. In the case of the binomial we had the same chance of success, \(p\). The analog of this here is the claim that the distributions of all trials are identical.
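The IID setting above can be sketched in a short simulation. This is only an illustration under assumptions: the GPA population below is made up, since the text does not specify one.

```python
import random

random.seed(0)

# Hypothetical population of 400 student GPAs (made-up values, just to
# illustrate the setting; the text does not specify a population).
population = [2.1, 2.8, 3.0, 3.3, 3.5, 3.7, 3.9, 4.0] * 50

n = 5  # fixed number of trials (the sample size)

# Independence check from the text: the population should be at least
# 20 times the sample size.
assert len(population) >= 20 * n

# Each trial X_1, ..., X_n draws from the same population, so the trials
# are (approximately) identically distributed.
sample = random.sample(population, n)
```

Here `random.sample` draws without replacement, which is why the 20-times-the-sample-size check matters: with a large enough population, removing a few individuals barely changes the distribution of the remaining trials.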

Sample Mean

In the IID setting, the quantity of interest is the sample mean:

\[\bar x = \frac{X_1+X_2+\cdots+X_n}{n}\]

Notice that \(\bar x\) is a random variable: it is the average of the \(X_i\), which are themselves random variables. Its value depends on the sample we end up with, just like the value of \(\hat p\) depended on the sample in the binomial case. So different samples would give us different values for the sample mean.

Just as in the binomial setting we were interested in the kinds of values that \(\hat p\) can take, and how likely each one is, we can ask the same questions here.
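The sample-to-sample variability of \(\bar x\) can be seen directly by simulation. A minimal sketch, assuming a made-up GPA population:

```python
import random
import statistics

random.seed(1)

# Made-up GPA population (an assumption for this sketch).
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(1000)]

def sample_mean(n):
    """Draw one random sample of size n and return its sample mean x-bar."""
    return statistics.mean(random.sample(population, n))

# Different samples typically give different values of x-bar.
xbar_a = sample_mean(40)
xbar_b = sample_mean(40)
```

Running `sample_mean(40)` repeatedly produces a different number each time; the collection of all such numbers is exactly the sampling distribution discussed next.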

Sampling Distribution

The sampling distribution of \(\bar x\) is the distribution of the values that \(\bar x\) takes across all possible samples of size \(n\).

The remarkable fact is that we can describe what this distribution is, even if we know very little about the values that the \(X_i\) can take. Let us set the stage.

In the IID setting we draw \(n\) values/trials from a distribution \(X\). We will denote the mean of that distribution by \(\mu\) and its standard deviation by \(\sigma\).

Then we can compute the mean and standard deviation of the sampling distribution of \(\bar x\):

\[\mu_{\bar x} = \mu\] \[\sigma_{\bar x} = \frac{\sigma}{\sqrt{n}}\]

The Greek letters on the left-hand side denote the mean and standard deviation of the random variable \(\bar x\), in other words the mean and standard deviation of the sampling distribution.

Let us rephrase this:

Suppose we draw samples of size \(n\) by drawing independent values from a population \(X\) with mean \(\mu\) and standard deviation \(\sigma\).

If we then compute the sample mean values \(\bar x\), one for each possible sample of \(n\) values, then the mean of these values is \(\mu\) and the standard deviation is \(\frac{\sigma}{\sqrt{n}}\).

Sample averages vary less than the original values, by a factor of \(\sqrt{n}\).

Sample averages are on average the same as the original values.
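These two formulas can be checked by simulation. The sketch below uses an assumed population (10,000 normally distributed values, which is just a convenient choice) and compares the simulated mean and standard deviation of the \(\bar x\) values against \(\mu\) and \(\sigma/\sqrt{n}\):

```python
import math
import random
import statistics

random.seed(2)

# Made-up population (an assumption): 10,000 values with some mean and sd.
population = [random.gauss(3.0, 0.5) for _ in range(10_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 25
# One x-bar per simulated sample of size n.
xbars = [statistics.mean(random.sample(population, n)) for _ in range(5_000)]

mean_of_xbars = statistics.mean(xbars)
sd_of_xbars = statistics.pstdev(xbars)
# Theory predicts: mean_of_xbars close to mu,
#                  sd_of_xbars close to sigma / sqrt(n).
```

With 5,000 simulated samples the agreement is typically within a couple of decimal places, and it improves as the number of simulated samples grows.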

This tells us at least the mean and standard deviation of the sampling distribution of \(\bar x\). Amazingly, we can say more about it. This is the famous Central Limit Theorem.

The Central Limit Theorem

Central Limit Theorem

When the sample size \(n\) is “sufficiently large”, then the sampling distribution of \(\bar x\) will be approximately normal.

So we can assume that \(\bar x\) follows the distribution:

\[N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)\]

This is a remarkable theorem. It tells us that no matter what kind of distribution our original values had (heavily skewed, with outliers, with multiple modes, and so on), once we take large enough samples, the sample means behave like a normal distribution. No matter what we started with.
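A quick way to see the theorem in action is to start with a distribution that is nothing like a normal. The sketch below uses an exponential distribution (mean 1, standard deviation 1, a choice made purely for illustration), which is heavily right-skewed, and checks that the sample means nonetheless line up with \(N\left(1, \frac{1}{\sqrt{40}}\right)\):

```python
import math
import random
import statistics

random.seed(3)

# A heavily right-skewed starting distribution: exponential with
# mean 1 and standard deviation 1 (an assumed choice for illustration).
n = 40
xbars = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(5_000)]

# The CLT says these sample means are approximately N(1, 1/sqrt(40)),
# even though the original distribution is far from normal.
m = statistics.mean(xbars)
s = statistics.pstdev(xbars)
```

A histogram of `xbars` would show the familiar bell shape, with center near \(1\) and spread near \(1/\sqrt{40} \approx 0.158\), despite the skew of the original values.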

This is also why standardized test scores tend to look normally distributed: your score on a standardized test is an average of your scores on many questions, and averages behave in a more normal way than the original values.

The only thing left is to answer the question of what counts as a “sufficiently large” sample size. All we have is a general rule of thumb, but the bottom line is: the more non-normal the original population, the larger the sample size you will need.

Rule of Thumb for sufficient sample size for the Central Limit Theorem to apply.

The larger the sample size, the better. These are starting points depending on the population.

One important observation is that these sample sizes are just the minimums required to be able to claim that \(\bar x\) is normally distributed. We typically still need even bigger sample sizes, in order to keep \(\sigma_{\bar x} = \frac{\sigma}{\sqrt{n}}\) small.

An example

Suppose we draw at random a sample of size \(40\) from the Hanover student body and consider their GPAs.
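The facts above then describe the sampling distribution of \(\bar x\). A small worked sketch, where the population mean and standard deviation are assumed values (the text does not give the actual Hanover GPA parameters):

```python
import math

# Hypothetical population parameters (assumed, not given in the text):
# say the Hanover GPA population has mean mu = 3.2 and sd sigma = 0.5.
mu, sigma = 3.2, 0.5
n = 40

mu_xbar = mu                       # mean of the sampling distribution
sigma_xbar = sigma / math.sqrt(n)  # standard deviation of x-bar

# With n = 40, the CLT suggests x-bar is approximately
# N(3.2, 0.5 / sqrt(40)), i.e. roughly N(3.2, 0.079).
```

Note how much smaller \(\sigma_{\bar x}\) is than \(\sigma\): averages of 40 GPAs vary far less than individual GPAs do, by the factor \(\sqrt{40}\).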

Comparison Between Binomial and IID

| Concept | Binomial | IID |
|---|---|---|
| Sample Size | \(n\) (fixed) | \(n\) (fixed) |
| Sample Values | Yes/No | Numeric |
| Distribution | Same probability of success \(p\) | Identical \(X\) for each trial |
| Independence | Population at least \(20\) times the sample size, and other considerations | Population at least \(20\) times the sample size, and other considerations |
| Parameter (from population) | Population percent of success (\(p\)) | Population mean \(\mu\) (mean of \(X\)) |
| Statistic (from sample) | Sample percent of success (\(\hat p\)) | Sample mean \(\bar x\) |
| Sampling distribution mean | \(\mu_{\hat p} = p\) | \(\mu_{\bar x} = \mu\) |
| Sampling distribution std. dev. | \(\sigma_{\hat p}=\frac{\sqrt{p(1-p)}}{\sqrt{n}}\) | \(\sigma_{\bar x} =\frac{\sigma}{\sqrt{n}}\) |
| Sampling distribution is normal | \(np\geq 10\) and \(n(1-p)\geq 10\) | Central Limit Theorem (rule of thumb) |