7. The Central Limit Theorem

7.2 The Central Limit Theorem

Now we come to the very important central limit theorem. First, let’s introduce it intuitively as a process:

  1. Suppose you have a large population (in theory infinite) with mean \mu and standard deviation \sigma (and any old shape).
  2. Suppose you have a large sample, size n, of values from that population. (In practice we will see that n > 30 is large.) Take the mean, \overline{x}_{1}, of that sample. Put the sample back into the population.[1]
  3. Randomly pick another sample of size n. Compute the mean of the new sample, \overline{x}_{2}. Return the sample to the population.
  4. Repeat step 3 an infinite number of times and build up your collection of sample means \overline{x}_{i}.
  5. Then[2] the distribution of the sample means will be normal, will have a mean equal to the population mean, \mu, and will have a standard deviation of

        \[ \sigma_{\overline{x}} = \frac{\sigma}{\sqrt{n}} \]

    where \sigma is the population’s standard deviation. \sigma_{\overline{x}} = \sigma/\sqrt{n} is known as the standard error of the mean.
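The five-step process above is easy to check numerically. The sketch below uses a made-up, strongly skewed (exponential) population to emphasize that the population’s shape does not matter; the population parameters, sample size, and trial count are illustrative assumptions, with a finite number of trials standing in for the theorem’s infinite repetition:

```python
import math
import random

# Hypothetical skewed population: exponential with mean 10.
# For an exponential distribution, sigma = mu, so sigma = 10 here.
random.seed(1)
mu, sigma = 10.0, 10.0
n = 36            # sample size (n > 30 counts as "large")
trials = 20000    # finite stand-in for "repeat an infinite number of times"

# Steps 2-4: draw many samples of size n and record each sample mean.
means = []
for _ in range(trials):
    sample = [random.expovariate(1 / mu) for _ in range(n)]
    means.append(sum(sample) / n)

# Step 5: the sample means should center on mu with spread sigma/sqrt(n).
grand_mean = sum(means) / trials
se = math.sqrt(sum((m - grand_mean) ** 2 for m in means) / trials)

print(grand_mean)   # should be close to mu = 10
print(se)           # should be close to sigma/sqrt(n) = 10/6
```

Even though each draw comes from a heavily skewed population, a histogram of `means` would look like the bell curve the theorem predicts.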

Now let’s visualize this same process using pictures:

  • Take a sample of size n from the population and compute the mean \overline{x} (see Figure 7.2a).


Figure 7.2a
  • Put them back and take n more data points.
  • Do this over and over to get a bunch of values for \overline{x}. Those values for \overline{x} will be distributed as shown in Figure 7.2b.


Figure 7.2b

The central limit theorem is our fundamental sampling theory. It tells us that if we know the mean and standard deviation of a population[3], then we can assign the probability of getting a certain mean \overline{x} in a randomly selected sample from that population, using a normal distribution of sample means that has the same mean as the population and a standard deviation equal to the standard error of the mean.

To apply this central limit theorem sampling theory we will need to compute areas P under the normal distribution of means. To do that using the Standard Normal Distribution Table, we need to convert the values \overline{x} to standard normal z values with the z-transformation as usual: z = \frac{\overline{x} - \mu}{\sigma_{\overline{x}}}. So, for the distribution of sample means, the appropriate z-transformation is:

    \[  z = \frac{\overline{x} - \mu}{\sigma/\sqrt{n}}\]
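This z-transformation is simple to code. In the sketch below the helper name `z_of_mean` is hypothetical, and the numbers fed to it are the ones used in Example 7.1 that follows:

```python
import math

def z_of_mean(xbar, mu, sigma, n):
    """z-score of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))

# Example 7.1 numbers: mu = 96 months, sigma = 16 months, n = 36 cars.
print(z_of_mean(90, 96, 16, 36))    # -2.25
print(z_of_mean(100, 96, 16, 36))   # 1.5
```

Note that dividing by \sigma/\sqrt{n} rather than \sigma is what accounts for the narrower spread of sample means relative to individual values.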

Example 7.1 : Assume that we know, say from SGI’s database, that the mean age of registered cars is \mu = 96 months and that the population standard deviation of the cars is \sigma = 16 months. We make no assumption about the shape of the population distribution. Then, what is the answer to the following sampling theory question: What is the probability that the mean age is between 90 and 100 months in a sample of 36 cars?

Solution : The central limit theorem tells us that sample means will be distributed as shown in Figure 7.3.

Figure 7.3 : Distribution of mean age from samples of 36 cars.

Convert 90 and 100 to z-scores as usual:

    \begin{eqnarray*} z(90) &=& \frac{\overline{x} - \mu}{(\sigma/\sqrt{n})} = \frac{90-96}{2.667} = -2.25 \\ z(100) &=& \frac{100-96}{2.667} = 1.50 \end{eqnarray*}

Then, the required probability using the Standard Normal Distribution Table is

    \begin{eqnarray*} P & = & A(2.25) + A(1.50) \\ & = & 0.4878 + 0.4332 \\ & = & 0.921 \hspace{.5in} (92.1\%) \end{eqnarray*}
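The same probability can be obtained without the table by evaluating the standard normal CDF directly. The sketch below builds the CDF from Python’s error function via the standard identity \Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right):

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 96, 16, 36          # values from Example 7.1
se = sigma / math.sqrt(n)          # standard error = 16/6 ≈ 2.667

# P(90 <= xbar <= 100) = Phi(z(100)) - Phi(z(90))
p = phi((100 - mu) / se) - phi((90 - mu) / se)
print(round(p, 4))                 # ≈ 0.9210, matching the table-based answer
```

The small difference from 0.921 comes only from rounding the table entries to four decimal places.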

  1. This is redundant since the population is infinite, but for conceptual purposes imagine that you return the items to the population.
  2. More precisely, the distribution of sample means asymptotically approaches a normal distribution as n \rightarrow \infty. But 30 is close enough to infinity for most practical purposes and the statistical inferential tests that we will study will assume that the distribution of sample means will be normal.
  3. In hypothesis testing we know what the mean of the population in the null hypothesis is.