## The Central Limit Theorem: Example #1

Objective: This example explores how the mean, standard deviation, and shape of the distribution of the sample mean are related to the mean, standard deviation, and shape of the parent distribution.

Problem Description: Sampling is simulated from a normal distribution and from a positively skewed Chi-square distribution to see how the sample size affects the sampling distribution of the mean.

Initially, we will assume the parent distribution is the standard normal. Display the sampling distribution of the sample mean based on a sample size of n = 5 (the default). The parent distribution is revealed by pressing on the Distribution popup menu (currently set to the normal by default) in the upper-left corner and selecting Parent.

Both the parent and the sampling distribution are normal and have a mean of 0, as indicated by the vertical red line. The standard deviation of the sampling distribution of the mean is 0.4472 (1/sqrt(5)). The mean (mu) and standard deviation (sigma) of the parent are given as the values of animation controls in the upper margin (initially set to 0 and 1, respectively). The mean and standard deviation of the sampling distribution are printed in the upper-right margin of the graph. These latter values change as the parent parameters are changed. Both the parent and the sampling distribution of the mean have vertical lines drawn at their common mean plus and minus one of their own standard deviations.

Increase the sample size to 16 by pressing on the right triangular button of the sample size (n) animation control. You will need to click on the Scale/Sample button to rescale the graph. The sampling distribution of the mean remains normal with a mean of 0, but the standard deviation has been reduced to 0.25 (1/sqrt(16)). Examine the parent and sampling normal curves to see the relationships between their means and standard deviations.
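The two standard deviations quoted so far both come from dividing the parent standard deviation by the square root of the sample size. A minimal Python sketch of that arithmetic follows; the function name `standard_error` is chosen here for illustration and is not part of the applet.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the mean of n independent draws
    from a parent with standard deviation sigma."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)

# Standard normal parent (sigma = 1) at the two sample sizes used in the text.
for n in (5, 16):
    print(f"n = {n:2d}: standard deviation of the sample mean = {standard_error(1.0, n):.4f}")
```

Changing `sigma` scales the result proportionally, while changing the parent mean has no effect on it.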

Change the mean of the parent distribution to 10 by using the mu animation control. This can be done by clicking on the central part of the mu animation control and entering 10 for the mean. The parent and sampling distribution centers shift to 10, but otherwise remain the same. Now change the standard deviation of the parent distribution to 2 using the sigma animation control. The right-triangular button of the sigma control can be pressed, or the central part can be clicked and the value of the standard deviation entered. The location remains the same, but the scale is changed.

We now have a normal parent distribution with a mean of 10 and a standard deviation of 2. Since the sample size is 16, the standard deviation of the sampling distribution of the mean is 0.5 (2/sqrt(16)) and its mean is 10. We now want to compute the probability that the sample mean will be close to the population mean of 10. Specifically, let's compute the probability that the sample mean is between 9.5 and 10.5, i.e., within 1 standard deviation of the true mean.

The plot of the standard normal distribution is the starting point for computing normal probabilities. Change mu to 10 and sigma to 0.5. Then select a <= x <= b in the P popup menu and set the lower limit to 9.5 and the upper limit to 10.5. As expected, the probability is 0.6827, which is the probability of being within one standard deviation of the mean for a normal distribution.

What is the probability that a single observation sampled from the parent distribution is between 9.5 and 10.5? Change sigma to 2 in the normal density plot. The probability is only 0.1974.

Thus the sample mean based on a sample size of 16 is far more likely to be close to 10 than a single value.
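The same two probabilities can be checked outside the applet. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper `interval_probability` and the variable names are illustrative choices, not anything the applet defines.

```python
import math
from statistics import NormalDist

mu, sigma, n = 10.0, 2.0, 16      # parent parameters and sample size from the text
lower, upper = 9.5, 10.5

def interval_probability(dist, a, b):
    """P(a <= X <= b) for a distribution object that provides a cdf method."""
    return dist.cdf(b) - dist.cdf(a)

sample_mean_dist = NormalDist(mu, sigma / math.sqrt(n))   # sampling distribution of the mean
single_value_dist = NormalDist(mu, sigma)                 # parent distribution

p_mean = interval_probability(sample_mean_dist, lower, upper)
p_single = interval_probability(single_value_dist, lower, upper)
print(f"sample mean of 16 values: {p_mean:.4f}")
print(f"single observation:       {p_single:.4f}")
```

The interval is one standard deviation wide on each side for the sample mean, but only a quarter of a standard deviation wide on each side for a single observation, which is why the second probability is so much smaller.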

You have been examining the theoretical sampling distribution of the mean, which is normal when the parent is normal, irrespective of the sample size of the mean. We will verify this by sampling repeatedly (200 times) from the parent. Each of the 200 samples is based on a sample of size 16. Go back to the original graph for the sampling distribution of the mean. A histogram of the 200 mean values is displayed by selecting Histogram from the Distribution popup menu.

Notice that the histogram closely fits the theoretical sampling distribution of the mean. Repeated simulations of 200 means, each based on a sample of size 16, can be constructed by clicking on the Scale/Sample button. If both the number of simulations and the number of histogram bins were increased, the histogram would increasingly approximate the theoretical sampling distribution of the mean.
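A rough equivalent of this simulation can be sketched in Python. This is not the applet's own code: the helper `simulate_sample_means`, the seed, and the use of the standard library's `random.gauss` generator are all assumptions made for illustration.

```python
import math
import random
import statistics

def simulate_sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from n values produced by draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

rng = random.Random(2024)   # arbitrary seed, fixed only so the run is repeatable
means = simulate_sample_means(lambda r: r.gauss(10.0, 2.0), n=16, reps=200, rng=rng)

print(f"mean of the 200 sample means: {statistics.fmean(means):.3f}")
print(f"sd of the 200 sample means:   {statistics.stdev(means):.3f}")
print(f"theoretical values:           10.000 and {2.0 / math.sqrt(16):.3f}")
```

With only 200 means, the simulated mean and standard deviation will differ slightly from 10 and 0.5 on each run; raising `reps` tightens the agreement, in the same way that more simulations tighten the histogram's fit.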

We will now examine the sampling distribution of the mean when the parent distribution is not normal. In the Distribution popup menu of the sampling distribution plot (the original plot), select Chi-square, which has a default of 5 df. If necessary, display the parent distribution curve and rescale. If necessary, toggle off the Histogram option in the Distribution popup menu. The Chi-square parent is positively skewed, as can be seen below.

The sampling distribution of the mean based on a sample size of 3 (the default) is still positively skewed, but less so than the parent. This is a consequence of the central limit theorem.

The population mean of a Chi-square distribution is equal to its degrees of freedom, i.e., mu = 5 in our case. As can be seen, both the parent and sampling distributions have a mean of 5 (the red vertical line), as expected. The population standard deviation is the square root of twice the df, i.e., sigma = sqrt(10) = 3.16. The standard deviation of the sampling distribution of the mean is thus 1.8257 (3.16/sqrt(3)), as can be seen in the graph above.
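The Chi-square figures in this paragraph can be reproduced with a few lines of Python. The function name `chi_square_summary` is a hypothetical helper introduced only for this sketch.

```python
import math

def chi_square_summary(df, n):
    """Return (mean, standard deviation) of a Chi-square parent with df degrees
    of freedom, plus the standard deviation of the mean of n independent draws."""
    mu = df
    sigma = math.sqrt(2 * df)
    return mu, sigma, sigma / math.sqrt(n)

mu, sigma, se = chi_square_summary(df=5, n=3)
print(f"mu = {mu}, sigma = {sigma:.4f}, sd of the sample mean = {se:.4f}")
```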

We now want to verify that the sampling distribution of the mean approaches normality as the sample size increases. Change n, the sample size of the mean, to 16 and rescale if necessary. Notice that the sampling distribution of the mean is much more concentrated around mu = 5 than the parent distribution and that it is nearly normally distributed.

If the plot is examined carefully, the slight lack of normality is evident, e.g., the vertical standard deviation lines are not of the same height.

Further increase n to 25. Normality is now closer and the sampling distribution of the mean is more concentrated about 5.
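One way to put a number on how the skew shrinks is the skewness coefficient, which is 0 for a normal curve. The sketch below relies on two standard results that the walkthrough itself does not state: a Chi-square distribution with k df has skewness sqrt(8/k), and the mean of n independent draws has the parent's skewness divided by sqrt(n). The function names are illustrative.

```python
import math

def chi_square_skewness(df):
    """Skewness of a Chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)

def skewness_of_mean(parent_skewness, n):
    """Skewness of the mean of n independent draws from the parent."""
    return parent_skewness / math.sqrt(n)

parent = chi_square_skewness(5)
print(f"parent (5 df): skewness = {parent:.4f}")
for n in (3, 16, 25):
    print(f"n = {n:2d}: skewness of the sample mean = {skewness_of_mean(parent, n):.4f}")
```

The skewness falls steadily toward 0 as n grows but is still not exactly 0 at n = 25, which matches the slight asymmetry visible in the plots.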

Generally, approximate normality of the sampling distribution of the mean is considered to be achieved when the sample size of the mean is 25 to 30, depending on how far the parent distribution deviates from normality. In our case, the parent distribution is not strongly skewed.

The probability that the sample mean will be close to mu = 5 (within 0.6, which is approximately 1 standard deviation, since 3.16/sqrt(25) = 0.6325) can be calculated approximately using the normal curve. For exactly one standard deviation this is 0.6827 as before; for the slightly narrower interval of plus/minus 0.6 it is about 0.66. On the other hand, a single value from a Chi-square with 5 df is not likely to be so close to 5. From the distribution plot (the second plot), select Chi-square from the Distribution popup menu. Change df to 5 and select a <= x <= b from P. Set the lower limit to 4.4 (5 - 0.6) and the upper limit to 5.6 (5 + 0.6). The probability that a single value is between 4.4 and 5.6 is only 0.1463, as seen below.
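These probabilities can be checked with the Python standard library alone. The sketch assumes a closed-form expression for the Chi-square CDF that holds only for 5 df (it follows from the incomplete gamma recurrence), and it treats the sampling distribution of the mean as normal, which is itself the approximation under discussion. The function name `chi_square_cdf_5df` is hypothetical.

```python
import math
from statistics import NormalDist

def chi_square_cdf_5df(x):
    """CDF of a Chi-square distribution with 5 df (closed form valid for df = 5 only)."""
    if x <= 0:
        return 0.0
    tail_term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)
    return math.erf(math.sqrt(x / 2)) - tail_term

df, n = 5, 25
se = math.sqrt(2 * df) / math.sqrt(n)     # standard deviation of the sample mean

# Single value drawn from the Chi-square parent.
p_single = chi_square_cdf_5df(5.6) - chi_square_cdf_5df(4.4)

# Sample mean of 25 values, using the normal approximation.
approx = NormalDist(df, se)
p_mean_one_sd = approx.cdf(df + se) - approx.cdf(df - se)
p_mean_06 = approx.cdf(5.6) - approx.cdf(4.4)

print(f"single value in [4.4, 5.6]:           {p_single:.4f}")
print(f"sample mean within one sd ({se:.4f}): {p_mean_one_sd:.4f}")
print(f"sample mean in [4.4, 5.6]:            {p_mean_06:.4f}")
```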

An approximation to the actual sampling distribution can be viewed by obtaining 200 random samples, each of size 25, and plotting the 200 means in a histogram. Go back to the original sampling distribution graph and select Histogram from the Distribution popup menu. As seen below, approximate normality is supported by examining the shape of the histogram.

Sample several more times by clicking on the Scale/Sample button.
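The repeated sampling can also be imitated in Python. This sketch is not the applet's implementation: it assumes that a Chi-square value with df degrees of freedom can be generated as a Gamma variate with shape df/2 and scale 2 (a standard identity), and the helper name and seed are arbitrary.

```python
import random
import statistics

def chi_square_draw(rng, df):
    """One Chi-square(df) value, generated as a Gamma variate with shape df/2 and scale 2."""
    return rng.gammavariate(df / 2, 2.0)

rng = random.Random(7)      # arbitrary seed; change it to "sample several more times"
df, n, reps = 5, 25, 200

means = [statistics.fmean(chi_square_draw(rng, df) for _ in range(n)) for _ in range(reps)]
within = sum(4.4 <= m <= 5.6 for m in means) / reps

print(f"mean of the {reps} sample means: {statistics.fmean(means):.3f}")
print(f"sd of the {reps} sample means:   {statistics.stdev(means):.3f}")
print(f"fraction of means in [4.4, 5.6]: {within:.3f}")
```

Each new seed plays the role of another click on Scale/Sample: the summary values move around from run to run but stay near 5 and 0.63.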