Small Sample CI Inferences: Example #2

Objective: This example shows how inferences are made about a single population mean for continuous data when the sample size is small. The t-test and the associated confidence interval are the standard tools for making statements about the population mean, but they require the observations to be independent draws from a normal distribution. This example goes through the steps of checking the assumptions and constructing the confidence interval. Example #1 in the hypothesis testing module will revisit this problem to test a hypothesis about the population mean.

Problem Description: Data were collected on the breadth-to-length ratios of beaded rectangles used by the Shoshoni American Indians to decorate leather goods. Are these rectangles consistent with the golden rectangle of the ancient Greeks, whose breadth-to-length ratio is 0.618034 (length:breadth = 1:0.618034)?
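To follow along outside the interactive software, the sketch below loads the 20 ratios in Python. The values are the Shoshoni rectangle ratios as commonly reproduced in textbook data collections; treat this transcription as an assumption and check it against your own copy of the data. They do reproduce the mean of 0.66 and the median of 0.641 quoted later in this example. The golden-rectangle ratio 0.618034 is (sqrt(5) - 1)/2.

```python
import math
import statistics

# Breadth-to-length ratios of 20 Shoshoni beaded rectangles, in observation
# order (#1 to #20). Assumed transcription of the commonly published data set.
RATIOS = [
    0.693, 0.662, 0.690, 0.606, 0.570, 0.749, 0.672, 0.628, 0.609, 0.844,
    0.654, 0.615, 0.668, 0.601, 0.576, 0.670, 0.606, 0.611, 0.553, 0.933,
]

# Golden-rectangle breadth-to-length ratio: (sqrt(5) - 1) / 2 = 0.618034...
GOLDEN = (math.sqrt(5) - 1) / 2

n = len(RATIOS)
mean = statistics.mean(RATIOS)
median = statistics.median(RATIOS)
print(f"n = {n}, mean = {mean:.4f}, median = {median:.3f}, golden = {GOLDEN:.6f}")
```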

The histogram of the ratios shows that the data are positively skewed. Click the right Animate button to increase the number of bins by one, then click the Moments triangular reveal button to see the skewness coefficient of 0.632.
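The two quantities behind this step can be computed directly. The sketch below counts observations in equal-width bins spanning the data range (an assumed binning rule; the software may place bin boundaries differently) and computes two skewness coefficients. The value 0.632 reported above appears to match Pearson's median skewness coefficient, 3(mean - median)/s; this is inferred from the numbers, not documented behaviour of the software. The moment-based coefficient, which weights the outlying values heavily, is considerably larger for these data.

```python
import statistics

RATIOS = [
    0.693, 0.662, 0.690, 0.606, 0.570, 0.749, 0.672, 0.628, 0.609, 0.844,
    0.654, 0.615, 0.668, 0.601, 0.576, 0.670, 0.606, 0.611, 0.553, 0.933,
]

def bin_counts(data, k):
    """Counts in k equal-width bins from min(data) to max(data); the maximum goes in the last bin."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        counts[min(int((x - lo) / width), k - 1)] += 1
    return counts

def pearson_median_skewness(data):
    """Pearson's second skewness coefficient: 3 * (mean - median) / s."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)

def moment_skewness(data):
    """Moment coefficient g1 = m3 / m2**1.5, using central moments with divisor n."""
    m, n = statistics.mean(data), len(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# The histogram shape shifts as the number of bins changes.
for k in (5, 6, 7):
    print(k, "bins:", bin_counts(RATIOS, k))
print(f"Pearson median skewness: {pearson_median_skewness(RATIOS):.3f}")
print(f"Moment skewness g1:      {moment_skewness(RATIOS):.3f}")
```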

The histogram indicates non-normality, but a histogram can look quite different depending on how many bins are used and where the bin boundaries fall. Likewise, a given value of the skewness coefficient can arise from many different distributional shapes. The normality assumption can be examined less ambiguously with a normal quantile plot of the ratios. Click normalPlot in the Histogram menu to display the normal quantile plot. The positive skewness is apparent from the upward concavity of the points. Select Robust Fit from the QuantPlot popup menu to see the extent of the curvature, then select Quantile Lines to show whether outliers are present. There are two outlying ratios (#10 and #20), and one of them (#20) is extreme. Apart from these, the points fall close to a straight line; only the three largest values deviate from it, and they do so increasingly.
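The normal quantile plot and the outlier check can be sketched numerically. The code below pairs each sorted ratio with a standard normal quantile at plotting position (i - 0.5)/n and fits a reference line through the first and third quartiles. This line is an assumed stand-in for the software's Robust Fit, similar in spirit to R's qqline. It then flags points beyond Tukey's inner fences (1.5 IQR) and outer fences (3 IQR); reading Quantile Lines as Tukey fences is likewise an assumption. Under these rules, #10 is flagged as a mild outlier and #20 as an extreme one, and the three largest values have the largest residuals from the line, increasing with size.

```python
import statistics
from statistics import NormalDist

RATIOS = [
    0.693, 0.662, 0.690, 0.606, 0.570, 0.749, 0.672, 0.628, 0.609, 0.844,
    0.654, 0.615, 0.668, 0.601, 0.576, 0.670, 0.606, 0.611, 0.553, 0.933,
]

def qq_points(data):
    """(normal quantile, ordered value) pairs at plotting positions (i - 0.5) / n."""
    n = len(data)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, sorted(data)))

def quartile_line(data):
    """Slope and intercept of the line through the (z, x) quartile pairs."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    z1, z3 = NormalDist().inv_cdf(0.25), NormalDist().inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    return slope, q1 - slope * z1

def tukey_outliers(data):
    """1-based indices of mild (beyond 1.5 IQR) and extreme (beyond 3 IQR) points."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    mild, extreme = [], []
    for i, x in enumerate(data, start=1):
        if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
            extreme.append(i)
        elif x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
            mild.append(i)
    return mild, extreme

points = qq_points(RATIOS)
slope, intercept = quartile_line(RATIOS)
# Vertical distance of each ordered value from the quartile reference line.
residuals = [x - (intercept + slope * z) for z, x in points]
print("residuals of the three largest values:", [round(r, 3) for r in residuals[-3:]])
print("mild, extreme outliers (#):", tukey_outliers(RATIOS))
```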

Despite the lack of normality, we will proceed with inferences based on the normal assumption, but we must be cautious in interpreting them. We will construct a confidence interval for the population mean ratio by clicking the Confidence Interval reveal button on the histogram plot.

The 95% confidence limits are given in the text report in the bottom margin and are displayed graphically in the histogram as a red confidence band. The interval is constructed as the sample mean plus or minus the t critical value (with n - 1 = 19 degrees of freedom) times the standard error of the mean, s/√n.
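The interval described above can be sketched with the Python standard library. The critical value t(0.975, 19) = 2.093 is copied from a standard t table; with SciPy installed, `scipy.stats.t.ppf(0.975, 19)` gives the same value to the precision shown. The result is roughly (0.617, 0.704), so 0.618 is just inside.

```python
import math
import statistics

RATIOS = [
    0.693, 0.662, 0.690, 0.606, 0.570, 0.749, 0.672, 0.628, 0.609, 0.844,
    0.654, 0.615, 0.668, 0.601, 0.576, 0.670, 0.606, 0.611, 0.553, 0.933,
]
GOLDEN = (math.sqrt(5) - 1) / 2
T_975_DF19 = 2.093  # t critical value for 95% two-sided, 19 df (from a t table)

def t_interval(data, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se

lower, upper = t_interval(RATIOS, T_975_DF19)
print(f"95% CI: ({lower:.4f}, {upper:.4f}); contains golden ratio: {lower <= GOLDEN <= upper}")
```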

The moment summaries shown in the top histogram above help clarify the interpretation of the confidence band. The sample mean is 0.66, which is represented graphically by the center line of the normal density; this line passes through the center of the 95% confidence band. Both the location and the length of the confidence band are distorted by the skewness (and outliers) of the distribution. Specifically, the sample mean is larger than the sample median of 0.641, a difference that is also visible in the normal quantile plot. The outliers not only increase the mean but also increase the standard deviation, which in turn lengthens the confidence band.
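To see how much the two outliers drive these summaries, the sketch below compares the mean, median, standard deviation, and 95% half-width with and without observations #10 and #20. The t critical values (2.093 for 19 df, 2.110 for 17 df) are copied from a standard t table.

```python
import math
import statistics

RATIOS = [
    0.693, 0.662, 0.690, 0.606, 0.570, 0.749, 0.672, 0.628, 0.609, 0.844,
    0.654, 0.615, 0.668, 0.601, 0.576, 0.670, 0.606, 0.611, 0.553, 0.933,
]
# Drop observations #10 and #20 (1-based), the two outliers identified earlier.
TRIMMED = [x for i, x in enumerate(RATIOS, start=1) if i not in (10, 20)]
T_975 = {19: 2.093, 17: 2.110}  # 95% two-sided t critical values by df (t table)

def summary(data):
    """Mean, median, standard deviation and 95% half-width t * s / sqrt(n)."""
    n = len(data)
    s = statistics.stdev(data)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": s,
        "half_width": T_975[n - 1] * s / math.sqrt(n),
    }

for label, data in (("all 20", RATIOS), ("without #10, #20", TRIMMED)):
    print(label, {key: round(value, 4) for key, value in summary(data).items()})
```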

The golden rectangle ratio of 0.618 lies within the 95% confidence band, which shows that it is a plausible value. However, 0.618 is just inside the interval, so we must be careful in stating our conclusions; because the assumptions are violated, the evidence is inconclusive. The outliers pull the sample mean up, away from 0.618, which works against covering it. On the other hand, the outliers also widen the confidence band, which makes it easier for the interval to cover 0.618. The confidence level can be changed by selecting another item from the Level popup menu. A 99% confidence interval gives more confidence at the expense of a wider interval. The golden ratio lies inside the 99% interval, but not inside the 90% interval.
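Changing the confidence level only changes the t critical value. The sketch below recomputes the interval at 90%, 95%, and 99%, using critical values for 19 degrees of freedom copied from a standard t table, and checks whether each interval covers 0.618034. Consistent with the text, the 90% interval excludes the golden ratio while the 95% and 99% intervals include it.

```python
import math
import statistics

RATIOS = [
    0.693, 0.662, 0.690, 0.606, 0.570, 0.749, 0.672, 0.628, 0.609, 0.844,
    0.654, 0.615, 0.668, 0.601, 0.576, 0.670, 0.606, 0.611, 0.553, 0.933,
]
GOLDEN = (math.sqrt(5) - 1) / 2
# Two-sided critical values t(1 - alpha/2, 19) from a standard t table.
T_CRIT_DF19 = {0.90: 1.729, 0.95: 2.093, 0.99: 2.861}

mean = statistics.mean(RATIOS)
se = statistics.stdev(RATIOS) / math.sqrt(len(RATIOS))
covers = {}
for level, t in T_CRIT_DF19.items():
    lower, upper = mean - t * se, mean + t * se
    covers[level] = lower <= GOLDEN <= upper
    print(f"{level:.0%}: ({lower:.4f}, {upper:.4f}) covers golden ratio: {covers[level]}")
```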

Since the normality assumption is violated, it would be prudent to find a normalizing transformation of the data before making inferences. Alternatively, a nonparametric approach can be used, or the outlying values can be deleted. We will examine this last option.

A histogram of the ratios with the two outliers removed is shown in the plot below.

Skewness is still present, and the distribution is bunched somewhat into two groups, as can be seen by changing the number of bins in the histogram or by examining the normal quantile plot. However, the normal quantile plot shows that outliers are no longer present. The conclusions are now somewhat stronger, since 0.618 lies inside the confidence interval at each of the conventional confidence levels (90%, 95%, and 99%).
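The trimmed-data analysis can be sketched the same way. The code below removes points outside Tukey's inner fences (assumed here as the outlier rule; for these data it drops exactly #10 and #20). It then confirms that the remaining 18 values have no points outside their own fences, and recomputes the interval at 90%, 95%, and 99% with critical values for 17 degrees of freedom copied from a standard t table. All three intervals cover 0.618034.

```python
import math
import statistics

RATIOS = [
    0.693, 0.662, 0.690, 0.606, 0.570, 0.749, 0.672, 0.628, 0.609, 0.844,
    0.654, 0.615, 0.668, 0.601, 0.576, 0.670, 0.606, 0.611, 0.553, 0.933,
]
GOLDEN = (math.sqrt(5) - 1) / 2
# Two-sided critical values t(1 - alpha/2, 17) from a standard t table.
T_CRIT_DF17 = {0.90: 1.740, 0.95: 2.110, 0.99: 2.898}

def inside_fences(data, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR], preserving order."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return [x for x in data if q1 - k * iqr <= x <= q3 + k * iqr]

trimmed = inside_fences(RATIOS)
still_clean = inside_fences(trimmed) == trimmed  # no new outliers after trimming

mean = statistics.mean(trimmed)
se = statistics.stdev(trimmed) / math.sqrt(len(trimmed))
intervals = {level: (mean - t * se, mean + t * se) for level, t in T_CRIT_DF17.items()}
for level, (lower, upper) in intervals.items():
    print(f"{level:.0%}: ({lower:.4f}, {upper:.4f})")
```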