Paired-Data CI Inferences: Example #1

Objective: This example shows how to make inferences about the difference between two population means when the data are paired, or matched. The data are analyzed by applying one-sample methods to the paired differences, which must be independent and normally distributed. The example walks through checking these assumptions and constructing the confidence interval. Example #1 in the paired-differences hypothesis testing module examines the same problem from a testing perspective.

Problem Description: Two virus preparations were soaked into cheesecloth, and each was rubbed onto one half of a tobacco leaf, so that every leaf received both preparations. The number of local lesions (small, dark rings) appearing on each half of each of eight leaves was recorded. Do the two preparations produce different effects?

We want to determine whether the mean number of lesions differs between the two virus preparations. One half of each leaf receives preparation 1 and the other half receives preparation 2, so each leaf yields a pair of counts. Because both preparations are applied to the same leaf, leaf-to-leaf variation largely cancels in the within-leaf difference. The distribution of these paired differences allows us to make inferences about µd = µ1 - µ2.
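In symbols (the notation here is ours, not the software's): let x_1i and x_2i be the lesion counts on the halves of leaf i that received preparations 1 and 2. The analysis reduces to one sample of differences, summarized by their mean and standard deviation, which estimate µd and the standard deviation of the differences.

```latex
d_i = x_{1i} - x_{2i}, \qquad i = 1, \dots, n \quad (n = 8)

\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
```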

Preparation 1 appears to produce more lesions, since only one difference is negative. The histogram of the count differences is roughly symmetric except for a possible outlier. Click the Moments triangular reveal button to display the sample mean difference and standard deviation. On average, preparation 1 produces 4 more lesions per leaf than preparation 2.
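The Moments summary can be reproduced outside the software as a quick cross-check. The data table is not listed in this section, so the eight differences below are an assumption: our reconstruction of the lesion counts (preparation 1 minus preparation 2, leaf 1 first). They are consistent with the mean of 4 and the intervals quoted in this example, but check them against your own data file before relying on them.

```python
import statistics

# ASSUMED data: reconstructed paired differences (preparation 1 minus preparation 2),
# leaf 1 first. Replace with the values from your own data table.
diffs = [13, 3, 4, 6, -1, 1, 5, 1]

mean_d = statistics.mean(diffs)    # sample mean difference
sd_d = statistics.stdev(diffs)     # sample standard deviation (n - 1 divisor)
n_negative = sum(1 for d in diffs if d < 0)  # leaves where preparation 2 had more lesions

print(f"n = {len(diffs)}, mean = {mean_d:.2f}, sd = {sd_d:.2f}, negatives = {n_negative}")
# prints: n = 8, mean = 4.00, sd = 4.31, negatives = 1
```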

The confidence interval for the mean difference assumes that the paired differences are normally distributed. To check this, select normalPlot from the Histogram popup menu to display a normal quantile plot of the paired differences, then select Robust Fit and Quantile Lines from the QuantPlot menu.
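The idea behind the normal quantile plot can be sketched numerically: pair each sorted difference with its expected standard-normal score and see which points stray from a straight line. This is a minimal sketch under several assumptions. The differences are our reconstruction (not listed in this section), the plotting positions p_i = (i - 0.5)/n are one common convention (the software may use another), and the reference line (intercept = median, slope = IQR rescaled to a normal standard deviation) is our stand-in for the software's Robust Fit, whose exact method we do not assume.

```python
import statistics
from statistics import NormalDist

# ASSUMED data: reconstructed paired differences (preparation 1 minus preparation 2), leaf 1 first.
diffs = [13, 3, 4, 6, -1, 1, 5, 1]
std_normal = NormalDist()

# Normal scores from plotting positions p_i = (i - 0.5) / n (one common convention).
x_sorted = sorted(diffs)
n = len(x_sorted)
z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Robust reference line d = center + scale * z (our stand-in for "Robust Fit"):
# center = median, scale = IQR divided by the IQR of a standard normal (about 1.349).
q1, _, q3 = statistics.quantiles(diffs, n=4)
center = statistics.median(diffs)
scale = (q3 - q1) / (std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25))

residuals = [x - (center + scale * zi) for x, zi in zip(x_sorted, z)]
for zi, x, r in zip(z, x_sorted, residuals):
    print(f"z = {zi:6.2f}   d = {x:3d}   distance from line = {r:6.2f}")

# The point farthest from the robust line is the candidate outlier.
worst = max(range(n), key=lambda i: abs(residuals[i]))
print("Farthest from the line:", x_sorted[worst])
```

A robust line is used so that the suspected outlier does not pull the reference line toward itself, which would mask it.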

The first observation is indeed an outlier (to highlight it, click L in the tool palette, select the outlying point, and click red in the color palette). The remaining data are at least approximately normally distributed. We proceed to construct a confidence interval anyway, keeping in mind that the outlier may affect it.

The 95% confidence interval is computed and displayed by clicking on the Confidence Interval triangular reveal button.
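The interval the software reports is the standard one-sample t interval applied to the differences. With n = 8, the 95% interval uses 7 degrees of freedom. Working backward from the reported interval (our arithmetic, not a value printed in this section), its half-width of 3.60 implies a standard deviation of the differences of roughly 4.3.

```latex
\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}}

% Here n = 8, \bar{d} = 4, t_{0.025,7} \approx 2.365.
% Half-width: 7.60 - 4 = 3.60, so s_d \approx 3.60\sqrt{8}/2.365 \approx 4.3
```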

The 95% CI ranges from 0.40 to 7.60. Since this interval lies entirely above 0, we conclude that Preparation 1 produces more lesions, on average, than Preparation 2. Verify that a 99% CI does contain µd = 0; thus, if we require this higher level of confidence, we cannot distinguish between Preparations 1 and 2.
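The 95% and 99% intervals can be checked with a short script. This sketch again assumes our reconstructed differences (not listed in this section). Python's standard library has no t quantile function, so the critical values are copied from standard t tables; `scipy.stats.t.ppf(0.975, 7)` and `scipy.stats.t.ppf(0.995, 7)` would compute them if SciPy is available.

```python
import math
import statistics

# ASSUMED data: reconstructed paired differences (preparation 1 minus preparation 2), leaf 1 first.
diffs = [13, 3, 4, 6, -1, 1, 5, 1]

# Two-sided t critical values t_{alpha/2, df}, copied from standard t tables
# (equivalently scipy.stats.t.ppf(1 - alpha/2, df)).
T_CRIT = {(0.95, 7): 2.364624, (0.99, 7): 3.499483}

def paired_t_ci(d, level):
    """One-sample t interval for the mean of the paired differences d."""
    n = len(d)
    mean = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(n)  # standard error of the mean difference
    half_width = T_CRIT[(level, n - 1)] * se
    return mean - half_width, mean + half_width

for level in (0.95, 0.99):
    lo, hi = paired_t_ci(diffs, level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f})  contains 0: {lo <= 0 <= hi}")
# prints:
# 95% CI: (0.40, 7.60)  contains 0: False
# 99% CI: (-1.33, 9.33)  contains 0: True
```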

The 95% CI is wide. Although this is partly due to the small sample size, the principal reason is the high variability of the differences: the coefficient of variation (100 × Std Dev / Mean) exceeds 100%. It is therefore reasonable to ask how much the outlier influences the confidence interval. Analyzing the distribution of paired differences with the outlier removed lets us assess its effect.
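A coefficient of variation above 100% means the standard deviation of the differences is larger than their mean. A minimal check, assuming our reconstructed differences (not listed in this section):

```python
import statistics

# ASSUMED data: reconstructed paired differences (preparation 1 minus preparation 2), leaf 1 first.
diffs = [13, 3, 4, 6, -1, 1, 5, 1]

def coefficient_of_variation(d):
    """100 * (sample standard deviation / sample mean); unstable when the mean is near 0."""
    return 100 * statistics.stdev(d) / statistics.mean(d)

cv_all = coefficient_of_variation(diffs)
print(f"CV = {cv_all:.0f}%")  # prints: CV = 108%
```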

With the outlier removed, the sample mean and standard deviation are substantially smaller than the original values. Hence the new CI is narrower (width ≈ 5.02 - 0.40 = 4.6 lesions) than the original CI (width ≈ 7.60 - 0.40 = 7.2 lesions), even though the smaller sample size (7 instead of 8) works in the opposite direction. The lower limit of both intervals is about 0.4, so 0 is not a plausible value for µd, at least at 95% confidence.
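The comparison of the two intervals can be sketched as follows, assuming our reconstructed differences (not listed in this section) and that dropping the first observation removes the outlier, as stated above. The t critical values for 7 and 6 degrees of freedom are copied from standard t tables.

```python
import math
import statistics

# ASSUMED data: reconstructed paired differences, leaf 1 (the outlier) first.
diffs = [13, 3, 4, 6, -1, 1, 5, 1]
T_975 = {7: 2.364624, 6: 2.446912}  # t_{0.025, df} from standard t tables

def ci95(d):
    """Return mean, sd, and the 95% one-sample t interval for the differences d."""
    n = len(d)
    mean, sd = statistics.mean(d), statistics.stdev(d)
    half = T_975[n - 1] * sd / math.sqrt(n)
    return mean, sd, mean - half, mean + half

for label, data in (("all 8 leaves", diffs), ("outlier removed", diffs[1:])):
    mean, sd, lo, hi = ci95(data)
    print(f"{label:16s} mean={mean:.2f} sd={sd:.2f} CI=({lo:.2f}, {hi:.2f}) width={hi - lo:.2f}")
```

Removing the outlier lowers both the mean and the standard deviation, so the upper limit drops sharply while the lower limit barely moves.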

The required normality assumption is difficult to verify visually from a histogram, as is nearly always the case when n is small. The following normal quantile plot shows that the normality assumption is plausible.
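As a numerical complement to a normal quantile plot, one can compute the correlation between the sorted data and their normal scores (the probability plot correlation coefficient); values closer to 1 mean the points lie closer to a straight line. This summary is our addition, not part of the software walkthrough, and no formal critical value is applied here. The sketch assumes our reconstructed differences (not listed in this section) and plotting positions p_i = (i - 0.5)/n.

```python
import math
from statistics import NormalDist

# ASSUMED data: reconstructed paired differences, leaf 1 (the outlier) first.
diffs = [13, 3, 4, 6, -1, 1, 5, 1]

def ppcc(data):
    """Pearson correlation between sorted data and normal scores at p_i = (i - 0.5) / n."""
    x = sorted(data)
    n = len(x)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = sum(x) / n, sum(z) / n
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / math.sqrt(sxx * szz)

print(f"with outlier:    r = {ppcc(diffs):.3f}")
print(f"outlier removed: r = {ppcc(diffs[1:]):.3f}")
```

With only seven or eight points this correlation is itself noisy, so it supports, rather than replaces, the visual check.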