Example #1: New Jersey County Areas

Contents: Objective Problem Description Quantile Plot Quartiles Robustness

^ Objective

Numerically examine the sizes of New Jersey counties using the median, interquartile range, and related measures. Determine whether these measures give a good numerical description of the distribution.

^ Problem Description

Values were obtained for the sizes of counties in the State of New Jersey. The following variables were recorded:

County: New Jersey county
Area: Size of county in square miles

The 21 areas constitute all possible values, i.e., they are not sampled from some larger population of New Jersey counties. Therefore, only descriptive summaries and graphs are meaningful; inferential procedures such as confidence intervals do not apply.

^ Quantile Plot

The areas of the 21 New Jersey counties are plotted in the quantile view of the Histogram applet below. This example displays and interprets sample quartiles and related statistics, which are reported in the bottom border of the quantile plot. The sample median and interquartile range (IQR) provide robust alternatives to the sample mean and standard deviation discussed in the Moments Module New Jersey county areas example.

^ Quartiles

The median county size is 329.0 sq mi and the IQR is 263.5 sq mi. Click on the Data button to reveal the data. The median, the (21 + 1)/2 = 11th ranked value, is the area of Gloucester County. The sample median is somewhat less than the sample mean of 358.1 sq mi, which is consistent with positively skewed data: the long upper tail pulls the mean above the median.
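The rank arithmetic and the mean-versus-median effect can be checked with a short sketch. The county data are not reproduced here, so the list `toy` below is hypothetical right-skewed data, not the New Jersey areas; `median_rank` is likewise an illustrative helper, while `statistics.median` and `statistics.mean` are from Python's standard library.

```python
import statistics

def median_rank(n):
    """1-based rank of the sample median among n sorted values: (n + 1) / 2."""
    return (n + 1) / 2

print(median_rank(21))  # → 11.0 (the 11th ranked county area)

# Hypothetical right-skewed toy data (not the New Jersey areas): one large
# upper-tail value pulls the mean above the median.
toy = [2, 3, 3, 4, 5, 6, 20]
print(statistics.median(toy), statistics.mean(toy) > statistics.median(toy))  # → 4 True
```

Here the mean is 43/7, about 6.1, well above the median of 4, mirroring the direction of the gap between 358.1 and 329.0 sq mi.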

The lower quartile is 224.5 sq mi, computed as the (21 + 1)/4 = 5.5 ranked value. In this case, it is the average of the 5th and 6th ranked areas, those of Camden and Mercer counties. Likewise, the upper quartile is 488.0 sq mi, the 3(21 + 1)/4 = 16.5 ranked value, which is the average of the 16th and 17th ranked areas (those of Monmouth and Cumberland counties). The IQR is the difference 488.0 - 224.5 = 263.5 sq mi.
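The (n + 1)p rank rule described above can be sketched as follows. The function name `quantile_np1` is hypothetical; it places the p-quantile at 1-based rank (n + 1)p and interpolates linearly when that rank is fractional. As a cross-check, Python's `statistics.quantiles` uses the same convention in its default `'exclusive'` method. Because the county areas are not listed here, the ranks 1 to 21 serve as stand-in values, so each quantile simply equals its rank position.

```python
import math
import statistics

def quantile_np1(values, p):
    """Sample p-quantile under the (n + 1) * p rank convention.

    The quantile sits at 1-based rank (n + 1) * p; a fractional rank is
    resolved by linear interpolation between neighbouring ranked values.
    Ranks below 1 or above n are clamped to the minimum or maximum.
    """
    x = sorted(values)
    n = len(x)
    pos = (n + 1) * p
    if pos <= 1:
        return x[0]
    if pos >= n:
        return x[-1]
    k = math.floor(pos)   # ranked value just below the target rank
    frac = pos - k        # fractional distance toward the next ranked value
    return x[k - 1] + frac * (x[k] - x[k - 1])  # 1-based rank k is index k - 1

# Ranks 1..21 as stand-in values: each quantile equals its rank position.
ranks = list(range(1, 22))
print([quantile_np1(ranks, p) for p in (0.25, 0.5, 0.75)])  # → [5.5, 11.0, 16.5]
print(statistics.quantiles(ranks, n=4))                      # → [5.5, 11.0, 16.5]

# IQR from the quartiles reported for the county areas.
print(488.0 - 224.5)  # → 263.5
```

Applied to the sorted county areas, the same function would average the 5th and 6th ranked values for the lower quartile and the 16th and 17th for the upper quartile, as described in the text.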

The median and the quartiles can be visualized in the quantile plot view of the Histogram applet. Select Quantile lines from the Histogram menu. The lower line is the lower quartile; the middle line is the median; the upper line is the upper quartile. The y-values of these lines correspond to the quartiles (the median is the second quartile) in the Quantiles report.

Arbitrary quantiles can be found by selecting Quantile values from the Options menu. For example, the quantile corresponding to f = 0.8 is 516.2 sq mi.
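Where an arbitrary quantile falls among the ranked values can be sketched under one stated assumption: that the applet applies the same (n + 1)f rank rule used for the quartiles to any f. That assumption is not confirmed by the applet's documentation, and the helper `quantile_position` is hypothetical. For f = 0.8 the target rank is 22 x 0.8 = 17.6, so the quantile would lie 60% of the way from the 17th to the 18th ranked area.

```python
import math
import statistics

def quantile_position(n, f):
    """1-based rank of the f-quantile under the (n + 1) * f convention (assumed)."""
    return (n + 1) * f

n = 21
pos = quantile_position(n, 0.8)
lower_rank = math.floor(pos)   # ranked value just below the target rank
weight = pos - lower_rank      # share of the gap to the next ranked value
print(pos, lower_rank, round(weight, 10))  # → 17.6 17 0.6

# Ranks 1..21 as stand-in values: each cut point equals its rank position.
ranks = list(range(1, n + 1))
print(statistics.quantiles(ranks, n=5))  # → [4.4, 8.8, 13.2, 17.6]
```

Under this convention the 0.8 quantile is x(17) + 0.6 (x(18) - x(17)), where x(k) denotes the kth ranked area; the actual areas would be needed to reproduce the 516.2 sq mi figure.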

^ Robustness

The maximum county area of 819 sq mi is not an outlier, as shown in the Quantile Plots Module. However, it is somewhat separated from the remaining values and has some influence on the sample mean, standard deviation, and skewness, as discussed in the Moments Module New Jersey county areas example. These moment-based measures are sensitive to outliers and are less meaningful for highly skewed data.

The median and IQR are less sensitive to outliers and are more meaningful for skewed data than moment-based statistics. If the maximum value is removed, the sample median, now the average of the 10th and 11th of the remaining 20 ranked areas, drops to 320.5 sq mi, only a slight reduction from the 329.0 sq mi computed from the complete dataset. The (rounded) lower quartile is now 223 sq mi and the upper quartile is 474 sq mi, again only slight reductions from the values of 224.5 and 488.0 sq mi computed from all of the data. The sample IQR is thus 251 sq mi, compared with 263.5 sq mi for all of the data.
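The robustness comparison can be illustrated with a small sketch. It does not reproduce the county results, because the areas are not listed here: the list `toy` is hypothetical right-skewed data, and `summarize` is an illustrative helper. Quartiles come from `statistics.quantiles` with its default `'exclusive'` method, which follows the same (n + 1)p rank convention as the quartiles above. The first print shows the rank positions that apply once the maximum is dropped and n = 20.

```python
import statistics

def summarize(values):
    """Mean, standard deviation, median, and IQR of a list of values.

    statistics.quantiles (default 'exclusive' method) places the p-quantile
    at rank (n + 1) * p and interpolates between neighbouring ranked values.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": q2,
        "iqr": q3 - q1,
    }

# Rank positions of Q1, median, and Q3 once the maximum is dropped (n = 20).
print([(20 + 1) * p for p in (0.25, 0.5, 0.75)])  # → [5.25, 10.5, 15.75]

# Hypothetical right-skewed toy data (not the New Jersey areas).
toy = [10, 12, 13, 15, 16, 18, 20, 22, 25, 60]
full = summarize(toy)
trimmed = summarize(sorted(toy)[:-1])  # remove the maximum value

for key in full:
    print(f"{key:>6}: {full[key]:6.2f} -> {trimmed[key]:6.2f}")
```

In this toy case, removing the largest value moves the mean from 21.1 to about 16.8 and the standard deviation from about 14.4 to about 4.9, while the median only moves from 17 to 16 and the IQR from 10 to 8.5, the same qualitative pattern as for the county areas.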