Objective: This module allows you to calculate quantiles on a batch of numbers. The quantiles of principal interest are the median and the quartiles. These summary measures are most commonly used for skewed distributions. Skewness can be assessed visually by histogram views of the data.
The f quantile, qf, of a dataset is a value with an approximate fraction f of the data less than or equal to qf. A range of values may satisfy the definition and, in this case, an interpolated value is used.
The sample median, corresponding to f = 0.5, is a measure of the center of a distribution. It is the middle value of the ordered data or, when the number of values is even, the average of the two middle values. The lower and upper quartiles correspond to f = 0.25 and f = 0.75, respectively. The interquartile range, the difference between the upper and lower quartiles, is a measure of variation.
A rule is needed to compute the quantiles. Let x(i) be the i th value when the data are ranked from smallest to largest. For example, x(1) is the smallest value and x(n) is the largest value. Define x(i) as the fi quantile, where fi = i/(n + 1). The quantile plot (shown below) is constructed by graphing x(i) versus fi.
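The rule above can be sketched in a few lines of Python. This is a minimal illustration, not the applet's own code: the function names `plotting_positions` and `quantile` and the seven-value sample are hypothetical. It assumes that a quantile falling between two ranked values is found by linear interpolation between them.

```python
def plotting_positions(n):
    """Return the fractions f_i = i / (n + 1) for i = 1, ..., n."""
    return [i / (n + 1) for i in range(1, n + 1)]


def quantile(data, f):
    """Interpolated f quantile, treating x(i) as the i / (n + 1) quantile.

    Assumption: linear interpolation between adjacent ranked values.
    Only defined for f between 1/(n + 1) and n/(n + 1).
    """
    xs = sorted(data)            # x(1) <= x(2) <= ... <= x(n)
    n = len(xs)
    if n == 0:
        raise ValueError("data must be non-empty")
    pos = f * (n + 1)            # fractional rank: f = i/(n + 1) gives pos = i
    eps = 1e-12                  # tolerance for floating-point rounding
    if pos < 1 - eps or pos > n + eps:
        raise ValueError("f must lie between 1/(n + 1) and n/(n + 1)")
    pos = min(max(pos, 1.0), float(n))
    lower = int(pos)             # rank of the ranked value at or below pos
    frac = pos - lower           # how far pos sits towards the next rank
    if lower == n:               # pos is exactly the largest rank
        return float(xs[-1])
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])


# Hypothetical sample, used only for illustration.
sample = [7, 2, 10, 4, 9, 5, 4]

median = quantile(sample, 0.5)
lower_quartile = quantile(sample, 0.25)
upper_quartile = quantile(sample, 0.75)
iqr = upper_quartile - lower_quartile

# Coordinates of the quantile plot: (f_i, x(i)) pairs.
plot_points = list(zip(plotting_positions(len(sample)), sorted(sample)))
```

With n = 7 the fractions fi are multiples of 1/8, so the quartiles and the median fall exactly on ranked values (x(2), x(4) and x(6)). A value such as f = 0.4 falls between x(3) and x(4) and is interpolated. Python's standard `statistics.quantiles` documents its default `exclusive` method in terms of the same i/(n + 1) fractions, so it may serve as a cross-check.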
Consider the normally distributed data first introduced in the Histogram module. The sample quantiles are displayed numerically and graphically in the Histogram Java applet below.
The Quantiles button in the Report panel is chosen by default. The sample median and quartiles are displayed on the right along with other summary statistics. The sample median and quartiles can be visualized by choosing Quantiles lines from the Options menu. The center black line is the sample median and the adjacent black lines are the quartiles. Outlier lines are drawn in red.
The quantile plot can be used to display the quantiles for f values between 1/(n + 1) and n/(n + 1). Linear interpolation is used as required. Select Quantile values from the Options menu. By default, the median is given by the blue lines (corresponding to f = 0.5). Any quantile within the f range can be displayed by clicking the x symbol and dragging it to the desired location.
Example #1 uses the New Jersey county area dataset to compute and interpret quantile-based summary statistics.
Example #2 uses the heights for 40 randomly selected students at Oxford University to calculate various quantiles including the median. The interquartile range is computed as a measure of spread. Quantiles are revealed in the bottom part of the normal quantile plot. The dependence of the height quantiles on gender can be obtained from a conditioning normal quantile plot.
Exercise #1 uses the faculty salary dataset to compute and interpret quantile-based summary statistics.
Exercise #2 uses the specific powers of US fighter aircraft to calculate the various quantiles including the median. The dependence of the power quantiles on the ability of the aircraft to land on a carrier can be obtained from a conditioning normal quantile plot.