Normal Quantile Plots

Contents: Objective, Basic Principles, Examples, Exercises

^ Objective

This module examines a batch of numbers (i.e., the data) to judge whether it is plausibly normally distributed. The techniques are graphical and informal.

^ Basic Principles

The normal distribution is the most widely used distribution in statistics. The principal reasons are:

  1. Normality arises naturally in many physical, biological, and social measurements.
  2. Many common procedures of statistical inference assume normally distributed data.

The qth sample quantile is a value along the measurement scale such that a proportion q or less of the data lies below it and a proportion 1-q or less lies above it. Because the data is discrete, the proportion of the data below the qth quantile typically will not be exactly q. Important special cases are the quartiles and the median. Approximately one-fourth of the data is less than the lower quartile, and approximately three-fourths is less than the upper quartile. The median is the second quartile and has about one-half of the data below it. These definitions are discussed more fully in the Quantiles module.
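The definition above can be checked directly. The following is a minimal sketch, not part of JavaStat; the function name `is_sample_quantile` and the small batch of numbers are made up for illustration.

```python
def is_sample_quantile(data, value, q):
    """Return True if `value` satisfies the definition of a qth sample quantile:
    a proportion q or less of the data lies below it and
    a proportion 1 - q or less lies above it."""
    n = len(data)
    below = sum(1 for x in data if x < value) / n
    above = sum(1 for x in data if x > value) / n
    return below <= q and above <= 1 - q


# Illustrative batch of nine numbers (hypothetical, not from this module)
batch = [3, 5, 7, 8, 12, 13, 14, 18, 21]

print(is_sample_quantile(batch, 12, 0.50))  # candidate for the median
print(is_sample_quantile(batch, 7, 0.25))   # candidate for the lower quartile
print(is_sample_quantile(batch, 14, 0.75))  # candidate for the upper quartile
print(is_sample_quantile(batch, 5, 0.25))   # too much data lies above 5
```

With nine values, 4/9 of the batch lies below 12 and 4/9 lies above it, so neither proportion equals one-half exactly. This is the discreteness effect described above.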

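A normal quantile plot graphs the ordered data against the corresponding quantiles of a standard normal distribution. The normal quantile plot implemented in JavaStat also incorporates the features of a box and whisker plot. Specifically, options are available for showing the quartiles and the outlier cutoffs.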

Consider the following data, which was generated randomly from a normal distribution (see the Normal Distribution module) with a mean of 10 and a standard deviation of 2:

12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66, 12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58, 10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05

This list of numbers does not give us much insight into the underlying normal probability law that generated the data. However, a graphical view of the data, as seen below in the Normal Plot of the Histogram applet, provides rich visual content. If the data is normally distributed, the plotted points will fall approximately along a straight line.
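The coordinates of such a plot can be computed by hand. The sketch below pairs each ordered value with a standard normal quantile and then measures how straight the resulting points are. The plotting position (i - 0.5)/n is one common convention and is an assumption here; the positions JavaStat uses are not stated in this module. The function names are illustrative.

```python
from statistics import NormalDist, mean

data = [12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66,
        12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58,
        10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05]


def normal_quantile_points(values):
    """Pair the i-th smallest value with the standard normal quantile
    at probability (i - 0.5) / n (an assumed plotting position)."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(ordered, start=1)]


def correlation(points):
    """Pearson correlation of the (normal quantile, data value) pairs."""
    zs = [z for z, _ in points]
    xs = [x for _, x in points]
    z_bar, x_bar = mean(zs), mean(xs)
    cov = sum((z - z_bar) * (x - x_bar) for z, x in points)
    var_z = sum((z - z_bar) ** 2 for z in zs)
    var_x = sum((x - x_bar) ** 2 for x in xs)
    return cov / (var_z * var_x) ** 0.5


points = normal_quantile_points(data)
r = correlation(points)
print(f"correlation of ordered data with normal quantiles: {r:.3f}")
```

A correlation close to 1 indicates that the points lie near a straight line. This is only a numerical companion to the visual check, not a formal test.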

The Normal Plot can be enhanced by items in the Options menu. Select Robust Fit from the Options menu to fit a line to the data. If the data is normally distributed, the points should fall close to this line. Here the data appears to be normally distributed, except possibly for the value in the lower left corner (3.84, the smallest observation).
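This module does not describe how the Robust Fit line is computed. One simple resistant choice, shown here purely as an assumption and not as JavaStat's method, is the line through the two quartile points of the plot. Because it depends only on the quartiles, a single extreme value has little effect on it. The function name `quartile_line` and the plotting position for the smallest value are illustrative.

```python
from statistics import NormalDist, quantiles

data = [12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66,
        12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58,
        10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05]


def quartile_line(values):
    """Line through (z_0.25, lower quartile) and (z_0.75, upper quartile).
    Assumed stand-in for a robust fit; not taken from JavaStat."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    std_normal = NormalDist()
    z1 = std_normal.inv_cdf(0.25)
    z3 = std_normal.inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)   # rough estimate of the standard deviation
    intercept = q1 - slope * z1     # rough estimate of the mean
    return slope, intercept


slope, intercept = quartile_line(data)

# Height of the line at the smallest observation, using the
# assumed plotting position 0.5 / n for that point.
z_low = NormalDist().inv_cdf(0.5 / len(data))
fitted_low = intercept + slope * z_low

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"smallest value = {min(data)}, line at that point = {fitted_low:.2f}")
```

For this batch the slope comes out close to 2 and the intercept close to 10, in line with the standard deviation and mean used to generate the data. The smallest value lies below the line, which matches the visual impression that it departs from the rest.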

Quantile lines can be superimposed on the plot by selecting Quantiles lines from the Options menu. If an outlier is present, red outlier lines are drawn. Since no red lines are visible, the applet's cutoffs do not flag the value in the lower left corner as an outlier.

^ Examples

Example #1 examines the distribution of student distances from Oxford, using a histogram and a normal quantile plot to determine whether it is normally distributed. Various normalizing transformations are explored.
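As a rough illustration of what a normalizing transformation does, the sketch below applies a logarithm to synthetic right-skewed data and compares the straightness of the normal quantile plot before and after. The data is simulated with a fixed seed and is not the Oxford data; the log transformation, the plotting position (i - 0.5)/n, and the function name `qq_correlation` are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist, mean


def qq_correlation(values):
    """Correlation between the ordered values and standard normal quantiles
    at the assumed plotting positions (i - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_bar, x_bar = mean(zs), mean(ordered)
    cov = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, ordered))
    var_z = sum((z - z_bar) ** 2 for z in zs)
    var_x = sum((x - x_bar) ** 2 for x in ordered)
    return cov / (var_z * var_x) ** 0.5


random.seed(1)  # fixed seed so the synthetic sample is reproducible
skewed = [random.lognormvariate(0.0, 1.0) for _ in range(200)]

r_raw = qq_correlation(skewed)
r_log = qq_correlation([math.log(x) for x in skewed])

print(f"raw data:        r = {r_raw:.3f}")
print(f"log-transformed: r = {r_log:.3f}")
```

The logarithm of a lognormal variable is normal, so the transformed sample should plot much closer to a straight line than the raw sample. Real data will rarely respond this cleanly, which is why several transformations are usually tried.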

^ Exercises

Exercise #1 examines the distribution of FFD from the Aircraft dataset, using a histogram and a normal quantile plot to determine whether it is normally distributed. Various normalizing transformations are explored.