Moments

Contents

Objective
History
Sample Mean
Sample Standard Deviation
Examples
Exercises

^ Objective

This module shows you how to calculate and interpret the mean, standard deviation, and other related statistical numerical summaries for a batch of numbers. The statistical summaries are related to the moments of physical bodies. These summary measures are most commonly used for distributions which are approximately symmetric. Symmetry can be assessed visually by a histogram view of the data.

^ History

The center of gravity of an object, e.g., a car, is an important concept in mechanics, first formulated by the Greek mathematician Archimedes. Using the same principles as in physics, the center of gravity of a histogram is the grouped arithmetic average, or grouped sample mean. The grouped mean is computed by summing $x_i \times f_i$ over all class intervals, where $x_i$ is the center of the $i$th class interval and $f_i$ is the relative frequency of that class (i.e., $f_i = n_i/n$). This grouped sample mean approximates the simple arithmetic average. Likewise, the grouped sample variance (the square of the standard deviation) is proportional to the moment of inertia (a measure of the rotational inertia of an object).

Karl Pearson was the first to use the term moment as a descriptor for the sample mean and standard deviation based on the analogy between mechanics and statistics.

^ Sample Mean

The sample mean, denoted by $\bar{x}$, is a measure of the center of a distribution. It is the arithmetic average of the variable values, i.e., $\bar{x} = \sum x_i / n$, where $x_i$ is the $i$th sampled value of the variable X.

As an example, a random sample of size 10 is drawn (with replacement) from the digits 1–10, i.e., each digit has the same chance (1/10) of being selected in each draw. The resulting values are:
4 8 2 10 1 5 2 4 4 7.
The sample mean is given by:
$\bar{x} = \sum x_i / n = (4 + 8 + \cdots + 7)/10 = 47/10 = 4.7$. (Check this calculation by hand.)
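The hand calculation above can also be checked with a few lines of Python. This is only a minimal sketch; the variable names `sample`, `n`, and `sample_mean` are illustrative and not part of the original module.

```python
# The ten values drawn with replacement from 1-10, as listed in the text
sample = [4, 8, 2, 10, 1, 5, 2, 4, 4, 7]

# Sample mean: sum of the values divided by the sample size
n = len(sample)
sample_mean = sum(sample) / n

print(sum(sample), n, sample_mean)  # 47 10 4.7
```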

Notice that several of the values occur more than once. If there are k unique values, the sample mean can be computed as $\sum f_i x_i / n$, where $f_i$ is the number of times $x_i$ occurs and $\sum f_i = n$. In this case,
$((1)(1) + (2)(2) + (3)(4) + (1)(5) + (1)(7) + (1)(8) + (1)(10)) / 10 = 47/10 = 4.7$,
as expected. The sample mean computed in this way is called the grouped sample mean.
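The frequency-weighted calculation can be sketched in Python as follows. Using `collections.Counter` to tally the repeated values is a convenience chosen here, not something the original module prescribes.

```python
from collections import Counter

sample = [4, 8, 2, 10, 1, 5, 2, 4, 4, 7]

# Map each distinct value x_i to the number of times f_i it occurs
counts = Counter(sample)

# The frequencies must add up to the sample size n
n = sum(counts.values())

# Grouped sample mean: sum of f_i * x_i, divided by n
grouped_mean = sum(f * x for x, f in counts.items()) / n

print(sorted(counts.items()))
print(grouped_mean)  # 4.7
```

The result agrees with the simple arithmetic average because no information is lost when exact duplicate values are grouped.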

The sample mean is close to the expected value of X, which is 5.5 (the average of the equally likely values 1 through 10), as will be seen later.

Now consider the following random sample of size 30 from a normal distribution with a mean of 10 and a standard deviation of 2.
12.61 13.07 7.36 8.26 10.98 8.43 11.53 10.10 8.89 8.66 12.08 11.26 11.41 9.78 7.23 13.81 10.62 10.49 8.81 10.58 10.60 10.93 12.20 8.28 6.97 8.12 9.06 10.49 3.84 12.05
The normal distribution corresponds to the well-known bell-shaped curve and will be discussed later. These resulting values are displayed in the following histogram:

The Moments button in the Report panel is chosen by default. The sample mean is displayed on the right along with other summary statistics. Based on the raw data values above, verify that the sample mean is 9.95. The sample mean can be visualized by choosing Normal density from the Options menu. The position of the center red line along the measurement scale (at a position slightly less than 10) is the sample mean.

Now suppose that only the histogram is available—not the raw data. Is it still possible to compute the sample mean? Yes, at least approximately! If each value in a group is "rounded" to the midpoint of the group interval, the resulting values can be used to compute an average. For example, the 3 values in the 6–8 interval are all assigned a value of 7. In this case, we have:
((1)(3) + (0)(5) + (3)(7) + (9)(9) + (11)(11) + (6)(13)) / 30 = 10.13,
which is close to the actual value of 9.95.
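The midpoint approximation can be reproduced from the raw data with the Python sketch below. It assumes class intervals of width 2 starting at 2 (2–4, 4–6, ..., 12–14), which is what the midpoints 3, 5, ..., 13 in the calculation above imply; the names `low`, `width`, and `k` are illustrative.

```python
# The 30 values sampled from a normal distribution, as listed in the text
data = [12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66,
        12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58,
        10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05]

# Assumed binning: 6 class intervals of width 2, starting at 2
low, width, k = 2.0, 2.0, 6

# Count how many values fall in each class interval
counts = [0] * k
for x in data:
    counts[int((x - low) // width)] += 1

# Replace every value by the midpoint of its interval and average
midpoints = [low + width * (i + 0.5) for i in range(k)]
grouped_mean = sum(f * m for f, m in zip(counts, midpoints)) / len(data)

print(counts)                  # [1, 0, 3, 9, 11, 6]
print(round(grouped_mean, 2))  # 10.13
```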

We are now in a position to discuss why the mean is also called the first moment. Consider each of the bars in the histogram to have some minimal uniform thickness.

^ Sample Standard Deviation

The sample standard deviation, denoted by s, is a measure of variation and is based on the squared deviation between each value and the sample mean. More specifically, the standard deviation is the square root of the sample variance which is defined by:
$s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$.

For the 10 randomly sampled values from the digits 1-10 given above, the sample variance is given by:
$s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = ((4 - 4.7)^2 + (8 - 4.7)^2 + \cdots + (7 - 4.7)^2)/9 = 8.233$.
The sample standard deviation is thus $s = (8.233)^{1/2} = 2.869$.
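The variance and standard deviation of the ten sampled values can be verified with the following minimal Python sketch, which applies the definition directly (squared deviations from the mean, divided by n - 1). The variable names are illustrative.

```python
import math

sample = [4, 8, 2, 10, 1, 5, 2, 4, 4, 7]
n = len(sample)
mean = sum(sample) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1
variance = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Sample standard deviation: square root of the sample variance
std_dev = math.sqrt(variance)

print(round(variance, 3), round(std_dev, 3))  # 8.233 2.869
```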

Now consider the 30 values randomly sampled from a normal distribution, which are displayed in the applet above. If the Normal density is still displayed in the Histogram applet, the equal distances from the center red line to the two adjacent red lines each correspond to the standard deviation value of 2.14.
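Without the applet, the two summary values reported for the 30 sampled values (mean 9.95, standard deviation 2.14) can be checked from the raw data. The sketch below uses Python's standard `statistics` module, whose `stdev` function uses the n - 1 divisor; this is a convenience chosen here and is not the tool the module itself relies on.

```python
import statistics

# The 30 values sampled from a normal distribution, as listed in the text
data = [12.61, 13.07, 7.36, 8.26, 10.98, 8.43, 11.53, 10.10, 8.89, 8.66,
        12.08, 11.26, 11.41, 9.78, 7.23, 13.81, 10.62, 10.49, 8.81, 10.58,
        10.60, 10.93, 12.20, 8.28, 6.97, 8.12, 9.06, 10.49, 3.84, 12.05]

mean = statistics.mean(data)   # arithmetic average
std = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)

print(round(mean, 2), round(std, 2))  # 9.95 2.14
```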

^ When to Use the Sample Mean and Standard Deviation

The mean and standard deviation are most useful when the distribution is nearly symmetric. Unfortunately, they can be greatly affected when the data is skewed toward the right or left.

The sample median is less affected by skewness than the sample mean, i.e., the position of the upper or lower values has little effect on the median (which is the middle value of a ranked dataset). On the other hand, the mean is pulled in the direction of skewness. For example, if the distribution is positively skewed, the mean will tend to be larger than the median.
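The difference in sensitivity can be illustrated with a small Python sketch. It reuses the ten sampled values from the Sample Mean section and then replaces the largest value, 10, with 100. That replacement is a made-up value used purely for illustration and does not appear in the original data.

```python
import statistics

sample = [4, 8, 2, 10, 1, 5, 2, 4, 4, 7]

# Hypothetical modification: stretch the upper tail by replacing 10 with 100
stretched = [100 if x == 10 else x for x in sample]

mean_before = statistics.mean(sample)
mean_after = statistics.mean(stretched)
median_before = statistics.median(sample)
median_after = statistics.median(stretched)

print(mean_before, mean_after)      # the mean moves from 4.7 to 13.7
print(median_before, median_after)  # the median stays at 4 in both cases
```

Only the position of the largest value changed, so the middle of the ranked data, and hence the median, is unaffected, while the mean is pulled toward the long upper tail.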

^ Skewness Measure

Skewness measures the extent to which data deviate positively or negatively from symmetry. Various measures of skewness are used.

A simple measure of skewness is three times the difference between the mean and the median, divided by the standard deviation: $3(\bar{x} - \text{median})/s$. A positive value indicates positive skewness and a negative value indicates negative skewness. A computed value greater than 1 (less than -1) denotes strong positive (negative) skewness.
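As a rough illustration, this skewness measure can be computed for the ten sampled values from the Sample Mean section. The sketch reads the measure as 3 × (mean − median) / standard deviation; the function name `simple_skewness` is hypothetical.

```python
import statistics

def simple_skewness(values):
    """Return 3 * (mean - median) / sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std = statistics.stdev(values)  # n - 1 divisor, as in the text
    return 3 * (mean - median) / std

sample = [4, 8, 2, 10, 1, 5, 2, 4, 4, 7]
skew = simple_skewness(sample)

print(round(skew, 2))  # 0.73
```

For this sample the value is positive but below 1, which by the rule of thumb above suggests mild rather than strong positive skewness.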

^ Examples

Example #1 uses the New Jersey county areas to compute and interpret moment-based summary statistics.

Example #2 uses the heights for 40 randomly selected students at Oxford University to calculate the mean, standard deviation, and the measure of skewness. The dependence of the height moments on gender can be obtained from a conditioning histogram.

^ Exercises

Exercise #1 uses the faculty salary dataset to compute and interpret moment-based summary statistics.

Exercise #2 uses the specific powers of US fighter aircraft to calculate the mean, standard deviation, and the measure of skewness. The dependence of the power moments on the ability of the aircraft to land on a carrier can be obtained from a conditioning histogram.

Exercise #3 (Spider's Webs' Angles)

Exercise #4 (Psychiatric Admittances)