Monday, February 11, 2013

In statistics, a measure of central tendency is a single value that describes the position of the center of a set of data, so that the whole data set can be summarized by that central value. The measure of central tendency most familiar to people is the mean, which is the average value of the set of data. Other measures of central tendency are the median and the mode.

All three measures may be used to describe a set of data, but under certain circumstances one measure is better suited than the others. What are the definitions of mean, median, and mode, and what circumstances determine which is most appropriate to use?

Mean

The mean is the most commonly used measure of central tendency. One can think of the mean as the "average", which is found by adding the values in the data and dividing by the number of values. For example, if the set of data is {2, 5, 6, 8, 8, 10, 11, 11, 12}, then the mean is (2 + 5 + 6 + 8 + 8 + 10 + 11 + 11 + 12)/9 = 73/9 ≈ 8.1.
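
The arithmetic above can be checked with a short Python sketch. It computes the mean by hand (sum divided by count) and compares the result with the standard library's `statistics.mean`. The variable names are illustrative only.

```python
import math
from statistics import mean

data = [2, 5, 6, 8, 8, 10, 11, 11, 12]

# Add up the values, then divide by how many values there are.
total = sum(data)
count = len(data)
by_hand = total / count

print(total, count, round(by_hand, 1))       # 73 9 8.1
# The standard library's statistics.mean gives the same result.
print(math.isclose(mean(data), by_hand))     # True
```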

The mean calculated above is a sample mean, because the nine values are treated as a sample. A population mean is calculated the same way, but over every member of the population. The mean often does not equal any of the values in the data, yet its calculation uses every value in the data set. A disadvantage of the mean is that it is affected by outliers, which are values much larger or much smaller than the rest of the data.
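
The following sketch shows both points from the paragraph above. First, the mean of the example data is not one of the data values. Second, a single outlier pulls the mean a long way. The outlier value 99 is a hypothetical choice, the same one used later in the median example.

```python
from statistics import mean

data = [2, 5, 6, 8, 8, 10, 11, 11, 12]
with_outlier = [2, 5, 6, 8, 8, 10, 11, 11, 99]   # 12 replaced by a hypothetical outlier

m = mean(data)
m_out = mean(with_outlier)

# The mean need not equal any value actually in the data.
print(round(m, 2), m in data)    # 8.11 False
# One extreme value drags the mean far from the bulk of the data.
print(round(m_out, 2))           # 17.78
```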

Median

The median is the middle value of the data when the values are arranged in order, from smallest to largest or from largest to smallest. For example, the data above has nine values, and the middle (fifth) value is 8. Suppose we add the value 9 to the data, so the set becomes {2, 5, 6, 8, 8, 9, 10, 11, 11, 12}. With an even number of values there are two middle values, 8 and 9. The values 2, 5, 6, and 8 fall below them, and 10, 11, 11, and 12 fall above. The median is then the average of the two middle values, (8 + 9)/2 = 8.5.
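
Here is a minimal Python sketch of the procedure described above. It sorts the data, takes the single middle value when the count is odd, and averages the two middle values when the count is even. The function name `median` is a hypothetical helper written for illustration.

```python
def median(values):
    """Median of a non-empty list of numbers (illustrative helper)."""
    ordered = sorted(values)          # arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [2, 5, 6, 8, 8, 10, 11, 11, 12]
even = odd + [9]                      # adding 9 gives ten values

print(median(odd))    # 8
print(median(even))   # 8.5
```

Python's standard library function `statistics.median` follows the same odd/even rule, so in practice it can be used instead of a hand-written helper.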

An advantage of the median is that it is largely unaffected by outliers. For example, if the data set above had the value 99 instead of 12, the median would still be 8.5.
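
This sketch uses the ten-value data set from the median section to contrast the two measures. Replacing 12 with 99 leaves the median unchanged but roughly doubles the mean. It uses only the standard `statistics` module.

```python
from statistics import mean, median

data = [2, 5, 6, 8, 8, 9, 10, 11, 11, 12]
with_outlier = [2, 5, 6, 8, 8, 9, 10, 11, 11, 99]   # 12 replaced by 99

print(median(data), median(with_outlier))   # 8.5 8.5
print(mean(data), mean(with_outlier))       # 8.2 16.9
```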

Mode

The mode is the value that occurs most frequently in the data set. A data set can have more than one mode: in the data set used above, 8 and 11 both occur twice, so both 8 and 11 are modes.
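
Because a data set can have several modes, a sketch that finds the mode should return every value tied for the highest count. The helper `modes` below is hypothetical and written for illustration. The standard library's `statistics.multimode` (Python 3.8 or later) gives the same answer.

```python
from collections import Counter
from statistics import multimode

data = [2, 5, 6, 8, 8, 10, 11, 11, 12]

def modes(values):
    """Return every value that occurs the maximum number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, c in counts.items() if c == highest]

print(modes(data))       # [8, 11]
print(multimode(data))   # [8, 11]
```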

The mode is generally not used with continuous data, such as time or weight, because exact values rarely repeat. For instance, when comparing the finishing times of marathon runners, it is extremely unlikely that two runners will have exactly the same time.
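
The sketch below illustrates the problem with a handful of made-up finishing times, in seconds, recorded to the hundredth. The times are hypothetical and are not real race data. No value repeats, so every time ties for "most frequent" and the mode says nothing about the center.

```python
from statistics import multimode

# Hypothetical marathon finishing times in seconds.
times = [10234.51, 10890.07, 11502.33, 9987.12, 12045.80]

# Each time occurs exactly once, so all of them are returned as "modes".
print(multimode(times))
```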

Another problem with the mode as a measure of central tendency arises when the mode is far from the rest of the data. For example, if the data set is {1, 5, 7, 8, 10, 11, 44, 44, 44}, the mode is 44, which clearly does not represent the center of the data set.
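
Computing all three measures for this data set shows how far the mode sits from the other two. This is a minimal sketch using the standard `statistics` module.

```python
from statistics import mean, median, mode

data = [1, 5, 7, 8, 10, 11, 44, 44, 44]

print(mode(data))            # 44
print(median(data))          # 10
print(round(mean(data), 2))  # 19.33
```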

Which measure of central tendency is best to use?

The only situation where the mode is the best choice is when dealing with nominal data. Nominal data consists of categories with no natural order, and the data values are the frequencies (counts) in each category. For example, we might classify the people living in a certain state by the city they live in. Another example would be a car dealer classifying its cars into categories such as sports cars, vans, and mid-size sedans.
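
A sketch of the car dealer example can be built with `collections.Counter`. The inventory list is hypothetical. Counting each category and picking the most common one gives the mode. A mean or median of these labels would have no meaning.

```python
from collections import Counter

# Hypothetical inventory: each entry is one car's category.
inventory = ["mid-size sedan", "van", "sports car", "mid-size sedan",
             "mid-size sedan", "van", "sports car", "mid-size sedan"]

counts = Counter(inventory)
print(counts)                  # frequency of each category
print(counts.most_common(1))   # [('mid-size sedan', 4)]
```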

The mean is best used with interval or ratio data that is not skewed. Data is skewed when it is not symmetric: most of the values bunch up toward one end, with a long tail of values stretching toward the other end. When the data is skewed, or when dealing with ordinal data, the median is the better choice.
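
The following sketch illustrates both recommendations using hypothetical data. The first part uses a small right-skewed set of incomes, in thousands. A single large value pulls the mean well above the typical value, while the median stays near the bulk of the data. The second part uses ordinal ratings. These have an order but no meaningful distances, so the sketch maps each label to its rank and takes `median_low`, which always returns one of the actual ranks rather than an average between two categories.

```python
from statistics import mean, median, median_low

# Hypothetical right-skewed data: most values are small, one is very large.
incomes = [28, 31, 33, 35, 36, 38, 40, 42, 45, 250]
print(mean(incomes))     # 57.8
print(median(incomes))   # 37.0

# Hypothetical ordinal data: ratings on an ordered scale.
scale = ["poor", "fair", "good", "very good", "excellent"]
ratings = ["good", "fair", "excellent", "good", "very good", "poor", "good"]
ranks = [scale.index(r) for r in ratings]   # convert labels to positions
print(scale[median_low(ranks)])             # good
```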

If the distribution is normal, which is symmetric about the mean, then the mean, median, and mode are identical. The normal distribution is represented by the classic bell-shaped curve.
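
The sketch below shows this with Python's `statistics.NormalDist` (Python 3.8 or later). Its `mean`, `median`, and `mode` properties all equal the distribution's center. The parameters mu=100 and sigma=15 are arbitrary choices. A random sample drawn from the same distribution has a mean and median that are close but not exactly equal, because a finite sample is never perfectly symmetric. The exact sample values depend on the random seed, so they are not shown.

```python
import random
from statistics import NormalDist, mean, median

dist = NormalDist(mu=100, sigma=15)          # arbitrary example parameters
print(dist.mean, dist.median, dist.mode)     # 100.0 100.0 100.0

# A finite sample from the same distribution: mean and median nearly agree.
random.seed(1)
sample = [random.gauss(100, 15) for _ in range(10_000)]
print(round(mean(sample), 2), round(median(sample), 2))
```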

Having read this article, readers who were confused about the measures of central tendency should now have a clearer idea of which one is best to use in a given situation.