In statistics, a measure of central tendency is a single value that describes the
position of the center of a set of data. The set of data can then be summarized
by this central value. The measure of central tendency most familiar to people
is the mean, which is the average value of the set of data. Other measures of
central tendency are the median and the mode.
All three measures may be
used to describe a set of data, but under certain circumstances one measure is
better suited than the others. What are the definitions of the mean, median,
and mode, and which circumstances dictate which one is most appropriate to use?
Mean
The mean is the most commonly used measure of
central tendency. One can think of the mean as the "average", which is found by
adding the values of the data and dividing by the number of values. For example,
if the set of data is {2, 5, 6, 8, 8, 10, 11, 11, 12}, then the mean is (2 + 5 +
6 + 8 + 8 + 10 + 11 + 11 + 12)/9 = 73/9 ≈ 8.1.
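As a quick check of the arithmetic above, here is a minimal Python sketch. It sums the values, divides by the count, and compares the result with the standard library's `statistics.mean`.

```python
from statistics import mean

data = [2, 5, 6, 8, 8, 10, 11, 11, 12]

# Add up every value, then divide by how many values there are.
manual_mean = sum(data) / len(data)

print(round(manual_mean, 1))  # 8.1
# statistics.mean performs the same calculation.
print(round(mean(data), 1))   # 8.1
```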
The mean calculated above is a
sample mean, computed from a sample of values. A population mean is calculated
the same way, but from every member of the population. The mean is often not
equal to any of the values in the data set, yet its calculation uses every
value. A disadvantage of the mean is that it is affected by outliers, which are
values much larger or much smaller than the rest of the data values.
Median
The median is the middle value of the data
when the data is arranged from largest to smallest or from smallest to largest.
For example, in the data above the middle value is 8. Suppose we add the value 9
to the data, so that the set becomes {2, 5, 6, 8, 8, 9, 10, 11, 11, 12}. With an
even number of values there are two middle values, 8 and 9. The values 2, 5, 6,
and 8 fall below them, and 10, 11, 11, and 12 fall above. The median is then the
average of the two middle values: (8 + 9)/2 = 8.5.
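The rule above can be sketched in Python. `median_of` is a hypothetical helper written for illustration. It sorts the data and takes the middle value when the count is odd, and the average of the two middle values when the count is even. The standard library's `statistics.median` follows the same rule.

```python
from statistics import median

def median_of(values):
    """Return the middle value of the sorted data, or the average of the
    two middle values when there is an even number of values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [2, 5, 6, 8, 8, 10, 11, 11, 12]
even = [2, 5, 6, 8, 8, 9, 10, 11, 11, 12]

print(median_of(odd))   # 8
print(median_of(even))  # 8.5
print(median(odd), median(even))  # 8 8.5
```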
An advantage of the
median is that it is resistant to outliers. For example, if the data set above
had a value of 99 instead of 12, the median would still be 8.5.
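To see the contrast with the mean, this short sketch replaces 12 with 99 in the ten-value data set. It then prints the median and the mean before and after the change, using Python's `statistics` module.

```python
from statistics import mean, median

data = [2, 5, 6, 8, 8, 9, 10, 11, 11, 12]
with_outlier = [2, 5, 6, 8, 8, 9, 10, 11, 11, 99]  # 12 replaced by 99

# The median depends only on the middle values, so it does not move.
print(median(data), median(with_outlier))  # 8.5 8.5
# The mean uses every value, so the outlier pulls it upward.
print(mean(data), mean(with_outlier))      # 8.2 16.9
```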
Mode
The mode is the value that occurs most frequently in the data set. In
the data set used above, there are two modes, 8 and 11, since each of them
occurs twice in the set.
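Because a data set can have more than one mode, a convenient way to find them in Python is `statistics.multimode` (available since Python 3.8). It returns every value tied for the highest frequency, in the order the values first appear.

```python
from statistics import multimode

data = [2, 5, 6, 8, 8, 10, 11, 11, 12]

# 8 and 11 each occur twice, which is more often than any other value.
print(multimode(data))  # [8, 11]
```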
The mode is generally not used with continuous data, such as
time or weight. For instance, when comparing the finishing times of competitors
in a marathon, it is extremely unlikely that two runners will have exactly the
same time, so every value would occur only once.
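The problem can be illustrated with a handful of hypothetical marathon finishing times in minutes. These values are invented for the example. No time repeats, so every value ties for "most frequent" and the mode tells us nothing about the center.

```python
from statistics import multimode

# Hypothetical finishing times in minutes, recorded to the hundredth.
finish_times = [182.37, 195.02, 201.45, 188.90, 210.13, 199.76]

# Each time occurs exactly once, so multimode returns all of them.
print(multimode(finish_times))
```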
Another problem with the mode as a measure of central tendency arises
when the mode is a value far from the rest of the data. For example, if the
data set is {1, 5, 7, 8, 10, 11, 44, 44, 44}, the mode is 44, which is clearly
not representative of the center of the data set.
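The same data set can be checked in Python. This sketch prints the mode next to the median for comparison. The median is not discussed in this example in the text and is added here only as a reference point for where the center actually lies.

```python
from statistics import median, mode

data = [1, 5, 7, 8, 10, 11, 44, 44, 44]

print(mode(data))    # 44, the most frequent value, far from the center
print(median(data))  # 10, the middle of the nine sorted values
```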
Which measure of central tendency is best to use?
The only situation in which
the mode is the best choice is when dealing with nominal data. Nominal data is
divided into categories, and the number of data values in each category is that
category's frequency. For example, we might classify where people in a certain
state live by city. Another example would be a car dealer classifying its cars
into categories such as sports cars, vans, mid-size sedans, etc.
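For nominal data, the mode amounts to counting the labels in each category and picking the most frequent one. A mean or median of labels like "van" has no meaning. The small inventory below is hypothetical, and the sketch uses `collections.Counter` from Python's standard library.

```python
from collections import Counter

# Hypothetical dealer inventory: one category label per car.
inventory = ["mid-size sedan", "van", "sports car",
             "mid-size sedan", "van", "mid-size sedan"]

counts = Counter(inventory)     # frequency of each category
print(counts.most_common(1))    # [('mid-size sedan', 3)]
```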
The mean is best used with interval or ratio data that
is not skewed. Data is skewed when it is not symmetric: most values cluster at
one end, with a long tail of values stretching toward the upper or lower end.
When the data is skewed, or when dealing with ordinal data, the median is the
better choice.
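A made-up, right-skewed data set shows why the median is preferred here. Most of these hypothetical values lie between 28 and 40, but two large values form a long upper tail. The tail drags the mean well above where most of the data sits, while the median stays among the typical values.

```python
from statistics import mean, median

# Hypothetical right-skewed data: mostly small values, a few large ones.
values = [28, 30, 31, 33, 35, 36, 38, 40, 120, 250]

print(mean(values))    # 64.1, pulled toward the long upper tail
print(median(values))  # 35.5, the average of the middle values 35 and 36
```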
If the distribution is
normal, which is symmetric about the mean, then the mean, median, and mode are
identical. The normal distribution is represented by the classic bell-shaped
curve.
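The normal distribution itself cannot be shown with a short list. Instead, this sketch uses a small, hypothetical data set that is symmetric about 3 with its peak at 3. This illustrates the same property: all three measures land on the center.

```python
from statistics import mean, median, mode

# Hypothetical data, symmetric about 3 and peaked at 3.
data = [1, 2, 3, 3, 3, 4, 5]

# All three measures equal 3 for this data set.
print(mean(data), median(data), mode(data))
```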
Having read this article, readers who were confused about the measures of
central tendency should now have a clearer idea of which measure is best to use.