Some of the most difficult concepts to understand in statistics are the
various types of sampling. In particular, it can be confusing to
distinguish between stratified sampling and cluster sampling. I have
used the following explanations of these sampling techniques during my
13 years of experience as a math tutor.
A simple random sample is
the most common type of sampling technique used in statistics. However,
designs used to sample populations across large areas are more
complex than the simple random sample. In some instances, the population
is divided into homogeneous groups, called strata. Then a simple random
sample is selected from each stratum. This kind of sampling is known as
stratified sampling.
The question that often arises is, "Why
would we want to make things more complicated by using a stratified
sample?" Suppose we want to learn about fundraising for a high school
baseball team. The school is 55% boys and 45% girls, and we expect that
boys and girls have different opinions about the fundraising. If a simple
random sample is used to choose 200 students, we could by chance get 130
boys and 70 girls or, in an extreme case, 45 boys and 155 girls. Because
the mix of boys and girls changes from sample to sample, the results could
vary widely. To reduce this variability, we can sample boys and girls
separately: 110 boys (55% of 200) and 90 girls (45% of 200). This kind of
"forced representative balance" ensures that the percentage of boys and
girls in the sample is identical to that in the population. This is a
better method than the simple random sample because it should give a more
accurate representation of the opinion of all the students in the school.
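The proportional split in the school example can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the school size of 1,000 students, the `stratified_sample` helper, and the way each student is stored as a (group, id) pair are all made up for the example.

```python
import random

def stratified_sample(population, key, n, seed=None):
    """Draw a proportionally allocated stratified sample of about n items.

    `key` maps an item to its stratum label. Rounding can make the total
    differ slightly from n when the proportions do not divide evenly.
    """
    rng = random.Random(seed)
    # Split the population into strata.
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    # Take a simple random sample inside each stratum, sized by its share.
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical school of 1,000 students: 550 boys and 450 girls.
students = [("boy", i) for i in range(550)] + [("girl", i) for i in range(450)]

sample = stratified_sample(students, key=lambda s: s[0], n=200, seed=1)
boys = sum(1 for s in sample if s[0] == "boy")
girls = len(sample) - boys
print(boys, girls)  # 110 90
```

Which students are picked depends on the seed, but the 110/90 split does not, and that is the point of stratifying.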
Now
suppose we want to find out what the high school freshmen think about
the food served in the cafeteria. We could use a simple random sample
or stratified sampling, but it would be too time-consuming to track down
every student selected for the sample. However, each freshman homeroom
meets in one of ten rooms on the ground floor of the school. So, we could
randomly select two or three homerooms and survey every student in those
homerooms. Here the population is divided into representative clusters and
a few clusters are sampled in their entirety. This type of sampling is
called cluster sampling.
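The homeroom example can be sketched the same way. Again, this is a rough illustration under assumptions: the room names, the 25 students per homeroom, and the `cluster_sample` helper are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick n_clusters whole clusters at random and keep every member."""
    rng = random.Random(seed)
    # Randomness is applied to the clusters, not to individual students.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical: ten freshman homerooms with 25 students each.
homerooms = {
    f"Room {r}": [f"student_{r}_{i}" for i in range(25)]
    for r in range(101, 111)
}

chosen_rooms, surveyed = cluster_sample(homerooms, n_clusters=3, seed=7)
print(chosen_rooms)
print(len(surveyed))  # 75
```

Only three rooms have to be visited, yet every student in those rooms is surveyed.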
What is the difference between stratified and cluster
sampling? Each stratum is homogeneous, while each cluster is heterogeneous
and resembles the population in its entirety. Stratified sampling is done
to make sure the sample represents different groups in the population, and
samples are taken randomly within every stratum. Cluster sampling takes
only a few randomly chosen clusters and is done to make sampling more
affordable or practical for the given situation.
An example that more
clearly shows the difference between the two types of sampling is
tasting a pizza. Suppose you are a professional taster whose job is
to check pizzas for quality. Samples need to be eaten from selected
pizzas, with the crust, sauce, cheese, and toppings all tested.
You
could taste a slice of pizza as a customer would eat a slice. When doing
so, you'll learn about the pizza as a whole. The slice would be a
cluster sample since it contains all the ingredients of the pizza.
If
you select some tastes of the crust at random, of the cheese at random,
of the sauce at random, and of the toppings at random, you will still
get a pretty good judgment of the overall quality of the pizza. This
kind of sampling would be stratified.
Cluster sampling slices
across the layers to obtain clusters, while stratified sampling
represents the population by selecting some from each layer, which
reduces the amount of variability.
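To see the reduction in variability, here is a small simulation built on the fundraising example. The numbers are assumptions chosen to make the effect visible, not real data: a school of 550 boys and 450 girls in which 90% of boys and 10% of girls favor the fundraiser. Each design draws 200 students, repeated 2,000 times.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 1 = in favor, 0 = against.
boys = [1] * 495 + [0] * 55      # 550 boys, 90% in favor (assumed)
girls = [1] * 45 + [0] * 405     # 450 girls, 10% in favor (assumed)
everyone = boys + girls          # true share in favor: 540 / 1000 = 0.54

def srs_estimate(n=200):
    """Share in favor from a simple random sample of the whole school."""
    return sum(rng.sample(everyone, n)) / n

def stratified_estimate(n_boys=110, n_girls=90):
    """Share in favor when boys and girls are sampled separately (55% / 45%)."""
    picked = rng.sample(boys, n_boys) + rng.sample(girls, n_girls)
    return sum(picked) / len(picked)

srs = [srs_estimate() for _ in range(2000)]
strat = [stratified_estimate() for _ in range(2000)]

# The standard deviation across repeated samples measures the variability.
sd_srs = statistics.stdev(srs)
sd_strat = statistics.stdev(strat)
print(f"simple random sample: mean {statistics.mean(srs):.3f}, spread {sd_srs:.4f}")
print(f"stratified sample:    mean {statistics.mean(strat):.3f}, spread {sd_strat:.4f}")
```

Both designs should center near the true value of 0.54, but the stratified estimates should be noticeably less spread out. The gap shrinks as the opinions of the two groups get closer together.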
This guide should help students better understand the differences between stratified sampling and cluster sampling.