Saturday, July 20, 2013

Estimating Population Mean Using Sample Data

Suppose we want to estimate the population mean when the population standard deviation is known. There are techniques for doing this, but first we need some basic assumptions about the random variable from which we obtain a sample.

First, we have a simple random sample of size n drawn from a population of x values. The value of the population standard deviation, sigma, is known. The methods will work for any sample size n as long as x is normally distributed. When the distribution of x is unknown, we need a sample size of at least 30 to use the normal approximation. But if the data are severely skewed, a larger sample size of 50 or more is more appropriate.

A point estimate is an estimate of a population parameter given by a single number. The sample mean x-bar is used as a point estimate for the population mean mu. But take note that even with a large sample size, x-bar is not exactly equal to mu. There is a margin of error, found by taking the absolute value of x-bar minus mu.

But when mu is not known, we cannot determine the margin of error exactly. This is usually the case, since the population mean is rarely known. If it were, there would be no need to use x-bar as a point estimate for mu.

The reliability of an estimate is determined by a confidence level. But what is a confidence level? For a confidence level c, the critical value Zc is the number such that the area under the standard normal curve between -Zc and Zc equals c. Remember, there is typically a table of areas under the standard normal curve at the back of a statistics textbook.

Using the table, if we want a level of confidence to be .7, the critical value is 1.04. For a .75 level of confidence, the critical value is 1.15. Typically we will want a confidence level to be .9, .95 or .99. The critical values for these levels of confidence are 1.645, 1.96 and 2.58, respectively.
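If no table is at hand, the same critical values can be computed. The following is a minimal sketch, assuming Python 3.8 or later, which provides NormalDist in the standard library. The function name critical_value is our own choice for illustration and does not come from any textbook or library.

```python
from statistics import NormalDist

def critical_value(c):
    """Return Zc such that the area under the standard normal curve
    between -Zc and Zc equals c (0 < c < 1).

    critical_value is a hypothetical helper name used for this sketch.
    """
    # A central area of c leaves (1 - c) / 2 in each tail, so Zc is the
    # point with cumulative probability (1 + c) / 2.
    return NormalDist().inv_cdf((1 + c) / 2)

# Print the critical value for each confidence level discussed above.
for c in (0.70, 0.75, 0.90, 0.95, 0.99):
    print(f"c = {c:.2f}  Zc = {critical_value(c):.3f}")
```

The computed values agree with the table values once rounded to the table's precision. For example, the 0.99 level comes out near 2.576, which the table rounds to 2.58.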

Knowing how to find critical values for certain confidence levels will help determine confidence intervals for mu. A confidence interval for mu is an interval calculated from sample data so that c is the probability of generating an interval containing the actual value of mu.
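One way to see what "c is the probability of generating an interval containing mu" means is to simulate it. The sketch below is only an illustration under stated assumptions: the population is taken to be normal, and the values mu = 23.0 and sigma = 2.5 are made up for the simulation. It builds many intervals of the form x-bar +/- Zc(sigma/square root of n) and counts how often they contain mu. The function name coverage is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, c, trials=2000, seed=0):
    """Estimate the fraction of confidence intervals that contain mu.

    Hypothetical helper for illustration; assumes a normal population
    with known mean mu and known standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    margin = z_c * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one simple random sample of size n and compute x-bar.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Count the interval as a hit if it contains the true mean.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

# Made-up population values, used only for this simulation.
print(coverage(mu=23.0, sigma=2.5, n=50, c=0.95))
```

The printed fraction should land close to the chosen confidence level, though it will not match it exactly because the simulation uses a finite number of trials.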

For example, suppose John and Steve go to the track to jog. They like to jog 3 miles. From a random sample of 50 jogging sessions, their mean time to complete the 3 miles is x-bar = 23.25 minutes, and the standard deviation is known to be sigma = 2.5 minutes. Find a 99% confidence interval for mu, which is the mean jogging time for the entire distribution of their jogging sessions.

The critical value is Zc = 2.58. Since the sample size n = 50 is at least 30, the Central Limit Theorem tells us that x-bar is approximately normally distributed. The confidence interval is x-bar +/- E, where E is the margin of error found by taking Zc(sigma/square root of n). So E in this problem is 2.58(2.5/square root of 50), which is approximately 0.912. Therefore the 99% confidence interval is 23.25 +/- 0.912. The lower limit of the interval is 22.338 minutes and the upper limit of the interval is 24.162 minutes. The conclusion is that we are 99% confident that the interval contains the population mean time mu.
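The arithmetic in the jogging example can be checked with a few lines of code. This is a minimal sketch, assuming Python 3.8 or later. The function name z_confidence_interval and its optional z_c parameter are our own inventions for illustration. Passing z_c=2.58 uses the rounded table value from the example, while leaving it out computes the critical value from the confidence level.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, c, z_c=None):
    """Return (lower, upper, E) for the interval x-bar +/- E, where
    E = Zc * sigma / sqrt(n) and sigma is the known population
    standard deviation.

    Hypothetical helper for illustration. If z_c is None, the critical
    value is computed from c instead of being read from a table.
    """
    if z_c is None:
        z_c = NormalDist().inv_cdf((1 + c) / 2)
    margin = z_c * sigma / sqrt(n)  # margin of error E
    return x_bar - margin, x_bar + margin, margin

# Jogging example with the table value Zc = 2.58.
lower, upper, margin = z_confidence_interval(23.25, 2.5, 50, 0.99, z_c=2.58)
print(f"E = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

With the table value 2.58, this reproduces E = 0.912 and the limits 22.338 and 24.162. With the unrounded critical value, the margin is slightly smaller, about 0.911, so the two intervals differ only in the third decimal place.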
