Sunday, July 28, 2013

How to Work with Normal Distributions

One of the most widely used and important continuous probability distributions in statistics is the normal distribution. The normal distribution is characterized by a bell-shaped curve with its highest point above the mean. The distribution is symmetrical, and the curve approaches the horizontal axis but never touches it. But how do we work with normal distributions? By the end of this article, you should be able to calculate probabilities for a random variable that has a normal distribution.

Suppose a random variable x has a normal distribution with a mean m and standard deviation s. To find areas and probabilities for such a random variable, convert x values to z using the formula z = (x - m)/s. Then use a table for the area of a standard normal distribution, found in most statistics textbooks, to find the corresponding areas and probabilities.

Consider the following example:

Let x have a normal distribution with m = 15 and s = 4. Find the probability that an x value chosen at random from this distribution is between 8 and 19.

In symbols, we write this problem as P(8 ≤ x ≤ 19).

Since the probabilities correspond with areas under the normal distribution curve, we can find the area under the curve from x = 8 to x = 19. To do this, we convert the x values to z values.

For x = 8, z = (8 - 15)/4, therefore z = -1.75. For x = 19, z = (19 - 15)/4, therefore z = 1. Writing this in symbols, we get P(-1.75 ≤ z ≤ 1). This is the same as the area to the left of z = 1 minus the area to the left of z = -1.75. These areas can be found using a table for the area of a standard normal distribution. Using the table, we get 0.8413 - 0.0401 = 0.8012.
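The same calculation can be checked without a printed table. The following is a minimal sketch using Python's standard library; the helper name z_score is only illustrative and is not part of the original example.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Convert an x value to a z value using z = (x - mean) / sd."""
    return (x - mean) / sd

mean, sd = 15, 4
z_low = z_score(8, mean, sd)    # z value for x = 8
z_high = z_score(19, mean, sd)  # z value for x = 19

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()

# Area to the left of z_high minus area to the left of z_low.
prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(z_low, z_high, round(prob, 4))
```

The result should agree with the table answer to about three decimal places; small differences in the fourth decimal come from the rounding of table entries.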

What if we want to know the probability that a value chosen at random is greater than a certain value? Using the same mean and standard deviation, suppose we want the probability that x > 12.

This is the same as one minus the probability that x is less than or equal to 12. In symbols, this is 1 - P(x ≤ 12). Converting the x value to a z value, we get z = (12 - 15)/4 = -0.75. The area to the left of z = -0.75 is 0.2266. Therefore, P(z > -0.75) = 1 - 0.2266 = 0.7734.
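As a quick check of this right-tail probability, here is a short sketch that works directly with the x distribution (mean 15, standard deviation 4) rather than converting to z first. The variable names are illustrative.

```python
from statistics import NormalDist

# The x distribution from the example: mean 15, standard deviation 4.
x_dist = NormalDist(mu=15, sigma=4)

# P(x > 12) = 1 - P(x <= 12)
prob_greater = 1 - x_dist.cdf(12)
print(round(prob_greater, 4))
```

Working with the x distribution directly gives the same area as converting to z, because the conversion only rescales the horizontal axis.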

This guide should give a student a basic understanding of how to calculate probabilities for a random variable that has a normal distribution.

Wednesday, July 24, 2013

Normal Approximation to the Binomial Distribution

Suppose that the probability a new vaccine will protect adults from a certain disease is 0.8. The vaccine is given to 200 adults. What is the probability that more than 170 of those adults will be protected by the vaccine?

This question describes a binomial experiment with n = 200 trials and probability of success p = 0.8, and we want the probability that the number of successes r is greater than 170. There is a formula for the binomial distribution that can be used to calculate this probability, but doing so would be very time consuming and prone to calculation errors. A simpler way to solve this problem is to use the normal distribution to approximate the binomial distribution. However, certain conditions must be met in order to use this method.

First, consider the binomial distribution with n = number of trials, p = probability of success on a single trial, q = 1 - p = probability of failure on a single trial, and r = number of successes. If np > 5 and nq > 5, then the binomial distribution of r can be approximated by a normal distribution. The mean of the approximating normal distribution is np and its standard deviation is the square root of npq. This approximation becomes more accurate as the number of trials n increases.
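The conditions and parameters above can be collected in a small helper. This is only a sketch; the function name normal_approximation is made up for illustration, and it is applied here to the vaccine setup (n = 200, p = 0.8).

```python
import math

def normal_approximation(n, p):
    """Check np > 5 and nq > 5, and return the approximating mean and sd."""
    q = 1 - p
    conditions_met = n * p > 5 and n * q > 5
    mean = n * p                 # mean of the approximating normal
    sd = math.sqrt(n * p * q)    # standard deviation: sqrt(npq)
    return conditions_met, mean, sd

ok, mean, sd = normal_approximation(200, 0.8)
print(ok, mean, round(sd, 3))
```

For the vaccine problem, np = 160 and nq = 40, so both conditions hold and the approximation is reasonable.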

Now we can put this all together with an example. Suppose the owner of a hotel needs to install new air conditioner units in 25 of the rooms. From past experience with a noted brand, he knows that the air conditioner unit is guaranteed for 5 years, and the probability that it will last 10 years is 0.35. What is the probability that 10 or more of the units will last 10 years?

In this problem, n = 25, p = 0.35, q = 0.65. We want the probability that r is greater than or equal to 10. We can use the normal approximation to the binomial since np = 8.75, which is greater than 5, and nq = 16.25, which is also greater than 5. The mean is 8.75 and the standard deviation is the square root of npq, which is 2.38. Using the normal distribution, we calculate z to be (9.5 - 8.75)/2.38 = 0.32. Note that we use 9.5 instead of 10 because of a continuity correction, which converts r to a continuous normal random variable x by subtracting 0.5, since r = 10 is the left endpoint of the interval. Now we find the probability that z > 0.32, which is 1 - 0.6255 = 0.3745. The area 0.6255 is found by using a table for the areas of a standard normal distribution.
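To see how good the approximation is, the calculation above (which uses 9.5 as the cutoff, corresponding to 10 or more successes) can be compared with the exact binomial sum. This is a sketch using only the standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p = 25, 0.35
q = 1 - p
mean = n * p                 # 8.75
sd = math.sqrt(n * p * q)    # about 2.38

# Normal approximation with continuity correction: P(r >= 10) ~ P(x >= 9.5)
approx = 1 - NormalDist(mean, sd).cdf(9.5)

# Exact binomial probability: sum of P(r = k) for k = 10, ..., 25
exact = sum(math.comb(n, k) * p**k * q**(n - k) for k in range(10, n + 1))

print(round(approx, 4), round(exact, 4))
```

The approximate value differs slightly from the table answer because the code does not round z to two decimal places before looking up the area.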

This guide should help students who are having difficulty understanding the normal approximation to the binomial.

Saturday, July 20, 2013

Estimating Population Mean Using Sample Data

Suppose we have a situation where we want to estimate the population mean when the population standard deviation is known. There are techniques that can be developed to accomplish this, but first there are some basic assumptions about the random variable from which we obtain a sample.

First, we have a simple random sample of size n drawn from a population of x values. The value of the population standard deviation, sigma, is known. The methods will work for any sample size n as long as x is normally distributed. When the distribution of x is unknown, we need a sample size of 30 or more to use the normal approximation. But if the data are severely skewed, a larger sample size of 50 or more is more appropriate.

A point estimate is an estimate of a population parameter given by a single number. The sample mean is used as a point estimate for the population mean. Note, however, that even with a large sample size, the sample mean x-bar is not exactly equal to the population mean mu. There is a margin of error, found by taking the absolute value of x-bar minus mu.

But when mu is not known, we cannot determine the margin of error exactly. This is usually the case since the population mean is rarely known. If it were, there would be no need to use x-bar as a point estimate for mu.

The reliability of an estimate is expressed by a confidence level. But what is a confidence level? For a confidence level c, the critical value Zc is the number such that the area under the standard normal curve between -Zc and Zc equals c. Remember that there is typically a table with areas of a standard normal distribution at the back of a statistics textbook.

Using the table, if we want a level of confidence to be .7, the critical value is 1.04. For a .75 level of confidence, the critical value is 1.15. Typically we will want a confidence level to be .9, .95 or .99. The critical values for these levels of confidence are 1.645, 1.96 and 2.58, respectively.
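The critical values can also be computed instead of looked up. The sketch below relies on the fact that if the area between -Zc and Zc is c, then the area to the left of Zc is (1 + c)/2. The function name critical_value is illustrative.

```python
from statistics import NormalDist

def critical_value(c):
    """Return Zc such that the area between -Zc and Zc under the
    standard normal curve equals the confidence level c."""
    return NormalDist().inv_cdf((1 + c) / 2)

for c in (0.70, 0.75, 0.90, 0.95, 0.99):
    print(c, round(critical_value(c), 3))
```

The computed value for the 0.99 level is slightly below 2.58; the table value is a rounded figure.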

Knowing how to find critical values for certain confidence levels will help determine confidence intervals for mu. A confidence interval for mu is an interval calculated from sample data so that c is the probability of generating an interval containing the actual value of mu.

For example, suppose John and Steve go to the track to jog. They like to jog 3 miles and from a random sample of 50 jogging sessions their mean time to complete the 3 miles is x-bar = 23.25 minutes and the standard deviation sigma = 2.5 minutes. Find a 99% confidence interval for mu, which is the mean jogging time for the entire distribution of their jogging sessions.

The critical value is Zc = 2.58. From the Central Limit Theorem, we know that x-bar is approximately normally distributed. The confidence interval is x-bar +/- E, where E is the margin of error, given by Zc(sigma/square root of n). So E in this problem is 2.58(2.5/square root of 50) = 0.912. Therefore the 99% confidence interval is 23.25 +/- 0.912. The lower limit of the interval is 22.338 minutes and the upper limit of the interval is 24.162 minutes. The conclusion is that we are 99% confident that the interval contains the population mean time mu.
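The interval calculation can be written as a short function. This is a sketch under the same assumptions as the example (known sigma, table critical value 2.58); the function name confidence_interval is illustrative.

```python
import math

def confidence_interval(x_bar, sigma, n, z_c):
    """Return (lower, upper, margin) for x_bar +/- z_c * sigma / sqrt(n)."""
    margin = z_c * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin

# Jogging example: x-bar = 23.25, sigma = 2.5, n = 50, Zc = 2.58
low, high, margin = confidence_interval(23.25, 2.5, 50, 2.58)
print(round(margin, 3), round(low, 3), round(high, 3))
```

Passing the critical value as a parameter makes it easy to reuse the function for other confidence levels, for example 1.96 for a 95% interval.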

Monday, July 15, 2013

Understanding the Central Limit Theorem

In statistics, suppose x is a random variable with a normal distribution with mean mu and standard deviation sigma. Now let x-bar be the sample mean from a random sample of size n from the x distribution. The following statements are then true:

1. The x-bar distribution is normally distributed, just like the x distribution.
2. The mean of the x-bar distribution is mu.
3. The standard deviation of the x-bar distribution is sigma divided by the square root of n.

This theorem states that x-bar will be normally distributed if the x distribution is normal, no matter what the sample size is. The mean of x-bar will always be the same as the mean of the x distribution, and the standard deviation of x-bar is always sigma divided by the square root of n.

What happens if we don't know the shape of the x distribution? The Central Limit Theorem for any probability distribution states that if x has any distribution with mean mu and standard deviation sigma, then x-bar with sample size n will have a distribution that approximates the normal distribution as n gets larger. That means that as a sample size gets larger, the distribution of x-bar will always approach the normal distribution. But how large must the sample size get? Generally speaking, a sample size of 30 or greater will give a reasonable approximation to the normal distribution.
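A small simulation can illustrate this. The sketch below draws samples of size 30 from an exponential distribution with mean 1 and standard deviation 1, a skewed distribution chosen here purely as an assumption for illustration, and looks at the mean and standard deviation of the resulting sample means.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # size of each sample
num_samples = 5000  # number of samples drawn

# Exponential distribution with rate 1: mean 1, standard deviation 1, skewed.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(round(mean_of_means, 3), round(sd_of_means, 3))
print(round(1 / n ** 0.5, 3))  # theoretical sd of x-bar: sigma / sqrt(n)
```

The mean of the sample means should land close to 1 and their standard deviation close to 1 divided by the square root of 30, even though the underlying distribution is far from normal.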

Here's an example using the Central Limit Theorem.

Suppose x has a normal distribution with mean mu = 15 and standard deviation sigma = 4. If you draw random samples of size 5 from the x distribution and x-bar is the sample mean, what can be said about the x-bar distribution?

Even though the sample size is small, much less than 30, the x-bar distribution is normal, since the x distribution is normal. The mean of x-bar is 15 and the standard deviation is 4 divided by the square root of 5, which is about 1.79.

For another example, suppose the x distribution has mean mu = 100 and standard deviation sigma = 20, but there is no information about the shape of the x distribution. If samples of size 35 are drawn from the x distribution, what can be said about the x-bar distribution? Since the sample size is greater than or equal to 30, the x-bar distribution will be approximately normal with mean = 100 and standard deviation equal to 20 divided by the square root of 35, which is about 3.4.
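The x-bar distribution in this example can be built directly from mu, sigma and n. This is a minimal sketch; the helper name xbar_distribution is illustrative, and the result is only an approximation when the shape of the x distribution is unknown.

```python
import math
from statistics import NormalDist

def xbar_distribution(mu, sigma, n):
    """Normal distribution (exact or approximate) of the sample mean:
    mean mu, standard deviation sigma / sqrt(n)."""
    return NormalDist(mu, sigma / math.sqrt(n))

xbar = xbar_distribution(100, 20, 35)
print(xbar.mean, round(xbar.stdev, 2))
```

Once the distribution object exists, probabilities about x-bar can be found with its cdf method in the same way as for any other normal distribution.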

If the sample size were 10 and we did not know the shape of the x distribution, we could not say that the x-bar distribution is approximately normal, because the sample size is too small.

This guide should help students who are having difficulty understanding the Central Limit Theorem.

Thursday, July 11, 2013

Asymptotes

When graphing rational functions, you may have asymptotes. The asymptotes may be vertical, horizontal or slant.

If the highest exponent of all terms in the numerator is less than the highest exponent of all terms in the denominator, the horizontal asymptote is y = 0.

For example, f(x) = x/(x^3 + 3).

If the highest exponent of all terms in the numerator is one greater than the highest exponent of all terms in the denominator, there is a slant asymptote (and possibly a vertical asymptote as well). The slant asymptote is found by doing long division.

For example, f(x) = (x^2 + 2x + 3)/(x + 1).

Doing long division, you get x + 1 with a remainder of 2. The slant asymptote is y = x + 1; ignore the remainder.
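The long division step can be sketched in code. Polynomials are represented here as lists of coefficients from the highest power down, so x^2 + 2x + 3 is [1, 2, 3]; this representation and the function name poly_divmod are choices made for this illustration.

```python
def poly_divmod(num, den):
    """Polynomial long division on coefficient lists (highest power first).
    Returns (quotient, remainder). Assumes den[0] is nonzero."""
    num = list(num)  # work on a copy
    quotient = []
    while len(num) >= len(den):
        factor = num[0] / den[0]   # next coefficient of the quotient
        quotient.append(factor)
        # Subtract factor * den from the leading terms of num.
        for i, d in enumerate(den):
            num[i] -= factor * d
        num.pop(0)                 # leading term is now zero, drop it
    return quotient, num

# (x^2 + 2x + 3) / (x + 1)
quotient, remainder = poly_divmod([1, 2, 3], [1, 1])
print(quotient, remainder)
```

The quotient list gives the coefficients of the slant asymptote, and the remainder list is the part that is ignored.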

If the highest exponent of all terms in the numerator equals the highest exponent of all terms in the denominator, the horizontal asymptote is y = (leading coefficient of the numerator)/(leading coefficient of the denominator).

For example, for f(x) = 2x/(x + 2), the horizontal asymptote is y = 2x/x, so y = 2.
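The degree rules for horizontal asymptotes can be summarized in a short function. As a sketch, polynomials are given as coefficient lists from the highest power down, and the function name horizontal_asymptote is illustrative.

```python
def horizontal_asymptote(num, den):
    """Return the y value of the horizontal asymptote of num/den,
    or None if there is none. Coefficients are listed highest power
    first, with nonzero leading coefficients."""
    deg_num = len(num) - 1
    deg_den = len(den) - 1
    if deg_num < deg_den:
        return 0.0                 # denominator dominates: y = 0
    if deg_num == deg_den:
        return num[0] / den[0]     # ratio of leading coefficients
    return None                    # numerator has the higher degree

print(horizontal_asymptote([1, 0], [1, 0, 0, 3]))   # x/(x^3 + 3)
print(horizontal_asymptote([2, 0], [1, 2]))         # 2x/(x + 2)
print(horizontal_asymptote([1, 2, 3], [1, 1]))      # (x^2 + 2x + 3)/(x + 1)
```

A result of None means there is no horizontal asymptote; when the numerator's degree is exactly one more than the denominator's, there is a slant asymptote instead.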

Vertical asymptotes, if they exist, are found by setting the denominator to zero and solving for x, after canceling any factors common to the numerator and denominator.

Monday, July 8, 2013

Definition of the day

A function is continuous if and only if an infinitely small increment of the independent variable x always produces an infinitely small change in f(x). Therefore, continuous functions cannot have gaps or jumps.

Friday, July 5, 2013

End behavior

The end behavior is the behavior of a graph as x tends to -infinity and as x tends to +infinity.

Positive leading coefficient: f(x) = x^2, as x tends to -infinity, f(x) tends to infinity,
as x tends to +infinity, f(x) tends to infinity

f(x) = x^3, as x tends to -infinity, f(x) tends to -infinity, as x tends to +infinity, f(x) tends to infinity

Negative leading coefficient: f(x) = -x^2, as x tends to -infinity, f(x) tends to -infinity,
as x tends to +infinity, f(x) tends to -infinity

f(x) = -x^3, as x tends to -infinity, f(x) tends to infinity, as x tends to +infinity, f(x) tends to -infinity
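For a polynomial, the end behavior depends only on whether the degree is even or odd and on the sign of the leading coefficient. The sketch below encodes that rule; the function name end_behavior and the string labels are illustrative choices.

```python
def end_behavior(degree, leading_coefficient):
    """Return (behavior as x -> -infinity, behavior as x -> +infinity)
    for a polynomial of degree >= 1 with a nonzero leading coefficient."""
    # On the right, the sign of f(x) follows the leading coefficient.
    right = "+inf" if leading_coefficient > 0 else "-inf"
    if degree % 2 == 0:
        left = right   # even degree: both ends go the same way
    else:
        # odd degree: the ends go in opposite directions
        left = "-inf" if right == "+inf" else "+inf"
    return left, right

print(end_behavior(2, 1))    # f(x) = x^2
print(end_behavior(3, -1))   # f(x) = -x^3
```

The same function applies to higher degrees, for example degree 4 behaves like x^2 and degree 5 behaves like x^3.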

Tuesday, July 2, 2013

Domain of f(x)/g(x)

Suppose you have two functions f(x) and g(x). Now you take f(x)/g(x) and want the domain of this new function. What you need to do is find the domain of the numerator and the domain of the denominator, take the intersection of the two domains, and then exclude any x values where g(x) = 0.

For example:

f(x) = x^2 - 4

g(x) = sqrt(x^2 - 9)

f(x)/g(x) = (x^2 - 4)/sqrt(x^2 - 9)

The domain of the numerator is all real numbers.
The domain of the denominator is (-inf, -3] U [3, inf), but g(x) = 0 at x = -3 and x = 3, so these two points must be excluded.

The domain of f(x)/g(x) is (-inf, -3) U (3, inf).
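A domain can be spot-checked numerically by trying to evaluate the quotient at a few test points. This is only a sketch with hand-picked points, not a proof; the function names quotient and in_domain are illustrative.

```python
import math

def quotient(x):
    """Evaluate (x^2 - 4) / sqrt(x^2 - 9)."""
    return (x**2 - 4) / math.sqrt(x**2 - 9)

def in_domain(x):
    """True if the quotient can be evaluated at x as a real number."""
    try:
        quotient(x)
        return True
    except (ValueError, ZeroDivisionError):
        # ValueError: square root of a negative number
        # ZeroDivisionError: denominator equals zero
        return False

for x in (-4, -3, 0, 3, 4):
    print(x, in_domain(x))
```

Points where the square root is undefined or where the denominator is zero are reported as outside the domain.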