Tuesday, July 28, 2015

Say you want to get a confidence interval for the difference of two proportions.

To get p^ you take x/n, so the first two are quite simple:
p1^ = 25/176 = .142
p2^ = 32/143 = .224

For a 90% CI you use 1.645 for z critical.
The formula is (p1^ - p2^) +/- 1.645(sqrt(p1^q1^/n1 + p2^q2^/n2)), where q^ = 1 - p^.
If you do this correctly for the data we have, you get
-.082 +/- .071850965
(-.154, -.010)
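
If you want to check the arithmetic above, here is a minimal Python sketch. The function name two_proportion_ci is just something I made up for illustration; the counts (25 of 176 and 32 of 143) and the 1.645 critical value come from the example.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z=1.645):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error of the difference: sqrt(p1*q1/n1 + p2*q2/n2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    margin = z * se
    return diff, margin, (diff - margin, diff + margin)

diff, margin, (low, high) = two_proportion_ci(25, 176, 32, 143)
print(f"{diff:.3f} +/- {margin:.3f} -> ({low:.3f}, {high:.3f})")
```

The margin comes out a hair different from .07185 in the later decimal places because the code does not round p1^ and p2^ to three places before plugging them in.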

Thursday, July 23, 2015

A Type I error is rejecting the null hypothesis when it is true. The probability of making a Type I error is alpha, the significance level of the test.

A Type II error is not rejecting the null hypothesis when we should reject it (when it is false). The probability of making a Type II error is Beta.

By intuition, the greater the departure from Ho, the more likely that departure will be detected. That means there is a greater chance of rejecting the null hypothesis, which decreases the chance of a Type II error, which means Beta is smaller.
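
To see Beta shrink as the true value moves away from Ho, here is a rough sketch for one specific case: an upper-tailed z-test of Ho: mu = mu0 with a known sigma. The function name beta_upper_tail and the numbers (mu0 = 100, sigma = 10, n = 25) are made up for illustration and are not from any problem above.

```python
import math
from statistics import NormalDist

def beta_upper_tail(mu0, mu_true, sigma, n, alpha=0.05):
    """P(Type II error) for an upper-tailed z-test of Ho: mu = mu0, sigma known."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    # How many standard errors the true mean sits above the hypothesized mean
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # We fail to reject when the z statistic lands below z_crit
    return NormalDist().cdf(z_crit - shift)

# Hypothetical setup: Ho says mu = 100, sigma = 10, sample size 25
for mu_true in (101, 103, 105):
    beta = beta_upper_tail(100, mu_true, sigma=10, n=25)
    print(f"true mean {mu_true}: Beta = {beta:.3f}")
```

Each step further from 100 gives a smaller Beta, which is the same thing as saying the test has more power.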

Sunday, July 19, 2015

If you look at the regression equation TVHOURS = 1.86 + 0.02 (AGE), the slope is .02, which makes the line practically horizontal. Although a test might show that this slope is statistically significant, .02 is so small that age adds very little to the predicted hours, so it has no practical significance.

For example, the predicted hours for a person 20 years old and a person 30 years old, using this equation, are 1.86 + 20(.02) = 2.26 hours and 1.86 + 30(.02) = 2.46 hours. That comes out to just a 12 minute difference. Even for a wide difference in age, from 20 to 70, it only adds 1 extra hour for a 50 year difference. Each increase of 1 year in age adds only 1.2 extra minutes of TV time, showing no practical significance.
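
The same comparison can be done in a few lines of Python. This is only a sketch; the function name predicted_tv_hours is my own, and the intercept and slope are the ones from the equation above.

```python
def predicted_tv_hours(age, intercept=1.86, slope=0.02):
    """Predicted TV hours from the fitted line TVHOURS = 1.86 + 0.02(AGE)."""
    return intercept + slope * age

hours_20 = predicted_tv_hours(20)
hours_30 = predicted_tv_hours(30)
hours_70 = predicted_tv_hours(70)

# Convert the differences from hours to minutes
minutes_per_decade = (hours_30 - hours_20) * 60
minutes_per_50_years = (hours_70 - hours_20) * 60

print(f"age 20: {hours_20:.2f} hours, age 30: {hours_30:.2f} hours")
print(f"10 year gap: {minutes_per_decade:.0f} minutes")
print(f"50 year gap: {minutes_per_50_years:.0f} minutes")
```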

Wednesday, July 15, 2015

Example of a problem using a contingency table

        American   Continental   Delta   United   Total
Yes     48         69            68      25       210
No      52         41            62      35       190
Total   100        110           130     60       400
Now what we have to do is get the expected number in each spot. For instance, we need the expected number for American and YES. We get the expected number by taking the row total times the column total, divided by the total number in all.
For American and YES that would be (row total of 210)(column total of 100)/(total of 400) = 210(100)/400, which equals 52.5
You have to do that for ALL of the 8 different spots on the table. I'll show you one more: for Continental and YES it's (210)(110)/400 = 57.75
If you do that correctly, this is what you will have, which I'll show in another table
American      YES     NO
Observed      48      52
Expected      52.5    47.5
Continental   YES     NO
Observed      69      41
Expected      57.75   52.25
Delta         YES     NO
Observed      68      62
Expected      68.25   61.75
United        YES     NO
Observed      25      35
Expected      31.5    28.5

Now to get the test statistic you have to take (observed - expected)^2, divide that by the expected, and sum all the values.
For American and YES that is (48 - 52.5)^2/52.5
For Continental and YES that is (69 - 57.75)^2/57.75
and so on....
If you do that for all eight values you should get 8.251 (you could have a slightly different number depending on how you round)
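
If you would rather not do all eight by hand, the expected counts and the test statistic can be checked with a short Python sketch like the one below. The variable names are my own; the counts are the observed values from the airline table.

```python
airlines = ["American", "Continental", "Delta", "United"]
observed = {
    "Yes": [48, 69, 68, 25],
    "No": [52, 41, 62, 35],
}

row_totals = {label: sum(counts) for label, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

chi_square = 0.0
for label, counts in observed.items():
    for airline, obs, col_total in zip(airlines, counts, col_totals):
        # Expected count = (row total)(column total) / (grand total)
        expected = row_totals[label] * col_total / grand_total
        chi_square += (obs - expected) ** 2 / expected
        print(f"{airline:12s} {label:3s} observed={obs:3d} expected={expected:.2f}")

print(f"chi-square = {chi_square:.3f}")
```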

Now you need the critical value for the test. To get the df you take (rows - 1)(columns - 1)
We have 2 rows in the table and 4 columns, so df = (2-1)(4-1) = 3
Look in any chart for the Chi-square distribution for 3 df and alpha = .05 and you get 7.815 (9.488 is the value for 4 df, so be careful to read the right row)
We compare our test statistic to 7.815
Remember we reject Ho and conclude there is a preference as to which airline we choose if the test statistic is greater than the critical value. Here 8.251 > 7.815, so we do what? We reject Ho, so at the .05 level there is a preference as to what airline to choose

Friday, July 10, 2015

For the linear regression line, you need y^ = a + bx, where b = r(sy/sx) and a = y-bar - b(x-bar). You can get sy, sx, x-bar and y-bar by putting the values into a calculator. The correlation r can be gotten using Excel or by hand with a long, tedious formula.
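
Here is a small Python sketch of those two formulas, in case you want to skip the calculator. The function name regression_line and the five data points are made up purely for illustration.

```python
from statistics import mean, stdev

def regression_line(xs, ys):
    """Return (a, b, r) for y^ = a + bx using b = r(sy/sx), a = y-bar - b(x-bar)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    # Correlation: sum of cross-deviations over (n - 1) * sx * sy
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b = r * sy / sx
    a = y_bar - b * x_bar
    return a, b, r

# Made-up data, only to show the calculation
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = regression_line(xs, ys)
print(f"y^ = {a:.2f} + {b:.2f}x  (r = {r:.3f})")
```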

For multiple linear regression, it's best to use Excel or an online calculator for multiple regression. Simply put in the values for the x variables and the y variable and you will get the result you need.


Remember that a multiple linear regression equation will take on the form y^ = a + b1x1 + b2x2 and so on, with one coefficient for each x variable.

Friday, July 3, 2015

Resampling simply mimics the process of sampling. Instead of choosing another
sample at random from the population, we sample from an artificial population
constructed on our computer that embodies everything we know
about the population of interest. In many, but not all, of the examples
that follow, this artificial population is the very data set from which
we seek to draw inferences. Since the data set is itself a sample of the
whole population, we are taking a sample from the sample: resampling.
This doesn’t, of course, provide more information about the population,
but it does provide us with a way of understanding the consequences of
sampling variability for drawing inferences about the population based
on our data.
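
Here is a rough Python sketch of the idea, using one common kind of resampling (the bootstrap): draw samples with replacement from the data set itself and watch how much the sample mean moves around. The data values, the function name bootstrap_means, and the choice of 2000 resamples are all made up for illustration.

```python
import random

def bootstrap_means(data, n_resamples=2000, seed=0):
    """Means of samples drawn with replacement from the data (same size as data)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(data)
    return [sum(rng.choices(data, k=n)) / n for _ in range(n_resamples)]

# Made-up sample, standing in for "the very data set" we want to learn from
data = [12, 15, 9, 20, 17, 11, 14, 18, 13, 16]
sample_mean = sum(data) / len(data)

means = sorted(bootstrap_means(data))
# Middle 90% of the resampled means (a simple percentile interval)
low = means[int(0.05 * len(means))]
high = means[int(0.95 * len(means)) - 1]

print(f"sample mean = {sample_mean:.2f}")
print(f"90% of resampled means fall between {low:.2f} and {high:.2f}")
```

The spread between low and high is the "sampling variability" part: it tells you nothing new about the population, only how much a mean from a sample this size tends to bounce around.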