Linear regression is one of the many courses I studied while
earning my BS in Statistics from Lehigh University. I have been tutoring
statistics for the past 13 years, and students are often confused about
when the linear regression model can be used. Sometimes the model,
although tempting to use, doesn't apply. Certain conditions and
assumptions must be checked first. In the following paragraphs I explain
each condition and assumption as I have to my students over the
years.

The linear regression model has two easily estimated parameters (the slope and the intercept), a meaningful measure of how well the model fits the data, and the ability to predict new values. The first condition is the **quantitative variables condition**: both variables in the model must be quantitative. When a measured variable with units answers questions about the amount or quantity of what is being measured, it is a quantitative variable. Examples of quantitative variables are cost, scores, temperature, height, and weight.

A linear regression model makes several assumptions. First, the relationship between the variables must be linear. This assumption cannot be verified directly, per se, but its plausibility can be checked by viewing the scatter plot. The scatter plot will help check the **straight enough condition**, which means the data in the scatter plot should be straight enough for a line to make sense. For example, if the data show more of a curved relationship between the variables and you are tempted to use a linear model, stop. You cannot use it; the model won't mean a thing.

To summarize the scatter of the data around the line with a single standard deviation, all of the residuals should have the same spread, or variance. Therefore, we need the equal variance assumption. Check the scatter plot for changing spread. If you notice that the spread thickens at any part of the plot, then the **does the plot thicken? condition** does not hold and the linear model does not apply.

Finally, we have to check for outliers, which are points that lie drastically far above or below the rest of the data points. These points can dramatically change a regression model, particularly its slope, which can mislead us about the relationship between the variables in the model. Therefore, be certain that the **outlier condition** is also met.

Although the linear regression model is widely used and is a very powerful tool in statistics for predicting values, several assumptions and conditions must be checked to make sure the model is appropriate. If the model is inappropriate, do not use it. The results will be misleading.
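
For students who like to see the checks in code, here is a minimal sketch in plain Python. It fits the least-squares line, computes the residuals, compares the residual spread in the lower and upper halves of the x values (a rough stand-in for "does the plot thicken?"), and flags points whose residuals are unusually large. The function names (`fit_line`, `spread_ratio`, `flag_outliers`), the half-and-half split, and the cutoff of 3 standard deviations are my own illustrative assumptions, not official rules. These numbers are a supplement to looking at the scatter plot, not a replacement for it.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(x, y, slope, intercept):
    """Observed y minus the value predicted by the line, for every point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_ratio(x, res):
    """Residual spread in the upper half of x divided by the spread in the lower half.

    A ratio far from 1 hints that the spread changes across the plot.
    The half-and-half split is an illustrative assumption.
    """
    ordered = [r for _, r in sorted(zip(x, res))]
    half = len(ordered) // 2
    return pstdev(ordered[half:]) / pstdev(ordered[:half])


def flag_outliers(res, cutoff=3.0):
    """Indices of points whose residual exceeds `cutoff` residual standard deviations.

    The default cutoff of 3 is an assumed rule of thumb, not a fixed standard.
    """
    s = pstdev(res)
    return [i for i, r in enumerate(res) if abs(r) > cutoff * s]


if __name__ == "__main__":
    # Hypothetical data: hours studied (x) and exam score (y), made up for illustration.
    hours = [1, 2, 3, 4, 5, 6, 7, 8]
    score = [52, 55, 61, 64, 70, 71, 78, 80]
    slope, intercept = fit_line(hours, score)
    res = residuals(hours, score, slope, intercept)
    print("slope:", slope, "intercept:", intercept)
    print("spread ratio:", spread_ratio(hours, res))
    print("possible outliers at indices:", flag_outliers(res))
```

If the spread ratio is far from 1 or any index is flagged, go back to the scatter plot and look closely before trusting the line.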
