Assumptions and Conditions of Linear Regression

Linear regression is one of the many courses I studied while earning my BS in Statistics from Lehigh University. I have been tutoring statistics for the past 13 years, and students are often confused about when the linear regression model can be used. Sometimes the model, although tempting to use, doesn't apply. Certain conditions and assumptions must be checked first. In the following paragraphs, I explain each condition and assumption as I would to any of my students.

The linear regression model has two easily estimated parameters (the slope and the intercept), a meaningful measure of how well the model fits the data, and the ability to predict new values. The first condition is the quantitative variables condition: both variables in the model must be quantitative. When a measured variable with units answers questions about the amount or quantity of what is being measured, it is a quantitative variable. Examples of quantitative variables are cost, scores, temperature, height, and weight.
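To make the two parameters, the fit measure, and prediction concrete, here is a minimal sketch in Python using only the standard library. The data (hours studied and exam scores) are made up for illustration, the function names are my own, and I am assuming the fit measure meant here is R-squared, the fraction of the variation in y that the line accounts for.

```python
from statistics import mean

def fit_line(x, y):
    """Return the least-squares slope and intercept for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Fraction of the variation in y accounted for by the fitted line."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: both variables are quantitative (hours, points).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 58, 61, 67, 72, 78]

slope, intercept = fit_line(hours, scores)
r2 = r_squared(hours, scores, slope, intercept)

def predict(new_hours):
    """Predicted score for a new value of hours studied."""
    return intercept + slope * new_hours

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
print(f"predicted score for 7 hours: {predict(7):.1f}")
```

Notice that nothing in this code checks whether a line is appropriate. It will happily produce a slope and an R-squared for any paired numbers, which is exactly why the conditions have to be checked separately.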

A linear regression model makes several assumptions. First, the relationship between the variables must be linear. This assumption cannot be verified directly, but we can judge whether it is reasonable by viewing the scatter plot. The scatter plot helps us check the straight enough condition, which means the points should follow a pattern straight enough for a line to make sense. For example, if the data show a curved relationship between the variables and you are tempted to use a linear model, stop. You cannot use it; the model won't mean a thing.
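Looking at the scatter plot is the real check, but a rough numerical companion can help when you are unsure. The sketch below, on made-up data, fits a line and then looks at the signs of the residuals in order of x. When the relationship is curved, the residuals tend to stay on one side of the line for long stretches (for example positive, then negative, then positive again), while for straight enough data they flip sign often. Counting sign runs is my own illustrative heuristic, not a formal test.

```python
from statistics import mean

def residuals_from_line(x, y):
    """Fit a least-squares line and return the residuals (observed - predicted)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def count_sign_runs(values):
    """Count stretches of consecutive values that share the same sign."""
    signs = [v > 0 for v in values]
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)

x = list(range(1, 10))

# Hypothetical curved data: y grows like x squared.
curved_y = [xi ** 2 for xi in x]

# Hypothetical straight-enough data: a line plus small made-up wiggles.
wiggles = [0.5, -0.4, 0.3, -0.6, 0.2, -0.3, 0.4, -0.5, 0.4]
straight_y = [3 + 2 * xi + w for xi, w in zip(x, wiggles)]

curved_resid = residuals_from_line(x, curved_y)
straight_resid = residuals_from_line(x, straight_y)

print("curved runs:  ", count_sign_runs(curved_resid))
print("straight runs:", count_sign_runs(straight_resid))
```

Only a few long runs in the residuals is a hint that the straight enough condition fails, but always confirm it with the plot itself.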

To summarize the scatter of the data around the line with a single standard deviation, all of the residuals should have the same spread, or variance. Therefore, we need the equal variance assumption. Check the scatter plot for changing spread. If the spread thickens in any part of the plot, then the "Does the plot thicken?" condition does not hold and the linear model does not apply.
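As a rough companion to eyeballing the plot, one can compare the spread of the residuals on the left half of the x values with the spread on the right half. The sketch below does this on two made-up data sets, one with constant scatter and one that fans out as x grows. The half-and-half split and the idea of reading a ratio far from 1 as a warning sign are my own simplifications for illustration, not a formal test.

```python
from statistics import mean, stdev

def residuals_from_line(x, y):
    """Fit a least-squares line and return the residuals (observed - predicted)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def spread_ratio(x, y):
    """Residual standard deviation for the larger x values divided by that
    for the smaller x values. Assumes x is already sorted in increasing order."""
    resid = residuals_from_line(x, y)
    half = len(resid) // 2
    return stdev(resid[half:]) / stdev(resid[:half])

x = list(range(1, 11))
alternating = [1 if xi % 2 else -1 for xi in x]  # +1, -1, +1, ...

# Hypothetical data with the same scatter everywhere.
even_y = [5 + 2 * xi + 0.5 * s for xi, s in zip(x, alternating)]

# Hypothetical data whose scatter grows with x (the plot "thickens").
fanning_y = [5 + 2 * xi + 0.3 * xi * s for xi, s in zip(x, alternating)]

even_ratio = spread_ratio(x, even_y)
fanning_ratio = spread_ratio(x, fanning_y)

print(f"even spread ratio:    {even_ratio:.2f}")
print(f"fanning spread ratio: {fanning_ratio:.2f}")
```

A ratio near 1 is consistent with equal variance, while a ratio well above or below 1 suggests the spread is changing across the plot.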

Finally, we have to check for outliers, which are points that lie drastically far above or below the rest of the data. These points can dramatically change a regression model, particularly its slope, which can mislead us about the relationship between the variables. Therefore, be certain that the outlier condition is also met.
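To see how much one point can matter, the sketch below fits a line to eight made-up points that follow a clear upward trend, then adds a single hypothetical outlier far below the pattern and fits again. The data and names are invented for illustration.

```python
from statistics import mean

def fit_slope(x, y):
    """Return the least-squares slope for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Hypothetical data following roughly y = 2x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1]

# One made-up outlier: far below where the trend says it should be.
x_with_outlier = x + [9]
y_with_outlier = y + [2.0]

slope_without = fit_slope(x, y)
slope_with = fit_slope(x_with_outlier, y_with_outlier)

print(f"slope without the outlier: {slope_without:.2f}")
print(f"slope with the outlier:    {slope_with:.2f}")
```

A single point cuts the slope roughly in half here, so the fitted line would badly understate the relationship that the other eight points show. Fitting with and without a suspicious point, and comparing the results, is a simple way to see how much that point is driving the model.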

Although the linear regression model is widely used and is a very powerful tool for predicting values, several assumptions and conditions must be checked to make sure the model is appropriate. If the model is inappropriate, do not use it. The results will be misleading.