tests before running a linear model

The examples below use R code and functions, since I primarily use R for data analysis. You can search for equivalent functions in Python (or another data analysis tool).

  • Independence of observations (aka no autocorrelation)

    • Use the cor() function to check the relationship between your independent variables and make sure they aren’t too highly correlated. (Strictly, this checks for collinearity between predictors rather than autocorrelation between observations.)
    • e.g. cor(heart.data$biking, heart.data$smoking)
    • When we run this code, the output is 0.015. The correlation between biking and smoking is very weak (r = 0.015), so we can include both variables in our model.
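With more than two predictors, checking pairs one at a time gets tedious. The sketch below computes the whole correlation matrix and flags any pair above a cutoff. The data frame is a simulated stand-in (the real heart.data file is not part of this note), and the 0.8 cutoff is an assumed rule of thumb, not a fixed standard.

```r
# Simulated stand-in for heart.data (assumption: the real data set has
# numeric predictor columns named biking and smoking).
set.seed(42)
n <- 200
heart.data <- data.frame(
  biking  = runif(n, 1, 75),
  smoking = runif(n, 0.5, 30)
)

# Pairwise correlations between all predictors at once
predictors <- heart.data[, c("biking", "smoking")]
cor_matrix <- cor(predictors)
print(round(cor_matrix, 3))

# Flag predictor pairs whose absolute correlation exceeds the cutoff.
# upper.tri() keeps each pair once and skips the diagonal.
cutoff <- 0.8  # assumed rule of thumb
high <- which(abs(cor_matrix) > cutoff & upper.tri(cor_matrix), arr.ind = TRUE)
if (nrow(high) == 0) {
  message("No predictor pair exceeds |r| = ", cutoff)
} else {
  print(high)
}
```

If a pair is flagged, a common response is to drop one of the two variables or combine them before fitting.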
  • Normality

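    • Use the hist() function to check whether your dependent variable follows a roughly normal distribution. (Strictly, the normality assumption applies to the model's residuals, so this is a quick first look before fitting.)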
      • e.g. hist(heart.data$heart.disease)
      • [Figure: multiple regression histogram]
    • The distribution of observations is roughly bell-shaped, so we can proceed with the linear regression.
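Since the normality assumption in linear regression concerns the residuals, it can be worth repeating the check on them once a model is fitted. The sketch below does that with a histogram, a Q-Q plot, and a Shapiro-Wilk test, all from base R. The data are simulated with arbitrary coefficients as a stand-in for heart.data, so the printed numbers will not match the real data set.

```r
# Simulated stand-in for heart.data (assumption: columns biking, smoking,
# heart.disease; the coefficients below are arbitrary illustration values).
set.seed(42)
n <- 200
heart.data <- data.frame(
  biking  = runif(n, 1, 75),
  smoking = runif(n, 0.5, 30)
)
heart.data$heart.disease <- 15 - 0.2 * heart.data$biking +
  0.18 * heart.data$smoking + rnorm(n, sd = 0.65)

# Fit the model and extract its residuals
model <- lm(heart.disease ~ biking + smoking, data = heart.data)
res <- residuals(model)

# Visual checks: the histogram should look roughly bell-shaped and the
# Q-Q points should fall close to the reference line
hist(res, main = "Histogram of residuals", xlab = "Residual")
qqnorm(res)
qqline(res)

# Formal check: a small p-value suggests the residuals are not normal
sw <- shapiro.test(res)
print(sw)
```

With large samples the Shapiro-Wilk test can flag tiny, harmless departures, so the plots are usually the more useful guide.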
  • Linearity

    • We can check this using two scatterplots: one for biking and heart disease, and one for smoking and heart disease.
      • e.g. plot(heart.disease ~ biking, data=heart.data)
      • [Figure: multiple regression scatter plot 1]
      • e.g. plot(heart.disease ~ smoking, data=heart.data)
      • [Figure: multiple regression scatter plot 2]
    • Although the relationship between smoking and heart disease is a bit less clear, it still appears linear. We can proceed with linear regression.
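Judging linearity by eye is easier with a smoothed trend line over each scatterplot. The sketch below adds a lowess() curve to both plots; if the curve stays close to a straight line, a linear fit is plausible. The data are a simulated stand-in for heart.data with arbitrary coefficients, not the real file.

```r
# Simulated stand-in for heart.data (assumption: columns biking, smoking,
# heart.disease; the coefficients below are arbitrary illustration values).
set.seed(42)
n <- 200
heart.data <- data.frame(
  biking  = runif(n, 1, 75),
  smoking = runif(n, 0.5, 30)
)
heart.data$heart.disease <- 15 - 0.2 * heart.data$biking +
  0.18 * heart.data$smoking + rnorm(n, sd = 0.65)

# One scatterplot per predictor, each with a lowess smoother on top
smooths <- list()
op <- par(mfrow = c(1, 2))  # two panels side by side
for (x in c("biking", "smoking")) {
  plot(heart.data[[x]], heart.data$heart.disease,
       xlab = x, ylab = "heart.disease")
  smooths[[x]] <- lowess(heart.data[[x]], heart.data$heart.disease)
  lines(smooths[[x]], col = "red", lwd = 2)
}
par(op)  # restore the previous plotting layout
```

A smoother that bends clearly (for example a U shape) would suggest transforming the predictor or adding a polynomial term.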
  • Homoscedasticity

    • We will check this after fitting the model, because the check is based on the model's residuals.
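As a rough sketch of what the post-fit homoscedasticity check can look like: plot residuals against fitted values and look for a band of roughly constant width around zero. Base R's plot() method for lm objects also draws this plot as part of its standard diagnostics. The data are a simulated stand-in for heart.data with arbitrary coefficients, and the model formula is an assumption based on the variables used above.

```r
# Simulated stand-in for heart.data (assumption: columns biking, smoking,
# heart.disease; the coefficients below are arbitrary illustration values).
set.seed(42)
n <- 200
heart.data <- data.frame(
  biking  = runif(n, 1, 75),
  smoking = runif(n, 0.5, 30)
)
heart.data$heart.disease <- 15 - 0.2 * heart.data$biking +
  0.18 * heart.data$smoking + rnorm(n, sd = 0.65)

model <- lm(heart.disease ~ biking + smoking, data = heart.data)

# Residuals vs fitted values: a funnel or fan shape would suggest
# that the error variance changes with the fitted value
plot(fitted(model), residuals(model),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# The four standard lm diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(model)
par(op)  # restore the previous plotting layout
```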

References

  • [Linear Regression in R An Easy Step-by-Step Guide (scribbr.com)](https://www.scribbr.com/statistics/linear-regression-in-r/)

Metadata

  • topic:: 00 Statistics
  • updated:: 2022-10-10
  • reviewed:: 2022-10-10
  • #Reference