Glossary

Multicollinearity

Multicollinearity is a condition in regression analysis where two or more predictor variables are highly correlated, making it difficult to isolate the individual effect of each. Multicollinearity inflates standard errors and can produce unstable coefficient estimates. The ove...

Definition

Multicollinearity is a condition in regression analysis where two or more predictor variables are highly correlated, making it difficult to isolate the individual effect of each. Multicollinearity inflates standard errors and can produce unstable coefficient estimates. The overall predictive power of the model is unaffected, but inference on individual predictors becomes unreliable. Variance inflation factors (VIFs) are commonly used for detection.

Why It Matters

When predictors overlap substantially, the regression model cannot determine which variable is truly responsible for the outcome. Coefficients may flip signs, become non-significant, or change dramatically with small changes in the sample. This undermines the interpretability of the model and makes it impossible to draw clear conclusions about which predictors matter most. Detecting and addressing multicollinearity is essential for trustworthy regression analysis.

Example

A researcher models house prices using both "house size in square feet" and "number of rooms" as predictors. Because larger houses tend to have more rooms, the correlation between these predictors is 0.92. The regression yields a significant model (R² = 0.75), but neither predictor is individually significant, and the coefficient for rooms is negative — an implausible result. VIFs for both variables exceed 10, confirming severe multicollinearity. The researcher resolves the issue by dropping one predictor or combining them into a single index.

Related Terms

Software Notes

  • SPSS: Analyze > Regression > Linear. Click Statistics and check Collinearity diagnostics. VIF values above 10 (or tolerance below 0.1) indicate problematic multicollinearity. Condition indices above 30 in the collinearity diagnostics table also signal problems.
  • R: car::vif(model) computes variance inflation factors. cor(data.frame(x1, x2, x3)) inspects predictor correlations. perturb::colldiag(model) provides condition indices.
  • Stata: regress y x1 x2 x3 followed by estat vif computes VIFs. collin x1 x2 x3 (user-written; install via ssc install collin) provides condition numbers and variance-decomposition proportions.