Glossary


Endogeneity

Definition

A condition in which an explanatory variable in a regression model is correlated with the error term, leading to biased and inconsistent estimates. Common causes include omitted-variable bias, measurement error, and simultaneous causality.
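The definition can be stated a little more formally for a simple linear model with a single regressor. The symbols below are notation introduced here for illustration and are not taken from the entry itself; the result is the standard probability limit of the OLS slope when the regressor and the error term are correlated.

```latex
y_i = \beta_0 + \beta_1 x_i + u_i, \qquad \operatorname{Cov}(x_i, u_i) \neq 0
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat{\beta}_1^{\text{OLS}}
  = \beta_1 + \frac{\operatorname{Cov}(x_i, u_i)}{\operatorname{Var}(x_i)}
  \neq \beta_1 .
```

The second term is the asymptotic bias: it does not shrink as the sample grows, and its sign follows the sign of the covariance between the regressor and the error.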

Why It Matters

Endogeneity strikes at the core of causal inference. When a regressor is endogenous, the ordinary least squares (OLS) estimator is inconsistent: it does not converge to the true parameter no matter how large the sample becomes. Policy conclusions drawn from such regressions can therefore be entirely misleading, suggesting an effect where none exists or masking a genuine one. Identifying and addressing endogeneity through instrumental variables, natural experiments, or structural modeling is often the central challenge in applied econometric work.
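A small simulation can make the "more data does not help" point concrete. The sketch below is illustrative only: the variable names, coefficient values, and data-generating process are all assumptions chosen for the example. Under this setup the OLS slope should settle near 1 + Cov(x, u)/Var(x) = 1 + 0.8/1.64, roughly 1.49, rather than at the true value of 1.

```r
# Illustrative simulation; every name and parameter value here is an assumption.
set.seed(42)
true_beta <- 1

ols_slope <- function(n) {
  ability <- rnorm(n)                       # unobserved confounder
  x <- 0.8 * ability + rnorm(n)             # regressor depends on the confounder
  y <- true_beta * x + ability + rnorm(n)   # confounder also drives the outcome
  coef(lm(y ~ x))[["x"]]                    # OLS slope with the confounder omitted
}

sample_sizes <- c(100, 10000, 100000)
estimates <- sapply(sample_sizes, ols_slope)

# Larger samples tighten the estimate, but around the wrong value.
print(data.frame(n = sample_sizes, ols_estimate = estimates))
```

Increasing the sample size reduces the sampling noise in the estimate but leaves the gap between the estimate and the true coefficient in place.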

Example

A researcher regresses wages on education to estimate the return to schooling. However, unobserved ability affects both education and wages. Because ability is omitted from the regression, it ends up in the error term, which is then correlated with education. If ability raises both schooling and wages, the OLS coefficient on education is biased upward. An instrumental variable, such as proximity to a college, can restore consistency by providing variation in education that is uncorrelated with ability, provided the instrument is also relevant (it actually shifts education) and affects wages only through education.

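```r
# Two-stage least squares (2SLS) with AER::ivreg.
# Assumes `df` is a data frame with columns wage, education, and college_proximity.
library(AER)

# Formula syntax: outcome ~ regressors | instruments
iv_model <- ivreg(wage ~ education | college_proximity,
                  data = df)

summary(iv_model)
```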
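The ivreg call above presumes that a data frame named df already exists. As a rough sketch of what happens underneath, the following base-R example simulates a stand-in for df and runs the two stages by hand with lm(). The data-generating process, the coefficient values (including the true return to schooling of 0.5), and the helper column education_hat are all hypothetical, invented for illustration.

```r
# Simulated stand-in for `df`; all values below are made up for illustration.
set.seed(1)
n <- 5000
ability <- rnorm(n)                                  # unobserved in practice
college_proximity <- rbinom(n, 1, 0.5)               # instrument, independent of ability by construction
education <- 12 + 1.5 * college_proximity + ability + rnorm(n)
wage <- 5 + 0.5 * education + 2 * ability + rnorm(n) # assumed true return to schooling: 0.5
df <- data.frame(wage, education, college_proximity)

# Naive OLS: the education coefficient absorbs part of the effect of ability.
ols_est <- coef(lm(wage ~ education, data = df))[["education"]]

# Stage 1: project education on the instrument.
stage1 <- lm(education ~ college_proximity, data = df)
df$education_hat <- fitted(stage1)

# Stage 2: regress wage on the fitted values from stage 1.
stage2 <- lm(wage ~ education_hat, data = df)
iv_est <- coef(stage2)[["education_hat"]]

print(c(ols = ols_est, two_stage = iv_est))
```

The manual second stage reproduces the 2SLS point estimate, but the standard errors that lm() reports for it are not valid because they ignore the first-stage estimation. For inference, a dedicated routine such as ivreg is the safer choice.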

Software Notes

R: Use AER::ivreg for instrumental-variable estimation; plm for panel data with fixed effects.

Stata: Use ivregress 2sls for 2SLS; ivreg2 for robust and cluster-robust inference with diagnostics.

Python: Use linearmodels.iv.IV2SLS for instrumental-variable estimation.
