Regularisation
Definition
A set of techniques that reduce overfitting by adding a penalty on the size of the model's coefficients to its loss function. Common forms include L1 regularisation (Lasso), which can set coefficients exactly to zero and so performs variable selection, and L2 regularisation (Ridge), which shrinks coefficients toward zero without setting any of them exactly to zero. Elastic net combines both penalties. Regularisation is essential in high-dimensional settings where the number of predictors approaches or exceeds the sample size.
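In symbols, one common way to write the penalised objective is the parameterisation used by R's glmnet, shown here for a continuous (Gaussian) response; for other models, such as logistic regression, the squared-error term is replaced by the corresponding negative log-likelihood:

```latex
(\hat\beta_0,\hat{\boldsymbol\beta}) = \arg\min_{\beta_0,\,\boldsymbol\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i-\beta_0-\mathbf{x}_i^{\top}\boldsymbol\beta\bigr)^2
  + \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\boldsymbol\beta\rVert_2^2 + \alpha\,\lVert\boldsymbol\beta\rVert_1\Bigr],
\qquad
\lVert\boldsymbol\beta\rVert_1=\sum_{j=1}^{p}\lvert\beta_j\rvert,\quad
\lVert\boldsymbol\beta\rVert_2^2=\sum_{j=1}^{p}\beta_j^2
```

Here λ ≥ 0 sets the overall penalty strength (λ = 0 recovers ordinary least squares), α = 1 gives the Lasso, α = 0 gives Ridge, and 0 < α < 1 gives the elastic net. The intercept β₀ is not penalised.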
Why It Matters
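Modern datasets often contain hundreds or thousands of potential predictors. Without regularisation, flexible models tend to overfit the training data and perform poorly on new observations; when predictors outnumber observations, ordinary least squares does not even have a unique solution. Regularisation improves out-of-sample prediction accuracy, simplifies models by eliminating irrelevant variables (in the Lasso case), and stabilises coefficient estimates (notably Ridge, when predictors are highly correlated), all of which are critical for reliable inference and forecasting.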
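To see why the Lasso can eliminate variables while Ridge only shrinks them, consider the special case of an orthonormal design matrix (XᵀX = I), where both estimators have closed forms in terms of the ordinary least-squares estimates z_j. The sketch below assumes the penalty scalings stated in its comments (they differ from glmnet's by constant factors), and the OLS values are made up for illustration.

```python
# Closed-form Lasso and Ridge solutions for an orthonormal design (X^T X = I),
# where z_j = x_j^T y is the ordinary least-squares estimate of beta_j.
# Assumed objectives (scalings chosen for simple formulas):
#   Lasso: 0.5 * ||y - X b||^2 + lam * sum(|b_j|)
#   Ridge: 0.5 * ||y - X b||^2 + 0.5 * lam * sum(b_j ** 2)


def lasso_orthonormal(z, lam):
    """Soft-thresholding: move each OLS estimate toward zero by lam, stopping at zero."""
    out = []
    for zj in z:
        if zj > lam:
            out.append(zj - lam)
        elif zj < -lam:
            out.append(zj + lam)
        else:
            out.append(0.0)  # small effects are set exactly to zero
    return out


def ridge_orthonormal(z, lam):
    """Proportional shrinkage: every OLS estimate is scaled by 1 / (1 + lam)."""
    return [zj / (1.0 + lam) for zj in z]


ols = [3.0, -0.4, 0.2, -2.5]  # hypothetical OLS estimates
print(lasso_orthonormal(ols, lam=0.5))  # [2.5, 0.0, 0.0, -2.0]
print(ridge_orthonormal(ols, lam=0.5))  # every entry scaled by 1/1.5; none is zero
```

With correlated predictors there is no closed form and the estimates must be computed iteratively, but the qualitative contrast (exact zeros versus uniform shrinkage) carries over.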
Example
A genomics study has 5,000 gene expression variables but only 120 patients. An unregularised logistic regression overfits badly. Applying Lasso regularisation reduces the model to 15 informative genes while maintaining predictive accuracy, yielding a model that is both interpretable and generalisable.
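A minimal sketch of the same idea on simulated data, using cyclic coordinate descent with soft-thresholding (the general approach glmnet is built on, though far simpler and slower than glmnet itself). Two assumptions to note: it uses squared-error loss rather than the logistic loss of the genomics example, and the data (n = 30 observations, p = 60 predictors, three informative), the random seed, and the choice λ = 0.2·λ_max are all hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Return sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def centre(values):
    """Subtract the mean so that no intercept needs to be fitted."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def lasso_cd(cols, y, lam, tol=1e-8, max_sweeps=1000):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    cols: the p predictor columns of X (each a list of n floats), already centred.
    y: the centred response (length n). No intercept is fitted.
    Returns the coefficients and the final residuals y - X b.
    """
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)  # residuals for beta = 0
    scale = [sum(v * v for v in col) / n for col in cols]  # (1/n) * x_j^T x_j
    for _ in range(max_sweeps):
        max_change = 0.0
        for j, col in enumerate(cols):
            # Inner product of x_j with the partial residual that leaves out x_j's own term
            rho = sum(c * r for c, r in zip(col, resid)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= col[i] * delta  # keep resid = y - X beta up to date
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once a full sweep barely moves any coefficient
            break
    return beta, resid


# Hypothetical simulated data: more predictors (p = 60) than observations (n = 30),
# with only the first three predictors truly related to the response.
rng = random.Random(42)
n, p = 30, 60
cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
true_beta = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
y = [sum(true_beta[j] * cols[j][i] for j in range(3)) + rng.gauss(0.0, 0.5) for i in range(n)]
cols = [centre(col) for col in cols]
y = centre(y)

# Smallest penalty at which every coefficient is zero; 0.2 * lam_max is an arbitrary choice
lam_max = max(abs(sum(c * v for c, v in zip(col, y))) / n for col in cols)
beta, resid = lasso_cd(cols, y, lam=0.2 * lam_max)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print(f"{len(selected)} of {p} predictors selected: {selected}")
```

Because λ_max is the smallest penalty that zeroes every coefficient, fitting a decreasing sequence of λ values below it traces the Lasso path; tools such as cv.glmnet or Stata's selection(cv) pick a point on such a path by cross-validation instead of fixing λ by hand as this sketch does.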
Software Notes
- SPSS: Not built-in for Lasso/Ridge; use the R plugin with glmnet.
- R: glmnet package; e.g., cv.glmnet(x, y, alpha = 1) for Lasso (alpha = 0 for Ridge).
- Stata: lasso and elasticnet commands (Stata 16+); e.g., lasso linear y x1-x100, selection(cv).