Glossary

Regularisation

Definition

A set of techniques that reduce overfitting by adding a penalty on coefficient size to the model's loss function. Common forms include L1 regularisation (Lasso), which can set coefficients exactly to zero and so performs variable selection, and L2 regularisation (Ridge), which shrinks coefficients toward zero without eliminating them. Elastic net combines both penalties. The strength of the penalty is controlled by a tuning parameter, usually chosen by cross-validation. Regularisation is essential in high-dimensional settings where the number of predictors approaches or exceeds the sample size.
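The penalties named above can be written out as a sketch. The notation is introduced here for illustration and is not part of the original entry: L(β) is the unpenalised loss, λ ≥ 0 is the penalty strength, and α in [0, 1] is the elastic net mixing weight, following the parametrisation used by glmnet. Other software scales the terms by different constants.

```latex
\hat{\beta} = \arg\min_{\beta} \; L(\beta) + \lambda \, P(\beta)

P_{\text{lasso}}(\beta) = \|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|
\qquad
P_{\text{ridge}}(\beta) = \|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2

P_{\text{enet}}(\beta) = \alpha \, \|\beta\|_1 + \frac{1 - \alpha}{2} \, \|\beta\|_2^2
```

Setting α = 1 recovers the Lasso penalty and α = 0 recovers the Ridge penalty up to a constant factor. The absolute-value term is what allows coefficients to be exactly zero; the squared term only shrinks them.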

Why It Matters

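Modern datasets often contain hundreds or thousands of potential predictors. Without regularisation, models fitted to such data tend to overfit the training set and perform poorly on new observations. Regularisation typically improves out-of-sample prediction accuracy, simplifies models by eliminating irrelevant variables (in the Lasso case), and stabilises coefficient estimates, especially when predictors are highly correlated. All three matter for reliable forecasting. Because penalised estimates are biased toward zero, standard errors and p-values from unpenalised theory do not carry over directly, so inference requires additional care.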

Example

A genomics study has 5,000 gene expression variables but only 120 patients. With far more predictors than patients, an unregularised logistic regression can separate the training data perfectly, so it overfits badly and its coefficient estimates are unstable. Applying Lasso regularisation reduces the model to 15 informative genes while maintaining predictive accuracy, yielding a model that is both interpretable and generalisable.
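The selection behaviour in this example can be illustrated with a minimal sketch. It is not the study's analysis: it uses synthetic data, a least-squares loss in place of logistic regression, and a hand-written cyclic coordinate descent solver. The names soft_threshold, lasso_cd, lam and lam_max are hypothetical, chosen for this illustration, and the problem is scaled down to 100 predictors and 40 observations so it runs quickly with the standard library alone.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes no intercept and roughly centred columns (true for the synthetic data below).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; equals y while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of column j with the partial residual
            # (the residual with column j's current contribution added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Synthetic data (an assumption for illustration): more predictors than
# observations, with only the first three predictors truly informative.
random.seed(0)
n, p = 40, 100
true_beta = [0.0] * p
true_beta[0], true_beta[1], true_beta[2] = 3.0, -2.0, 1.5
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + random.gauss(0.0, 0.5)
     for i in range(n)]

# Smallest penalty at which every coefficient is zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n)) / n) for j in range(p))

beta_hat = lasso_cd(X, y, lam=0.1 * lam_max)
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print(len(selected), "of", p, "coefficients are nonzero")
```

In practice the penalty strength would be chosen by cross-validation rather than fixed at a fraction of lam_max, and an established implementation such as glmnet would be used instead of a hand-written solver.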

Related Terms

Software Notes

SPSS: Not built-in for Lasso/Ridge; use the R plugin with glmnet
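R: glmnet package; e.g., cv.glmnet(x, y, alpha = 1) for Lasso (alpha = 0 for Ridge, values between 0 and 1 for elastic net), where x is a numeric predictor matrix; add family = "binomial" for logistic regression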
Stata: lasso and elasticnet commands (Stata 16+); e.g., lasso linear y x1-x100, selection(cv)