Glossary

Categorical Variable

Definition

A categorical variable takes one of a limited, fixed set of distinct values, each representing a category. Examples include gender, blood type, and region. Because nominal categories have no natural numeric scale, the appropriate measure of central tendency is the mode rather than the mean or median; ordinal categories, which have a meaningful order, also admit a median. Categorical data are typically summarised in contingency tables and analysed with chi-squared tests or logistic regression.

Why It Matters

Categorical variables are among the most common data types in the social sciences, medical research, and business analytics. Correctly identifying a variable as categorical determines which statistical methods are appropriate: a mean or t-test computed on arbitrary category codes is meaningless, whereas frequency tables, chi-squared tests, and logistic models are the correct tools. Mishandling categorical variables, for example by entering their numeric codes into a model as if they were quantities instead of using dummy (indicator) coding, leads to invalid inferences.
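
A minimal sketch in Python of why a mean of category codes carries no information. The responses and both codings below are made up for illustration; the point is only that relabelling the same categories changes the mean while the mode stays the same.

```python
from collections import Counter
from statistics import mean

# Hypothetical responses, invented for this illustration.
responses = ["email", "phone", "email", "sms", "email", "phone"]

# Two equally arbitrary numeric codings of the same three categories.
coding_a = {"email": 1, "phone": 2, "sms": 3}
coding_b = {"email": 3, "phone": 1, "sms": 2}

# The "mean category" depends entirely on which coding was chosen.
mean_a = mean(coding_a[r] for r in responses)
mean_b = mean(coding_b[r] for r in responses)

# The mode is defined on the labels themselves, so no coding is involved.
mode_label = Counter(responses).most_common(1)[0][0]

print(mean_a, mean_b, mode_label)
```

The two means differ even though the underlying data are identical, which is why frequency-based summaries are used instead.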

Example

In a customer satisfaction survey, the variable "preferred contact method" has three categories: email, phone, and SMS. You cannot calculate a mean contact method, but you can report that email is the mode, preferred by 45% of respondents, followed by phone (30%) and SMS (25%). A chi-squared test of independence can then assess whether preference differs by age group.
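
The calculation in this example can be sketched in plain Python. The counts and the two age groups below are hypothetical, chosen only so that the overall shares match the 45/30/25 split; the code computes the Pearson chi-squared statistic by hand and does not compute a p-value.

```python
# Hypothetical counts per age group (100 respondents in total).
observed = {
    "under_40": {"email": 30, "phone": 10, "sms": 20},
    "40_plus": {"email": 15, "phone": 20, "sms": 5},
}
methods = ["email", "phone", "sms"]

# Margins of the contingency table.
row_totals = {g: sum(counts.values()) for g, counts in observed.items()}
col_totals = {m: sum(observed[g][m] for g in observed) for m in methods}
n = sum(row_totals.values())

# Frequency summary: percentage share of each category, and the mode.
shares = {m: 100 * col_totals[m] / n for m in methods}
mode_method = max(col_totals, key=col_totals.get)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / n under independence.
chi2 = 0.0
for g in observed:
    for m in methods:
        expected = row_totals[g] * col_totals[m] / n
        chi2 += (observed[g][m] - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(methods) - 1)

print(shares, mode_method, round(chi2, 2), dof)
```

With these invented counts the statistic is well above 5.99, the 5% critical value for 2 degrees of freedom, so the hypothetical data would suggest that preference differs by age group. In practice a statistics package would report the p-value directly.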

Related Terms

Software Notes

  • SPSS: Set the Measure column to "Nominal" or "Ordinal" in Variable View; use Analyze > Descriptive Statistics > Crosstabs for contingency tables
  • R: Convert with as.factor(var); use table(var) for frequency counts and chisq.test(table(var1, var2)) for a chi-squared test of independence
  • Stata: Use encode to convert string categories to labelled numeric values; use tab var1 var2 for cross-tabulation and tab var1 var2, chi2 to add a chi-squared test