Glossary
Categorical Variable
Definition
A categorical variable takes a limited, fixed set of distinct values representing categories. Examples include gender, blood type, or region. Because categories lack a natural numeric scale, the mode is the appropriate measure of central tendency. The mean is never meaningful, and the median is meaningful only when the categories have a natural order (ordinal variables such as education level). Categorical data are typically summarised in frequency and contingency tables and analysed with chi-squared tests or logistic regression.
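The frequency summary and mode described above can be sketched in plain Python with the standard library's collections.Counter. This is a minimal illustration, not tied to any statistics package: the helper name summarise_categorical and the blood-type data are hypothetical. It returns every tied mode, since a categorical variable can have more than one most frequent category.

```python
from collections import Counter

def summarise_categorical(values):
    """Return frequency counts, proportions and mode(s) of a non-empty categorical variable."""
    counts = Counter(values)                      # frequency table: category -> count
    total = sum(counts.values())
    proportions = {category: n / total for category, n in counts.items()}
    top = max(counts.values())
    # Several categories can tie for the highest count, so return every mode.
    modes = sorted(category for category, n in counts.items() if n == top)
    return counts, proportions, modes

# Hypothetical blood-type observations, for illustration only.
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O"]
counts, proportions, modes = summarise_categorical(blood_types)
print(dict(counts))      # {'O': 4, 'A': 2, 'B': 1, 'AB': 1}
print(proportions["O"])  # 0.5
print(modes)             # ['O']
```

Note that nothing here averages the category labels: every quantity is a count or a share of the total.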
Why It Matters
Categorical variables are among the most common data types in social sciences, medical research, and business analytics. Correctly identifying a variable as categorical determines which statistical methods are appropriate: means and t-tests are meaningless for categorical data, whereas frequency tables, chi-squared tests, and logistic models are the correct tools. Mishandling categorical variables leads to invalid inferences. A common error is treating arbitrary numeric codes (for example 1 = email, 2 = phone, 3 = SMS) as measured quantities instead of encoding the categories as indicator (dummy) variables.
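Arbitrary codes such as 1 = email, 2 = phone, 3 = SMS silently impose an order and equal spacing on the categories. Indicator (one-hot or dummy) encoding avoids this by giving each category its own 0/1 column. The sketch below uses a hypothetical helper named one_hot written in plain Python; statistical packages provide their own equivalents. With drop_first=True, the first listed category becomes the reference level, which avoids perfect collinearity with a model intercept.

```python
def one_hot(values, categories=None, drop_first=False):
    """Encode a categorical variable as 0/1 indicator (dummy) columns.

    categories: fixed category order; defaults to the sorted distinct values.
    drop_first: omit the first category, which then acts as the reference level.
    """
    if categories is None:
        categories = sorted(set(values))
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"values not in categories: {sorted(unknown)}")
    columns = list(categories[1:]) if drop_first else list(categories)
    # Each row has a 1 in the column of its own category (all 0s for the reference level).
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows

contact = ["email", "phone", "SMS", "email"]
columns, rows = one_hot(contact, categories=["email", "phone", "SMS"], drop_first=True)
print(columns)  # ['phone', 'SMS']
print(rows)     # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Here email is the reference level, so a model coefficient on the phone column would describe the difference between phone and email rather than a slope along a made-up numeric scale.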
Example
In a customer satisfaction survey, the variable "preferred contact method" has three categories: email, phone, and SMS. You cannot calculate a mean contact method, but you can report that 45% of respondents prefer email (the mode), 30% prefer phone, and 25% prefer SMS. A chi-squared test of independence on a contingency table of contact method by age group can then assess whether preference differs by age group.
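The age-group comparison can be worked through with a Pearson chi-squared test of independence. The counts below are hypothetical, chosen only so that the column totals reproduce the 45% / 30% / 25% split; they are not survey results, and the helper names are assumptions for illustration. The p-value uses the closed-form chi-squared tail probabilities that exist for integer degrees of freedom; in practice a library routine such as scipy.stats.chi2_contingency in Python or chisq.test() in R would normally be used. The chi-squared approximation is usually considered reliable only when expected counts are not too small (a common rule of thumb is at least 5 per cell).

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i < df/2} h^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{j=1}^{(df-1)/2} h^(j - 1/2) / Gamma(j + 1/2)
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def chi_squared_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi2_sf(statistic, df)

# Hypothetical counts (not survey results): rows are age groups,
# columns are email, phone, SMS.
table = [
    [50, 20, 30],  # under 40
    [40, 40, 20],  # 40 and over
]
statistic, df, p_value = chi_squared_independence(table)
print(f"chi2 = {statistic:.2f}, df = {df}, p = {p_value:.4f}")
# → chi2 = 9.78, df = 2, p = 0.0075
```

Every expected count in this made-up table is at least 25, so the rule of thumb is met; with these counts the test would indicate that preference differs by age group at conventional significance levels.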
Software Notes
- SPSS: Set the variable's Measure to "Nominal" or "Ordinal" in Variable View; use Analyze > Descriptive Statistics > Crosstabs for contingency tables
- R: Convert with as.factor(var); use table(var) for frequency counts and chisq.test() for chi-squared tests (e.g. chisq.test(table(var1, var2)))
- Stata: Use encode var, generate(newvar) to convert string categories to labelled numeric codes; use tab var1 var2 for cross-tabulation and tab var1 var2, chi2 for a chi-squared test