Glossary

Stratified Sampling

Definition

Stratified sampling is a probability sampling method in which the population is divided into mutually exclusive, internally homogeneous subgroups, or strata, based on shared characteristics (such as age, gender, or income). A random sample is then drawn independently from each stratum, typically in proportion to the stratum's size in the population (proportional allocation). This guarantees that every stratum is represented in the sample.
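The definition can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name stratified_sample, the stratum_of callback, and the toy population are all hypothetical, and it assumes the same sampling fraction is applied to every stratum (proportional allocation).

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=None):
    """Draw the same fraction from every stratum, independently of the others."""
    rng = random.Random(seed)

    # Step 1: partition the population into strata.
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)

    # Step 2: draw a simple random sample (without replacement) inside each stratum.
    sample = []
    for key in sorted(strata):
        members = strata[key]
        n_h = round(len(members) * fraction)  # proportional allocation
        sample.extend(rng.sample(members, n_h))
    return sample

# Hypothetical population: 600 "young" and 400 "old" respondents.
population = [("young", i) for i in range(600)] + [("old", i) for i in range(400)]
sample = stratified_sample(population, stratum_of=lambda r: r[0], fraction=0.1, seed=1)
```

With a 10% fraction the sample holds 60 "young" and 40 "old" records on every run; only which individuals are chosen varies with the seed.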

Why It Matters

When a population contains distinct subgroups that differ on the variable of interest, simple random sampling may under-represent smaller groups purely by chance. Under proportional allocation, stratified sampling guarantees that each subgroup appears in the sample in its population proportion. When the strata differ on the variable of interest, this reduces sampling error and increases the precision of estimates relative to a simple random sample of the same size. It is widely used in political polling, healthcare surveys, and quality-control audits.
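The precision gain can be checked with a small simulation. The sketch below is illustrative only: the two-stratum population, its means (80 and 50), and the sample size of 50 are made-up assumptions chosen so that the strata differ strongly on the measured variable.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: two strata with very different means.
small = [rng.gauss(80, 5) for _ in range(200)]  # 20% of the population
large = [rng.gauss(50, 5) for _ in range(800)]  # 80% of the population
population = small + large

def srs_mean(n):
    """Mean of a simple random sample of size n from the whole population."""
    return statistics.fmean(rng.sample(population, n))

def stratified_mean(n):
    """Mean of a proportionally allocated stratified sample of size n."""
    n_small = round(n * len(small) / len(population))
    n_large = n - n_small
    drawn = rng.sample(small, n_small) + rng.sample(large, n_large)
    # Under proportional allocation the plain sample mean is self-weighting.
    return statistics.fmean(drawn)

# Repeat each design many times and compare the spread of the estimates.
srs_var = statistics.variance([srs_mean(50) for _ in range(2000)])
strat_var = statistics.variance([stratified_mean(50) for _ in range(2000)])
print(f"SRS variance: {srs_var:.2f}, stratified variance: {strat_var:.2f}")
```

Because most of the variation in this toy population lies between the strata rather than within them, the stratified estimates should cluster much more tightly. If the strata had similar means, the two designs would perform about the same.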

Example

A university wants to survey student satisfaction across three campuses. The campuses contain 30%, 50%, and 20% of the total student body respectively. Rather than drawing 1,000 students at random from the entire population, the researcher stratifies by campus and randomly selects 300, 500, and 200 students from the three campuses respectively. The resulting sample mirrors the true campus distribution, so the smallest campus cannot be under-represented by chance.
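The allocation arithmetic in this example can be written out as a short helper. This is a sketch under stated assumptions: the name proportional_allocation is hypothetical, shares are given as whole percentages that sum to 100, and any rounding leftovers are assigned by the largest-remainder rule (one common convention, not the only one).

```python
def proportional_allocation(total, percents):
    """Split `total` across strata in proportion to whole-number percentages.

    Assumes `percents` sums to 100. Units left over after integer division
    go to the strata with the largest remainders.
    """
    quotas = [divmod(total * p, 100) for p in percents]  # (whole part, remainder)
    counts = [whole for whole, _ in quotas]
    leftover = total - sum(counts)
    order = sorted(range(len(percents)), key=lambda i: quotas[i][1], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

allocation = proportional_allocation(1000, [30, 50, 20])
print(allocation)  # [300, 500, 200]
```

When the total does not divide evenly, the helper still returns counts that sum to the requested sample size.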

Software Notes

  • SPSS: Stratified sampling is a design step, not an analysis. For selecting stratified random samples from an existing dataset: sort by stratum, then use Data > Select Cases > Random sample of cases with split files or syntax loops by stratum.
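  • R: df %>% dplyr::group_by(strata) %>% dplyr::slice_sample(n = 100) selects 100 cases from each stratum, where df is the data frame and strata is the stratum column. survey::svydesign(ids = ~1, strata = ~strata, data = df) declares the design for analysis; supply the weights or fpc argument as well when strata are sampled at different rates.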
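  • Stata: sample 100, count by(strata) draws 100 observations from each stratum (sample takes a by() option rather than the bysort prefix). Declare the survey design with svyset [pw=weight], strata(strata) before analysis, where weight is the variable holding the sampling weights.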
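The R and Stata snippets above draw a fixed number of cases per stratum, which makes sampling rates differ across strata and is why the design must be declared with weights. The Python sketch below illustrates the idea using only the standard library. The function name sample_per_stratum and the campus-style frame are hypothetical, and the weight used is the usual design weight for simple random sampling within strata, N_h / n_h (stratum size divided by stratum sample size).

```python
import random

def sample_per_stratum(strata, n_per_stratum, seed=None):
    """Draw up to `n_per_stratum` units from each stratum.

    `strata` maps a stratum label to its list of units. Returns a list of
    (label, unit, weight) rows, where weight = N_h / n_h.
    """
    rng = random.Random(seed)
    rows = []
    for label, units in strata.items():
        if not units:
            continue  # nothing to sample from an empty stratum
        n_h = min(n_per_stratum, len(units))
        weight = len(units) / n_h  # each sampled unit stands for this many units
        for unit in rng.sample(units, n_h):
            rows.append((label, unit, weight))
    return rows

# Hypothetical sampling frame with strata of 3,000, 5,000, and 2,000 units.
frame = {
    "north": list(range(3000)),
    "central": list(range(5000)),
    "south": list(range(2000)),
}
rows = sample_per_stratum(frame, 100, seed=42)
total_weight = sum(weight for _, _, weight in rows)
```

Here the weights are 30, 50, and 20, and they sum to the frame size of 10,000 across the 300 sampled rows. An unweighted mean of this sample would over-represent the smallest stratum.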