Glossary
Cluster Sampling
Definition
Cluster sampling is a probability sampling technique in which the population is divided into clusters, often based on natural groupings such as schools, neighbourhoods, or hospitals, and a random sample of clusters is selected. Either all individuals within the selected clusters are included in the study (one-stage sampling), or a further random sample is drawn from each selected cluster (two-stage sampling).
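The two variants in the definition can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function names, the school and pupil labels, and the sample sizes are all hypothetical, and clusters are chosen with equal probability.

```python
import random


def one_stage_cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {cid: list(clusters[cid]) for cid in chosen}


def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Select clusters at random, then subsample units within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        cid: rng.sample(list(clusters[cid]),
                        min(n_per_cluster, len(clusters[cid])))
        for cid in chosen
    }


# Hypothetical population: 10 schools with 20 pupils each.
population = {
    f"school_{s}": [f"pupil_{s}_{p}" for p in range(20)]
    for s in range(10)
}

one_stage = one_stage_cluster_sample(population, n_clusters=3, seed=1)
two_stage = two_stage_cluster_sample(population, n_clusters=3,
                                     n_per_cluster=5, seed=1)
```

In the one-stage result every pupil of each chosen school appears; in the two-stage result only five pupils per chosen school do.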
Why It Matters
When a population is geographically dispersed or lacks a complete sampling frame, drawing a simple random sample of individuals is impractical or prohibitively expensive, because every member would first have to be listed and the selected members could be scattered across the whole area. Cluster sampling reduces logistical costs by concentrating data collection within selected clusters. It is widely used for national household surveys, school-based studies, and healthcare audits in developing regions.
Example
A public-health team wants to assess vaccination coverage across a rural province with 500 villages. Instead of attempting to reach households scattered across the whole province, they randomly select 30 villages and survey all households within those villages. This one-stage cluster design (every household in a selected village is surveyed) dramatically reduces travel time and cost. However, the team must account for intra-cluster correlation in their analysis, because households within the same village tend to have similar vaccination behaviours, which makes the sample less informative than a simple random sample of the same size.
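The cost of intra-cluster correlation can be illustrated with the common design-effect approximation DEFF = 1 + (m - 1) * ICC, where m is the average cluster size. The sketch below applies it to the 30 sampled villages; the 40 households per village and the ICC of 0.05 are assumed values chosen for illustration and are not part of the example.

```python
def design_effect(mean_cluster_size, icc):
    """Common approximation: DEFF = 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_total, mean_cluster_size, icc):
    """Size of a simple random sample carrying roughly the same information."""
    return n_total / design_effect(mean_cluster_size, icc)


villages = 30                 # from the example
households_per_village = 40   # assumed for illustration
icc = 0.05                    # assumed for illustration

n_total = villages * households_per_village
deff = design_effect(households_per_village, icc)
n_eff = effective_sample_size(n_total, households_per_village, icc)

print(f"households surveyed: {n_total}")
print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
```

Under these assumptions the 1,200 surveyed households carry about as much information as roughly 400 households drawn by simple random sampling, which is why the analysis has to reflect the clustering.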
Related Terms
- Random Sampling
- Stratified Sampling
- Multilevel Modelling (used to analyse clustered data)
- Sample Size
Software Notes
- SPSS: Cluster sampling is a design feature, not a built-in selection command. For analysis of cluster-sampled data, use the Complex Samples module: Analyze > Complex Samples > Prepare for Analysis to define clusters and weights.
- R: survey::svydesign(ids = ~cluster, data = df) declares a cluster-sampled design. survey::svymean(~variable, design) computes means with design-based standard errors. Use survey::svyglm() for regression.
- Stata: svyset cluster declares the primary sampling unit. svyset cluster [pw=weight], strata(strata) adds sampling weights and strata. svy: mean variable or svy: regress y x then produce design-based standard errors.
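To show roughly what these tools compute for a mean, here is a minimal Python sketch of a linearized (ratio-estimator) standard error that treats clusters as the units of variation. It assumes an equal-probability one-stage sample with no weights, strata, or finite-population correction; the function name and the data are hypothetical, and it is not a substitute for the packages above.

```python
import math


def cluster_mean_and_se(cluster_values):
    """Mean and linearized SE for an equal-probability one-stage cluster sample.

    cluster_values holds one inner list of observations per sampled cluster.
    """
    n = len(cluster_values)
    if n < 2:
        raise ValueError("need at least two clusters to estimate variance")
    totals = [sum(vals) for vals in cluster_values]
    sizes = [len(vals) for vals in cluster_values]
    mean = sum(totals) / sum(sizes)
    # Cluster-level residuals: each cluster total minus the total the
    # overall mean would predict for a cluster of that size.
    resid = [t - mean * m for t, m in zip(totals, sizes)]
    variance = n / (n - 1) * sum(e * e for e in resid) / sum(sizes) ** 2
    return mean, math.sqrt(variance)


# Hypothetical data: 1 = vaccinated, 0 = not, four households per village.
sample = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
mean, se = cluster_mean_and_se(sample)
print(f"mean = {mean:.3f}, cluster-based SE = {se:.3f}")
```

Because households in the same village resemble each other in this toy data, the cluster-based standard error is noticeably larger than the one obtained by treating all 16 households as independent.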