Glossary
Inter-Rater Reliability
Definition
Inter-rater reliability is the degree of agreement among two or more independent raters or observers who are classifying or scoring the same set of items, behaviours, or responses. It quantifies how much of the observed variation is due to the objects being rated rather than to differences among raters. Common measures include Cohen's kappa (two raters, categorical data) and the intraclass correlation coefficient (two or more raters, continuous or ordinal data).
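As a rough illustration of the chance correction behind Cohen's kappa, the following minimal Python sketch computes kappa for two raters from their category labels. The function name cohens_kappa and the ratings are hypothetical and exist only for illustration; for real analyses, an established statistics package is the safer choice.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same items (categorical data)."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater1)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every category either rater used.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings, for illustration only.
rater1 = ["a", "a", "b", "b", "a", "c", "a", "b", "a", "a"]
rater2 = ["a", "a", "b", "a", "a", "c", "a", "b", "b", "a"]
kappa = cohens_kappa(rater1, rater2)
print(round(kappa, 3))  # prints 0.63
```

In this made-up data the raters agree on 8 of 10 items (80%), but because category "a" dominates, chance agreement is 46%, so kappa comes out noticeably lower than raw agreement.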
Why It Matters
Subjective judgment is unavoidable in many research domains, from coding interview transcripts to diagnosing medical images or rating essays. Without establishing that different raters agree sufficiently, findings may reflect rater bias rather than genuine patterns in the data. High inter-rater reliability gives readers confidence that the coding scheme is clear, the raters are well-trained, and the results are replicable by other researchers.
Example
Two researchers independently code 100 open-ended survey responses into five thematic categories. Their raw agreement is 85%, but because some categories are very common, chance agreement is high. Cohen's kappa adjusts for chance and yields 0.72, indicating substantial agreement. Before proceeding to analysis, they resolve the 15 disagreements through discussion, improving both the coding scheme and the reliability of the final dataset.
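The figures in this example can be tied together with the standard definition of Cohen's kappa. The symbols p_o (observed agreement) and p_e (chance agreement) are introduced here for illustration and do not appear in the example itself. Solving for p_e with the reported values suggests that chance agreement was roughly 46%:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
\quad\Longrightarrow\quad
p_e = \frac{p_o - \kappa}{1 - \kappa}
    = \frac{0.85 - 0.72}{1 - 0.72}
    = \frac{0.13}{0.28}
    \approx 0.46
```

A chance agreement this high is consistent with a few categories absorbing most of the responses.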
Software Notes
- SPSS: Analyze > Descriptive Statistics > Crosstabs, then tick Kappa under Statistics, gives Cohen's kappa alongside the cross-tabulated ratings. For weighted kappa, use the Weighted Kappa procedure under Analyze > Scale (available in recent versions) or calculate it via syntax. For ICC: Analyze > Scale > Reliability Analysis, then tick Intraclass correlation coefficient under Statistics.
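- R: `irr::kappa2(cbind(rater1, rater2))` for Cohen's kappa; `irr::icc(data, model = "twoway", type = "agreement")` for the intraclass correlation; `psych::cohen.kappa()` for Cohen's and weighted kappa, including pairwise kappas when more than two raters are supplied.
- Stata: `kap rater1 rater2` for Cohen's kappa; `kap rater1 rater2, wgt(w)` for weighted kappa; `icc rating target rater` for the intraclass correlation. `kapprev`, described as reversing the rating order for negative agreement, is not an official Stata command; verify that it is available before relying on it.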