Inter-Rater Reliability — AnalyticsScholar

Definition

Inter-rater reliability is the degree of agreement among two or more independent raters or observers who are classifying or scoring the same set of items, behaviours, or responses. It quantifies how much of the observed variation is due to the objects being rated rather than differences among raters. Common measures include Cohen's kappa (two raters, categorical data) and the intraclass correlation coefficient (multiple raters, continuous or ordinal data).
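As a sketch of what "adjusting for chance" means in Cohen's kappa, the statistic is usually written as follows. The symbols are notation introduced here for illustration: p_o is the observed proportion of items on which the two raters agree, p_e is the agreement expected by chance, and p_{1k} and p_{2k} are the proportions of items that rater 1 and rater 2 assign to category k.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_e = \sum_{k=1}^{K} p_{1k}\, p_{2k}
```

Kappa equals 1 when agreement is perfect and 0 when observed agreement is no better than chance.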

Why It Matters

Subjective judgment is unavoidable in many research domains, from coding interview transcripts to diagnosing medical images or rating essays. Without establishing that different raters agree sufficiently, findings may reflect rater bias rather than genuine patterns in the data. High inter-rater reliability gives readers confidence that the coding scheme is clear, the raters are well-trained, and the results are replicable by other researchers.

Example

Two researchers independently code 100 open-ended survey responses into five thematic categories. Their raw agreement is 85% (85 of 100 responses coded identically), but because some categories are very common, a sizeable share of that agreement would be expected by chance alone. Cohen's kappa adjusts for chance and yields 0.72, which falls in the "substantial" band (0.61 to 0.80) of the widely used Landis and Koch benchmarks. Before proceeding to analysis, they resolve the 15 disagreements through discussion, improving both the coding scheme and the reliability of the final dataset.
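The calculation behind an example like this can be sketched in a few lines of Python. This is a minimal illustration, not the routine used by any particular package: the function names cohen_kappa and implied_chance_agreement are hypothetical, and the labels in r1 and r2 are made up for demonstration (they are not the survey data described above).

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical labels (equal-length sequences)."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(rater1)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_o - p_e) / (1 - p_e)


def implied_chance_agreement(p_o, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for the chance agreement p_e."""
    return (p_o - kappa) / (1 - kappa)


# Made-up labels for illustration only.
r1 = ["a", "a", "b", "b", "c", "a", "b", "c", "a", "a"]
r2 = ["a", "a", "b", "c", "c", "a", "b", "b", "a", "b"]
print(round(cohen_kappa(r1, r2), 3))

# Chance agreement implied by the figures in the example (85% raw, kappa 0.72).
print(round(implied_chance_agreement(0.85, 0.72), 3))
```

For the toy labels, raw agreement is 70% and kappa comes out at about 0.53. Working backwards from the example's figures, an 85% raw agreement with a kappa of 0.72 implies chance agreement of roughly 0.46, which is why the raw percentage alone overstates reliability.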

Related Terms

Software Notes

SPSS: Analyze > Descriptive Statistics > Crosstabs, then Statistics > Kappa, gives Cohen's kappa alongside the cross-tabulation (raw agreement can be read from the diagonal cells). For weighted kappa, use the Weighted Kappa procedure under Analyze > Scale (if available) or calculate via syntax. For ICC: Analyze > Scale > Reliability Analysis, then Statistics > Intraclass correlation coefficient.
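R: irr::kappa2(cbind(rater1, rater2)) for Cohen's kappa. irr::icc(data, model = "twoway", type = "agreement") for intraclass correlation, with subjects in rows and raters in columns. psych::cohen.kappa() also computes Cohen's kappa and weighted kappa. For a single chance-corrected index across more than two raters, irr::kappam.fleiss() gives Fleiss' kappa.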
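Stata: kap rater1 rater2 for Cohen's kappa. kap rater1 rater2, wgt(w) for weighted kappa with linear weights (wgt(w2) for quadratic weights; kapwgt defines custom weights). icc rating target rater for intraclass correlation. kapprev is not part of official Stata; if a community-contributed command of that name is installed, consult its help file for how it handles reversed rating order.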
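For readers who want to see what an agreement-type ICC computes, the following Python sketch works through the two-way, absolute-agreement, single-rater form (often written ICC(2,1) or ICC(A,1)) from the usual ANOVA mean squares. It is an illustration under stated assumptions rather than a replacement for the packages above: the function name icc_agreement_single is hypothetical, the input is assumed to be a complete table with one row per rated target and one column per rater, and the ratings shown are made up.

```python
def icc_agreement_single(ratings):
    """Two-way, absolute-agreement, single-rater ICC.

    `ratings` is a list of rows: one row per rated target, one column per rater.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete table with at least 2 targets and 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for targets (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ICC is undefined when the ratings have no variance")
    return (ms_rows - ms_error) / denom


# Made-up ratings: rater 2 always scores exactly one point higher than rater 1.
toy = [[1, 2], [3, 4], [5, 6]]
print(round(icc_agreement_single(toy), 3))
```

In the toy table the two raters rank the targets identically, yet the result is about 0.889 rather than 1, because an agreement-type ICC penalises the constant one-point offset between raters.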