Intraclass correlation
How: The data is in the form of a matrix.
The number of columns represents the number of raters or measurers, and the
number of rows the number of cases or subjects under study. Each
cell of the matrix is the value that particular measurer (column) gives to
that case (row). Place the number of columns and rows in the
first 2 text boxes, and the data matrix in the text area. Click the ICC
button, and the results will show.
Note that the columns must be separated by at least 1 space.
The data can be typed in directly, or the whole matrix can be copied from
another text editor or Excel and pasted into the
text area.
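The input format described above can be read along the following lines. This is a minimal Python sketch, not the code behind the page; the function name `parse_matrix` and its arguments are hypothetical, chosen only for illustration.

```python
def parse_matrix(text, n_cols, n_rows):
    """Parse pasted text into a list of rows, each a list of floats."""
    rows = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines left over from copy and paste
        # split() with no argument treats any run of spaces or tabs as one
        # separator, so one or more spaces between columns is enough, and
        # tab-separated cells pasted from a spreadsheet also work
        values = [float(token) for token in line.split()]
        if len(values) != n_cols:
            raise ValueError(f"expected {n_cols} columns, found {len(values)}")
        rows.append(values)
    if len(rows) != n_rows:
        raise ValueError(f"expected {n_rows} rows, found {len(rows)}")
    return rows
```

Checking the parsed shape against the stated number of columns and rows catches a miscounted or partly pasted matrix before any calculation is attempted.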
Reference: Portney L.G., Watkins M.P. Foundations of Clinical Research: Applications and Practice (1993) Appleton & Lange, Norwalk, Connecticut. ISBN 0838510655, pp. 509-516
Explanation: Intraclass correlation
evaluates the level of agreement between raters, where the measurements are
parametric, or at least on an interval scale. It is preferable to ordinary correlation because
more than 2 raters can be included, and because it corrects for the
correlation between raters that appears simply because the range of
measurement is large. The coefficient represents concordance,
where 1 is perfect agreement and 0 is no agreement at all. In the
analysis of variance, the F value for between raters tests whether the raters
differ significantly in their assessments.
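The between-raters F value can be written out as follows. This is a sketch that assumes the usual two-way analysis of variance without replication; the page does not show its own formulas, so the symbols are introduced here for illustration: n cases (rows), k raters (columns), x_{ij} the value rater j gives to case i, and bars for the row, column and overall means.

```latex
\begin{aligned}
MS_{C} &= \frac{n \sum_{j=1}^{k} \left(\bar{x}_{\cdot j} - \bar{x}\right)^{2}}{k - 1}, \\
MS_{E} &= \frac{\sum_{i=1}^{n} \sum_{j=1}^{k} \left(x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x}\right)^{2}}{(n - 1)(k - 1)}, \\
F &= \frac{MS_{C}}{MS_{E}}
\end{aligned}
```

Under these assumptions F is referred to the F distribution with k - 1 and (n - 1)(k - 1) degrees of freedom. A value near or below 1 means the column means differ no more than the residual variation would produce by chance.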
Three models are available.
Model 1 assumes that each subject is rated by a different set of
raters, each set being randomly chosen from a larger set of
raters.
Model 2 assumes the same raters rate all
cases, and the raters are a subset of a larger set of raters.
Model 3 also has the same raters rate all cases, but treats them as the only raters of interest rather than as a sample from a larger set.
Model 1 is rarely used, and the results of model 3 cannot be generalized to other raters. Model 2 is therefore the
one usually used.
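The three models lead to different single-measure coefficients. The forms below are a sketch assuming the formulas commonly attributed to Shrout and Fleiss, which the page does not state explicitly. The symbols are introduced here: n cases, k raters, MS_R the between-cases mean square, MS_C the between-raters mean square, MS_E the residual mean square from the two-way analysis of variance, and MS_W the within-cases mean square from a one-way analysis of variance.

```latex
\begin{aligned}
\text{Model 1:}\quad ICC &= \frac{MS_{R} - MS_{W}}{MS_{R} + (k - 1)\,MS_{W}} \\
\text{Model 2:}\quad ICC &= \frac{MS_{R} - MS_{E}}{MS_{R} + (k - 1)\,MS_{E} + \frac{k}{n}\left(MS_{C} - MS_{E}\right)} \\
\text{Model 3:}\quad ICC &= \frac{MS_{R} - MS_{E}}{MS_{R} + (k - 1)\,MS_{E}}
\end{aligned}
```

In these forms only Model 2 carries a between-raters term in the denominator, so systematic differences between raters lower its coefficient. Model 3 omits that term, which fits its restriction to the raters actually used.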
Each model has two versions of the intraclass correlation coefficient:
Single measure reliability: individual ratings constitute the unit of analysis. That is, single measure reliability gives the reliability of a single judge's rating.
Average measure reliability: the mean of all ratings is the unit of analysis. That is, average measure reliability gives the reliability of the average rating assuming that there is a reasonable number of raters.
If each rating score comes from an individual rating, then the single or
individual form is used. However, if the rating is the mean of
multiple ratings (for example, each rater is a team and the rating is the
average of the team), then the mean form should be used.
In most cases, therefore, the individual form of Model 2 is the one to use.
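The two versions are linked by the Spearman-Brown step-up relation, which holds for the standard single and average forms of all three models. The sketch below is an illustration only and assumes those standard forms; `average_measure_icc` is a hypothetical helper name, not something the page provides.

```python
def average_measure_icc(single_icc, k):
    """Step a single-measure ICC up to the reliability of the mean of k ratings.

    Spearman-Brown relation: k * r / (1 + (k - 1) * r).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_icc / (1 + (k - 1) * single_icc)


# A single-measure ICC of 0.5 with 3 raters steps up to 0.75
print(average_measure_icc(0.5, 3))
```

The average form is never lower than the single form when the single form is positive, which is why reporting it for ratings that are in fact individual overstates reliability.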
Example: Suppose we wish to measure the
length of the fetus in utero, using X-ray (col 1), ultrasound (col 2),
or magnetic resonance (col 3), and we want to know how closely the three methods
agree with each other. Here the three methods play the role of raters. We measure 4 babies and obtain the following
results, one row per baby:
1.1 1.2 1.5
2.2 2.1 2.0
6.3 6.1 6.8
9.4 9.5 9.0
The results are ICC coefficient = 0.9956 and, in the
analysis of variance, F = 0.13 for between raters. There is no significant difference
between raters, and the level of concordance is high.
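The example can be checked by hand or with a short script. The Python sketch below assumes the standard Model 2 individual form and a two-way analysis of variance without replication. The page does not say which formula it applies, so this is an illustration rather than the calculator's own code, and the function name `icc_model2_single` is hypothetical.

```python
def icc_model2_single(data):
    """Return (ICC, F between raters) for a cases-by-raters matrix."""
    n = len(data)        # number of cases (rows)
    k = len(data[0])     # number of raters (columns)
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    # Partition the total sum of squares into cases, raters and residual
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    f_raters = ms_cols / ms_err
    icc = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return icc, f_raters


# The 4 babies (rows) measured by 3 methods (columns) from the example
data = [
    [1.1, 1.2, 1.5],
    [2.2, 2.1, 2.0],
    [6.3, 6.1, 6.8],
    [9.4, 9.5, 9.0],
]
icc, f_raters = icc_model2_single(data)
print(f"ICC = {icc:.3f}, F = {f_raters:.2f}")  # prints: ICC = 0.996, F = 0.13
```

Under these assumptions the script agrees with the reported F of 0.13 and with the reported coefficient to about three decimal places. Almost all of the variation lies between babies rather than between methods, which is what produces a coefficient so close to 1.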
