obg.cuhk.edu.hk 4 August 2015

Intraclass correlation

How: The data take the form of a matrix.  Each column represents a rater or measurer, and each row a case or subject under study.   Each cell of the matrix is the value that a particular rater (column) gives to that case (row).   Enter the number of columns and rows in the first 2 text boxes and the data matrix in the text area, then click the ICC button to show the results.

Note that the columns must be separated by at least 1 space.   The data can be typed in directly, or the whole matrix can be copied from another text editor or from Excel and pasted into the text area.

Number of raters (cols)
Number of cases (rows)

Each column represents the measurements of a rater and each row the cases measured.

The numbers must be separated by white space (blanks or tabs), and every element of the matrix must be filled in; otherwise the calculation will be in error.

If the row count differs from the stated number of rows, the lower number will be used.

Numbers can be typed in, but a convenient alternative is to prepare the data in Excel and paste them into the text box.
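The entry rules above can be sketched as a small parser.  This is an illustration in Python with hypothetical names, not the page's actual code:

```python
# Hypothetical sketch of how the text-area input could be parsed:
# whitespace-separated columns, one case per line, and the lower of the
# stated and actual row counts used when they disagree.

def parse_matrix(text, n_raters, n_cases):
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()  # any run of blanks or tabs separates columns
        if not fields:
            continue           # ignore blank lines
        if len(fields) != n_raters:
            raise ValueError("every row must have one value per rater")
        rows.append([float(x) for x in fields])
    return rows[:min(n_cases, len(rows))]  # use the lower row count

matrix = parse_matrix("1.1 1.2 1.5\n2.2 2.1 2.0", n_raters=3, n_cases=2)
```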

Reference: Portney L.G., Watkins M.P. (1993) Foundations of Clinical Research: Applications and Practice. Appleton & Lange, Norwalk, Connecticut. ISBN 0-8385-1065-5, pp. 509-516.

Explanation: Intraclass correlation evaluates the level of agreement between raters whose measurements are parametric, or at least on an interval scale.  This method is better than ordinary correlation because more than 2 raters can be included, and because it penalizes systematic differences between raters, which ordinary correlation can mask, particularly when the range of measurement is large.   The coefficient represents concordance, where 1 is perfect agreement and 0 is no agreement at all.  In the analysis of variance, the F value for between raters tests whether the raters differ significantly in their assessments.

Three models are available.  
Model 1 assumes that each subject is rated by a different set of raters, each set being randomly drawn from a larger population of raters.  
Model 2 assumes that the same raters rate all cases, and that these raters are a random sample from a larger population of raters.  
Model 3 assumes that the raters in the study are the only raters of interest (a fixed set).  

Model 1 is rarely used, and results from model 3 cannot be generalized beyond the raters studied.   Model 2 is therefore the one usually used.

Each model has two versions of the intraclass correlation coefficient:

Single measure reliability: individual ratings constitute the unit of analysis. That is, single measure reliability gives the reliability of a single rater's rating.

Average measure reliability: the mean of all ratings is the unit of analysis. That is, average measure reliability gives the reliability of the mean rating across all raters.

If each rating score comes from an individual rater, the single (individual) form is used.   If, however, each rating is itself the mean of multiple ratings (for example, each rater is a team and the rating is the team's average), then the average form should be used.
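In terms of the two-way ANOVA mean squares (MSR between subjects, MSC between raters, MSE residual), with n subjects and k raters, the model 2 single and average measure coefficients can be written as follows.  This is a sketch of the standard Shrout-Fleiss formulas, not code from the page:

```python
def icc2_single(msr, msc, mse, n, k):
    """ICC(2,1): reliability of one rater's rating (model 2, single measure)."""
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def icc2_average(msr, msc, mse, n, k):
    """ICC(2,k): reliability of the mean of all k raters (model 2, average measure)."""
    return (msr - mse) / (msr + (msc - mse) / n)
```

The average measure coefficient is always at least as large as the single measure one, because averaging over raters reduces measurement error.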

In other words, the Model 2 single measure form is the usual one to use.

Example: Suppose we wish to measure the length of a fetus in utero, using x-ray (col 1), ultrasound (col 2), or magnetic resonance imaging (col 3), and we want to know how well the three methods agree with each other. We measure 4 babies and obtain the following results, one row per baby: 1.1, 1.2, 1.5;  2.2, 2.1, 2.0;  6.3, 6.1, 6.8;  9.4, 9.5, 9.0.   ICC coefficient = 0.9956, analysis of variance F = 0.13.   There is no significant difference between raters, and the level of concordance is high.
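The example can be checked with a short script.  This is a sketch assuming the model 2 single measure formula; it reproduces the reported F value, and the coefficient agrees with the reported 0.9956 to three decimal places (small differences can arise from rounding or from which model the calculator reports):

```python
# Two-way ANOVA decomposition of the example data, then ICC(2,1).
# Variable names are ours, not the page's.

data = [
    [1.1, 1.2, 1.5],   # baby 1: x-ray, ultrasound, MRI
    [2.2, 2.1, 2.0],   # baby 2
    [6.3, 6.1, 6.8],   # baby 3
    [9.4, 9.5, 9.0],   # baby 4
]
n = len(data)      # subjects (rows)
k = len(data[0])   # raters (cols)

grand = sum(sum(row) for row in data) / (n * k)
row_means = [sum(row) / k for row in data]
col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

ss_rows = k * sum((m - grand) ** 2 for m in row_means)
ss_cols = n * sum((m - grand) ** 2 for m in col_means)
ss_total = sum((x - grand) ** 2 for row in data for x in row)
ss_err = ss_total - ss_rows - ss_cols

msr = ss_rows / (n - 1)             # between-subjects mean square
msc = ss_cols / (k - 1)             # between-raters mean square
mse = ss_err / ((n - 1) * (k - 1))  # residual mean square

icc_2_1 = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
f_raters = msc / mse                # F for between-raters effect

print(round(icc_2_1, 4))   # 0.9955
print(round(f_raters, 2))  # 0.13
```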
