This question is statistical rather than strictly a Stata software
question, although Stata will be used for the analyses.
I would like to know the most appropriate (basic/simple) tests for
inter-rater and intra-rater reliability (the repeat rating is done one
month later) given the following scenario:
Three raters are grading a total of 9 bones, three bones each having
been drilled by three surgical residents. The grading instrument covers
15 core areas with a total of 24 individual items that each receive a
dichotomous score, for a total of 36 potential points. For each of the
24 individual items, either 1, 2, or 3 points are given, or zero points
are given; thus, some items contribute a larger proportion of the total
score. Nonetheless, the raters have only a dichotomous choice for each
item. The raters are grading whether certain aspects of drilling the
bone have been accomplished (e.g., No Holes vs. Holes, where No Holes =
1 point and Holes = 0 points). There is nothing qualitative in the
assessment.
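To make the weighted-but-dichotomous scheme concrete, here is a minimal
sketch in Python. The item weights and pass/fail judgments below are
invented for illustration; only their count (24 items), the weight
values (1, 2, or 3), and the 36-point total match the description above.

```python
# Hypothetical item weights: 24 dichotomous items, each worth 1, 2, or 3
# points, summing to the 36 potential points described above.
weights = [3, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1,
           3, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 1]
assert len(weights) == 24 and sum(weights) == 36

# passed[i] = 1 if the rater judged item i accomplished, else 0
# (dichotomous choice per item; made-up example judgments).
passed = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0,
          1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1]

# A bone's total score is the weighted sum of the pass/fail judgments.
total = sum(w * p for w, p in zip(weights, passed))
print(total)  # → 26 (out of 36)
```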
Is the kappa statistic the simplest and most appropriate reliability
assessment (for both intra- and inter-rater)? Is there a more
appropriate test that Stata can do? Also, can anyone comment on the
sample size (this is, unfortunately, the largest sample available)?
Any further comment, advice, or reference to a website with further
details would be very welcome.
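For reference, kappa on dichotomous scores is just chance-corrected
agreement: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
proportion of agreements and p_e the agreement expected from each
rater's marginal frequencies. A minimal sketch in Python for two raters
(the ratings below are made up; in Stata the analogous computation is
done by the kappa/kap commands):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters scoring the same dichotomous items."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    # Observed agreement: proportion of items on which the raters match.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over the score categories (here 0 and 1).
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(rater1) | set(rater2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two raters scoring the 24 items on one bone.
r1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0,
      1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1]
r2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0,
      1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1]
print(round(cohens_kappa(r1, r2), 3))  # → 0.727
```

The same function applied to one rater's scores on two occasions (one
month apart) would give an intra-rater kappa.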