From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Interrater agreement: finding the problematic items
Date: Fri, 14 Jun 2013 18:53:17 +0100
Many people seem unaware of the simplicity and generality of various measures of inequality, diversity and concentration. (There are many other names.) They may be under the impression that they are rather odd and ad hoc measures used by people in rather odd and ad hoc fields such as economics, sociology or ecology.

Here are a few examples of two such measures done calculator-style. All we are assuming is a set of categories, not even ordered, not even numbered, just labelled. (There are many, many others, but I like these two measures.) For a change, let's use the auto data:

. sysuse auto, clear
(1978 Automobile Data)

. tab rep78, matcell(f_rep)

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. tab foreign, matcell(f_for)

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

The stages are

1. Copy the vectors of frequencies into vectors in Mata.

2. Scale to vectors of probabilities.

3. The sum of squared probabilities is a measure of agreement. Everyone agrees => everyone is in one category, so one probability is 1 and the others are 0, and the sum is 1. The lower limit is 0 (not reached in practice). This measure, or a relative of it, is variously named for, or attributed to, Gini, Turing, Hirschman, Simpson, Herfindahl, Good and no doubt others.

4. The reciprocal of this has a nice interpretation as "the equivalent number of equally common categories".

5. The weighted mean of the log reciprocal probabilities is often known as the entropy. It is often named for Shannon (occasionally for Weaver as well) and/or Wiener. (Weaver and Wiener were precisely two distinct people, but under conditions of lax spelling standards some students have been known to attempt to merge them retrospectively.)

6. Exponentiating that gives a number with a nice interpretation as "the equivalent number of equally common categories" (another estimate thereof).

. mata
------------------------------------------------- mata (type end to exit) -----------
: f1 = st_matrix("f_rep")

: f1
        1
    +------+
  1 |   2  |
  2 |   8  |
  3 |  30  |
  4 |  18  |
  5 |  11  |
    +------+

: p1 = f1 :/ sum(f1)

: p1
                 1
    +---------------+
  1 |  .0289855072  |
  2 |   .115942029  |
  3 |  .4347826087  |
  4 |  .2608695652  |
  5 |  .1594202899  |
    +---------------+

: p1:^2
                 1
    +---------------+
  1 |  .0008401596  |
  2 |  .0134425541  |
  3 |  .1890359168  |
  4 |  .0680529301  |
  5 |  .0254148288  |
    +---------------+

: sum(p1:^2)
  .2967863894

: 1/sum(p1:^2)
  3.369426752

: sum(p1 :* ln(1:/p1))
  1.357855957

: exp(sum(p1 :* ln(1:/p1)))
  3.887848644

: f2 = st_matrix("f_rep")

: f2
        1
    +------+
  1 |   2  |
  2 |   8  |
  3 |  30  |
  4 |  18  |
  5 |  11  |
    +------+

: p2 = f2 :/ sum(f2)

: p2
                 1
    +---------------+
  1 |  .0289855072  |
  2 |   .115942029  |
  3 |  .4347826087  |
  4 |  .2608695652  |
  5 |  .1594202899  |
    +---------------+

: p2:^2
                 1
    +---------------+
  1 |  .0008401596  |
  2 |  .0134425541  |
  3 |  .1890359168  |
  4 |  .0680529301  |
  5 |  .0254148288  |
    +---------------+

: sum(p2:^2)
  .2967863894

: 1/sum(p2:^2)
  3.369426752

: sum(p2 :* ln(1:/p2))
  1.357855957

: exp(sum(p2 :* ln(1:/p2)))
  3.887848644

: end
-------------------------------------------------------------------------------------
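For repeated use, the calculator-style stages 2 to 6 can be wrapped up as a small Mata function. This is only a sketch, and the function name diversity() is mine, not anything built into Stata or Mata; it takes a column vector of frequencies and returns the sum of squared probabilities, its reciprocal, the entropy, and the exponential of the entropy:

mata:
// Sketch only: summarize a column vector of category frequencies.
// Returns (sum of p^2, 1/sum of p^2, entropy, exp(entropy)).
real rowvector diversity(real colvector f)
{
    real colvector p
    real scalar    ssq, H

    f = select(f, f :> 0)             // drop empty categories
    p = f :/ sum(f)                   // frequencies -> probabilities
    ssq = sum(p:^2)                   // sum of squared probabilities
    H = sum(p :* ln(1 :/ p))          // entropy
    return((ssq, 1/ssq, H, exp(H)))
}
end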
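The tabulations stored the repair record frequencies in f_rep and the car type frequencies in f_for (the interactive log above reuses f_rep for f2, but f_for can be treated in exactly the same way). With the sketch function defined, each variable then reduces to one line; output not shown here:

mata:
diversity(st_matrix("f_rep"))      // repair record, 5 categories
diversity(st_matrix("f_for"))      // car type, 2 categories
end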
Nick
njcoxstata@gmail.com


On 14 June 2013 16:34, Nick Cox <njcoxstata@gmail.com> wrote:
> For "Chronbach" read "Cronbach".
>
> Some Statalist members are well versed in psychometrics but I see no
> reason why more general statistical ideas should not be relevant too.
> The standard deviation of ratings for each item would be one measure
> of disagreement. Perhaps better ones would be the sum of squared
> probabilities or the entropy of the probability distribution for the
> rating.
>
> Nick
> njcoxstata@gmail.com
>
> On 14 June 2013 16:11, Ilian, Henry (ACS) <Henry.Ilian@dfa.state.ny.us> wrote:
>> I'm doing an interrater agreement study on a case-reading instrument. There are five reviewers using an instrument with 120 items. The rating scales are ordinal with either two, three or four options. I'm less interested in reviewer tendencies than I am in problematic items, those with high levels of disagreement.
>>
>> Most of the interrater agreement/interrater reliability statistics look at reviewer tendencies. I can see two ways of getting at agreement on items. The first is to sum all the differences between all possible pairs of reviewers, and those with the highest totals are the ones to examine. The other is Chronbach's alpha. Is there any strong argument for or against either approach, and is there a different approach that would be better than these?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/