Karin,
The data you have will not allow you to test hypotheses such as:
"Clinicians of type one score more highly than those of type two." While
each clinician has rated 100 images, you only have ratings by three
clinicians of each type, so you cannot test hypotheses about the
populations of these two types of clinicians. You simply don't have
enough clinicians.
You should be able to address hypotheses about the relative usefulness
of various types of images. However, keep in mind that 100 is still a
relatively small sample size, and this small sample is further divided
among four types of images. To some extent, what you should do with the
multiple ratings depends on how you want to think about these ratings.
If you believe that each clinician's rating of each image is independent,
that is, that how useful one clinician thinks an image is, is unrelated
to how useful another clinician thinks it is, then you can treat each
clinician's rating of each image as a case (n = 600, as opposed to n =
100), and perform a chi-squared test. However, logically, this doesn't
make sense: one would expect a diagnostic image that is useful to one
clinician to also be useful to another clinician.
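To see why the choice between n = 600 and n = 100 matters, here is a minimal sketch (in Python rather than Stata, and with made-up counts, not your data). The Pearson chi-squared statistic grows in direct proportion to the sample size, so counting every image six times multiplies the statistic by six without adding any independent information. The function name and the counts are hypothetical.

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts, one entry per image (n = 100).
# Rows are image types A-D; columns are (not useful, useful).
per_image = [[10, 20], [12, 18], [15, 5], [14, 6]]

# The same proportions with every image counted six times (n = 600),
# which is what treating each rating as a separate case amounts to
# when clinicians largely agree.
per_rating = [[6 * count for count in row] for row in per_image]

chi2_100 = pearson_chi2(per_image)
chi2_600 = pearson_chi2(per_rating)
print(round(chi2_100, 2), round(chi2_600, 2))
```

The second statistic is exactly six times the first, so a pattern that looks unremarkable with 100 images can look highly significant with 600 "cases".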
If the purpose of having multiple clinicians rate each image is to get a
more accurate rating of the "true" utility of each image (akin to the
old adage "measure twice, cut once"), then you could combine the scores
in some way, for example by taking the sum or the average of the six
ratings (for this purpose the two are equivalent). You could then compare
the means of these "true" utility scores across image types (using a
t-test for two types, or ANOVA for more). This procedure ignores the fact
that the level of "agreement" in clinician ratings may vary across images,
that is, clinicians may tend to agree (give similar scores) about the
utility of some images, and may disagree about others. You could examine
this by looking at the standard deviation of the six ratings for each
image, or by using Cronbach's alpha (-alpha- in Stata).
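The combine-then-compare idea can be sketched as follows. This is again Python rather than Stata, the ratings are invented for a handful of images, and the function names are hypothetical. It averages the six ratings per image, keeps the per-image standard deviation as a rough measure of agreement, computes a one-way ANOVA F statistic on the per-image means, and computes Cronbach's alpha treating the six clinicians as the "items".

```python
from statistics import mean, stdev, variance

# Hypothetical data: image type -> list of images,
# each image being six clinician ratings on the 0-3 scale.
ratings = {
    "A": [[0, 3, 1, 2, 2, 1], [2, 3, 3, 2, 3, 3], [1, 2, 2, 1, 2, 2]],
    "B": [[3, 3, 2, 3, 3, 2], [2, 2, 3, 3, 2, 3], [3, 2, 2, 3, 3, 3]],
    "C": [[0, 1, 1, 0, 1, 0], [1, 1, 2, 1, 0, 1], [0, 0, 1, 1, 1, 0]],
}

# One combined score per image, plus the spread of its six ratings.
image_means = {t: [mean(img) for img in imgs] for t, imgs in ratings.items()}
image_sds = {t: [stdev(img) for img in imgs] for t, imgs in ratings.items()}

def one_way_f(groups):
    """F statistic for a one-way ANOVA; groups is a list of lists of values."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def cronbach_alpha(rows):
    """Cronbach's alpha; rows are images, columns are clinicians."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

f_stat = one_way_f(list(image_means.values()))
all_images = [img for imgs in ratings.values() for img in imgs]
alpha = cronbach_alpha(all_images)
print(f"F = {f_stat:.2f}, alpha = {alpha:.2f}")
```

Replacing the per-image means with per-image sums gives the same F statistic, since the sum is just six times the mean and F is unaffected by rescaling. Here the unit of analysis is the image, so with your data the test would rest on n = 100 rather than n = 600.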
There are other approaches you could take, but in my opinion, the
simplest approach is usually better, especially given that you don't
have all that much data.
Rose
P.S. Since your question was mostly about which analysis is appropriate,
I didn't include a lot of information on how to actually do this in Stata.
K Jensen wrote:
I have data involving assessment of the results of different medical
imaging techniques, by different specialists.
For a hundred images, each has been assessed on a four point quality
scale by six specialists. All the assessors have scored all the images
and there are no missing values. The images fall into four types
(there are different numbers of each type) and there are two types of
clinician (three of each).
So the data looks like:-
Image_ID  Type  Clinician   C_type  Score
1         A     Clinican_1  Radiol  0
1         A     Clinican_2  Radiol  3
1         A     Clinican_3  Radiol  1
1         A     Clinican_4  Radiog  2
1         A     Clinican_5  Radiog  2
1         A     Clinican_6  Radiog  1
...
100       D     Clinican_6  Radiog  3
We are particularly interested in making inferences about the utility
of the different types of image. One distinction is between the images
scored at 0 or 1 (not useful in practice) v. 2 or 3 (useful).
So a summary could look like:-
image  |        Score         |
type   |  0 or 1     2 or 3   |
-------+----------------------+
A      |    Na         Ya     |
-------+----------------------+
B      |    Nb         Yb     |
-------+----------------------+
C      |    Nc         Yc     |
-------+----------------------+
D      |    Nd         Yd     |
-------+----------------------+--------
Total  |    N          Y      | N+Y=600
Or we could consider using the average scores across clinicians for
each image.
Types A and B use different variants of one imaging method, types C
and D another.
We would like to test a priori hypotheses, such as "A and B are better
than C and D" or "C is better than D" or "Clinicians of type one score
more highly than those of type two".
I was tempted to do simple chi square tests based on the rows in the
"tabulate" command, but have realised that that would in a sense
be overestimating the sample size by a factor of six, as we have 100
different images assessed by six clinicians, not 600 different images.
I thought about logistic regression (xi:logit command) on the "score 0
or 1" v. "2 or 3" outcome, but the results (either as beta
coefficients or odds ratios) would be less easy to interpret than
simple probabilities of falling into different categories.
I also thought about using the glm command and assuming binomial
family data (for the dichotomous outcome).
As you will have guessed I am no statistician. How would a
professional statistician like to see these data analyzed? I have come
to realise as I write that this is a general question rather than a
specifically Stata one, so I am sorry if this is an inappropriate
query for this list.
Thank you in advance and in hope,
Karin
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/