I am trying to model a 2-level problem using -xtmixed-, but cannot find
the right way to do this from the manual.
Here's the problem: I'm studying hospital resource use for patients
cared for by a group of doctors (call the resource/dependent variable
COST). I have a number of characteristics of the patients that relate
to COST (e.g. age, severity of illness, etc.).
What I REALLY want to know is the magnitude of the differences among the
physicians. So, for example, an ANOVA (followed by -predict- or
-adjust-) that includes the patient-level independent variables and a
categorical variable representing the different doctors (call it DOCNUM)
shows me (by looking at the coefficient and significance for DOCNUM)
that there are substantial differences among the doctors in COST. OK so
far.
But now I want to look deeper and evaluate the influence of
characteristics of the doctors themselves (e.g. their years in practice,
board certification status, etc) on COST, i.e. having the information
from the ANOVA I now want to see how much of the differences between the
doctors in COST is "mediated by" these characteristics of the doctors
themselves. Clearly just putting these doctor-level variables into the
ANOVA as if they were patient-level variables is incorrect (as discussed
in many places, e.g. Snijders & Bosker's book "Multilevel Analysis").
So, I'm trying to figure out the -xtmixed- syntax for this task.
Unfortunately, the Stata 9 XT manual gives no examples under -xtmixed-
of how to correctly include variables that are only relevant at the
higher level; by contrast, in HLM I believe you do this explicitly. I
do not believe (but am not certain) that it is correct to just put
these doctor-level variables into the fixed-effects equation, such as:

    xtmixed COST patient_age severity doc1-docN MD_age || DOCNUM:

or even

    xtmixed COST patient_age severity doc1-docN MD_age || DOCNUM: MD_age

Part of the reason I think this must be wrong is that in my dataset
each doctor has a single (and different) age, and thus the N dummy
variables (doc1 to docN) created for the N+1 doctors turn out to be
collinear with the MD_age variable, and the program automatically drops
one of them. Since my goal is to see what proportion of the effect of
the physicians can be explained by THEIR characteristics, it seems I
must find a way to keep the physician identifier variables (either
doc1-docN or DOCNUM) in the fixed-effects part of this model. Or do I?
I also don't think it's correct to leave the doctor identifier out of
the fixed-effects part of the model, e.g.

    xtmixed COST patient_age severity || DOCNUM: MD_age

because then I don't get coefficients that tell me about the magnitude
of the effects of the physicians as a group, or of their age, on COST.
So, if anyone out there can help me with this syntax problem, I'd really
appreciate it. Also, if you can tell me how to determine, from the
output, what proportion of the variation in COST attributable to the
doctors can in fact be explained by their characteristics (as included
in the model, e.g. MD_age), I'd appreciate that too.
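
To make that last question concrete, here is a rough sketch of the kind of comparison I have in mind. It has not been run and may well be the wrong approach. It assumes that a doctor-level variable can simply be entered in the fixed part alongside a random intercept for DOCNUM (which is exactly what I am unsure about), and that -xtmixed- stores the log of the doctor-level standard deviation in an equation named lns1_1_1. The scalar names var_A and var_B are made up for illustration.

```stata
* Sketch only: variable names follow the post above; nothing here has been run.

* Model A: patient-level covariates only, random intercept for each doctor
xtmixed COST patient_age severity || DOCNUM:
* Assumption: equation lns1_1_1 holds log(SD) of the doctor random intercept
scalar var_A = exp(2*[lns1_1_1]_b[_cons])

* Model B: same model plus the doctor-level covariate in the fixed part
xtmixed COST patient_age severity MD_age || DOCNUM:
scalar var_B = exp(2*[lns1_1_1]_b[_cons])

* Proportional reduction in the between-doctor variance from adding MD_age
display "Share of doctor variance explained by MD_age: " (var_A - var_B)/var_A
```

If this is a legitimate way to proceed, the displayed ratio would be my candidate for the "proportion explained": the fraction by which the between-doctor variance shrinks once MD_age is in the model.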