Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: listing groups that differ from predicted results in logit
From: David Souther <[email protected]>
To: [email protected]
Subject: Re: st: listing groups that differ from predicted results in logit
Date: Thu, 25 Feb 2010 09:22:19 -0600
> My feeling is that the answer to your question is a relatively simple one,
> but it is hard to tell based on the information you give. If we assume that
> your dv is something like a 0-1 variable for mortality, you could calculate
> the proportion of individuals who had died, and then identify those
> hospitals that differ substantially from the mean.
>
The dv is whether the hospital employee performed a certain
administrative action or not. The idea is that it is predicted by
individual level characteristics (this action is not mandated by
rules, it is optional) and certain hospital level characteristics.
The problem I see with comparing each hospital's proportion of
individuals who exhibited the DV against the mean across hospitals is
that it doesn't control for the other individual and hospital level
characteristics.
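
One way to make that comparison while still adjusting for covariates is
an observed-versus-expected summary per hospital. The following is only
a minimal sketch, assuming the variable names from the example data in
the original post (dv, indiv_var1, indiv_var2, hosp_var1, hosp_var2,
hosp); the names phat, observed, expected, n and diff are introduced
here for illustration. It fits a plain logit without hospital dummies,
averages the predicted probabilities within each hospital, and compares
that average with the hospital's observed proportion.

```stata
* fit the logit on individual- and hospital-level predictors (no dummies)
logit dv indiv_var1 indiv_var2 hosp_var1 hosp_var2

* predicted probability for each individual in the estimation sample
predict phat if e(sample), pr

* one row per hospital: observed proportion, mean predicted probability,
* and the number of individuals used
* (collapse replaces the data in memory, so save or preserve first)
collapse (mean) observed=dv expected=phat (count) n=dv ///
    if !missing(phat), by(hosp)

* positive diff = more of the DV than the covariates predict
generate diff = observed - expected

* list hospitals from highest to lowest difference
gsort -diff
list hosp n observed expected diff
```

This ranks hospitals but is not a formal test, and the plain logit
ignores the clustering of individuals within hospitals. On the
three-hospital toy data the differences will be close to zero, because
two hospital-level predictors plus a constant can reproduce three
hospital means almost exactly; the sketch is meant for the real data
with many hospitals.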
On Wed, Feb 24, 2010 at 9:58 PM, Daniel Miller <[email protected]> wrote:
> Hi David --- Michael is correct that with multi-level data you could
> estimate a random intercepts model. However, there are some more fundamental
> problems with your modeling. The reason your variables were dropped from the
> model is that they do not vary within hospital. So, when you included dummy
> variables for all the hospitals, your hospital var1 and hospital var2
> dropped out (or the dummies did). Similarly, your person dummies were
> perfectly collinear with the individual var1 and individual var2. This
> intuition should be relatively straightforward.
>
> My feeling is that the answer to your question is a relatively simple one,
> but it is hard to tell based on the information you give. If we assume that
> your dv is something like a 0-1 variable for mortality, you could calculate
> the proportion of individuals who had died, and then identify those
> hospitals that differ substantially from the mean.
>
> Dan
>
>
>
> On Wed, Feb 24, 2010 at 10:44 PM, Michael Norman Mitchell <
> [email protected]> wrote:
>
>> Dear David
>>
>> I wonder if you might start by running this as a random intercept model.
>> You could then look at the level two residuals to get a sense of the nature
>> of the distribution of performance, after adjusting for the level 1
>> predictors. This could also give you a sense of whether there are outliers.
>> However, I am not sure how you could translate this strategy into an actual
>> statistical test.
>>
>> Here is some mock code using the "union" data file from Stata...
>>
>> * use the data
>> use http://www.stata-press.com/data/r11/union.dta, clear
>>
>> * idcode is like your hospital id
>> xtset idcode
>>
>> * union is the outcome, age and south are level 1 predictors
>> * (re, the xtreg default, requests the random-effects estimator)
>> xtreg union age south, re
>>
>> * generate the level 2 residual, naming it r2
>> predict r2, u
>>
>> * examine the residuals, for example using a histogram
>> hist r2
>>
>> I know this is not a final solution, but I hope it is a useful starting
>> place.
>>
>> Michael N. Mitchell
>> See the Stata tidbit of the week at...
>> http://www.MichaelNormanMitchell.com
>> Visit me on Facebook at...
>> http://www.facebook.com/MichaelNormanMitchell
>>
>>
>> On 2010-02-24 7.25 PM, David Souther wrote:
>>
>>>
>>>>
>>> Hello Statalist:
>>>
>>> I've got a dataset of individuals (id) in hospitals (hospitals) with
>>> some individual level data (indiv_var1 and indiv_var2) as well as
>>> hospital level data (hosp_var1 hosp_var2) similar to the data example
>>> below. I'd like to use the indiv* and hosp* IVs to predict the binary
>>> DV (dv). In the real dataset there are thousands of hospitals, and
>>> hundreds of individuals per hospital.
>>>
>>> What I am hoping to discover is those hospitals that have
>>> significantly higher or significantly lower than expected
>>> probabilities of the DV. So, I would like to somehow list those
>>> hospitals that are the highest/lowest. I tried running a logit with
>>> all these as IVs plus dummies for all the hospitals so that I could
>>> use predict to find the difference between the predicted and the
>>> actual values, but it drops all the dummy variables -->
>>> logit dv indiv* hosp_var1 hosp_var2 hosp_dummies*
>>> Also, I tried clogit but it said there was no variation in the
>>> groups. As an alternative, could I just run regression and get the
>>> components that go into the rvfplot (residual versus the fitted
>>> points; if that makes any sense)?
>>>
>>> Any other ideas on how to get the hospitals that are highest/lowest?
>>> Thanks.
>>>
>>> **data**
>>> input hosp id dv indiv_var1 indiv_var2 hosp_var1 hosp_var2
>>> 1 1 1 3 34 88 9
>>> 1 2 1 7 24 88 9
>>> 1 3 0 6 12 88 9
>>> 1 4 0 6 12 88 9
>>> 1 5 0 9 12 88 9
>>> 1 6 0 9 13 88 9
>>> 2 1 0 4 66 77 8
>>> 2 2 0 . 67 77 8
>>> 2 3 1 9 68 77 8
>>> 2 4 0 3 67 77 8
>>> 2 5 1 2 6 77 8
>>> 2 6 0 9 56 77 8
>>> 3 1 0 1 34 11 1
>>> 3 2 0 1 3 11 1
>>> 3 3 1 2 2 11 1
>>> 3 4 0 4 1 11 1
>>> 3 5 0 1 2 11 1
>>> 3 6 0 . 1 11 1
>>> end
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
>>>
>>>