Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: listing groups that differ from predicted results in logit
From: Michael Norman Mitchell <[email protected]>
To: [email protected]
Subject: Re: st: listing groups that differ from predicted results in logit
Date: Wed, 24 Feb 2010 19:44:29 -0800
Dear David
I wonder if you might start by fitting this as a random-intercept
model. You could then look at the level-2 residuals (the estimated
hospital-level random intercepts) to get a sense of how hospital
performance is distributed after adjusting for the level-1 predictors.
This could also show you whether there are outlying hospitals. However,
I am not sure how you could translate this strategy into a formal
statistical test.
Here is some example code using the "union" dataset from Stata...
* use the data
use http://www.stata-press.com/data/r11/union.dta, clear
* idcode is like your hospital id
xtset idcode
* union is the outcome, age and south are level 1 predictors
xtreg union age south
* generate the level 2 residual, naming it r2
predict r2, u
* examine the residuals, for example using a histogram
hist r2
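Because union, like David's dv, is binary, a random-intercept logit may be a more natural fit than the linear model that xtreg estimates. Below is a minimal sketch under that assumption, using xtmelogit (the Stata 10 to 12 command; Stata 13 and later provide melogit), followed by a listing of the ten groups with the largest and the smallest predicted random intercepts. The names u_hat and first_in_group are hypothetical. Keep in mind that predicted random effects are shrunken estimates, so groups with few observations are pulled toward zero.

```stata
* Sketch: random-intercept logit, then list the most extreme groups
* (assumes xtmelogit is available; melogit replaces it from Stata 13 on)
use http://www.stata-press.com/data/r11/union.dta, clear

* random intercept for each idcode; age and south are level-1 predictors
xtmelogit union age south || idcode:

* u_hat (hypothetical name): predicted random intercept for each idcode
predict u_hat, reffects

* flag one observation per idcode so each group is listed only once
egen byte first_in_group = tag(idcode)

preserve
keep if first_in_group
* groups with the highest level-2 residuals (more often union than predicted)
gsort -u_hat
list idcode u_hat in 1/10
* groups with the lowest level-2 residuals
sort u_hat
list idcode u_hat in 1/10
restore
```

As with the xtreg version, this ranks groups but does not by itself give a formal test for any single group.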
I know this is not a final solution, but I hope it is a useful
starting place.
Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell
On 2010-02-24 7.25 PM, David Souther wrote:
Hello Statalist:
I've got a dataset of individuals (id) in hospitals (hosp) with
some individual-level variables (indiv_var1 and indiv_var2) as well as
hospital-level variables (hosp_var1 and hosp_var2), similar to the data
example below. I'd like to use the indiv* and hosp* IVs to predict the
binary DV (dv). In the real dataset there are thousands of hospitals,
and hundreds of individuals per hospital.
What I am hoping to find are the hospitals with significantly higher
or significantly lower than expected probabilities of the DV, so that
I can list the highest and lowest ones. I tried running a logit with
all of these as IVs plus dummies for all the hospitals, so that I could
use predict to find the difference between the predicted and the
actual values, but it drops all the dummy variables:
logit dv indiv* hosp_var1 hosp_var2 hosp_dummies*
Also, I tried clogit, but it said there was no variation in the
groups. As an alternative, could I just run a regression and get the
components that go into the rvfplot (residuals versus fitted values,
if that makes any sense)?
Any other ideas on how to find the highest/lowest hospitals? Thanks.
**data**
clear
input hosp id dv indiv_var1 indiv_var2 hosp_var1 hosp_var2
1 1 1 3 34 88 9
1 2 1 7 24 88 9
1 3 0 6 12 88 9
1 4 0 6 12 88 9
1 5 0 9 12 88 9
1 6 0 9 13 88 9
2 1 0 4 66 77 8
2 2 0 . 67 77 8
2 3 1 9 68 77 8
2 4 0 3 67 77 8
2 5 1 2 6 77 8
2 6 0 9 56 77 8
3 1 0 1 34 11 1
3 2 0 1 3 11 1
3 3 1 2 2 11 1
3 4 0 4 1 11 1
3 5 0 1 2 11 1
3 6 0 . 1 11 1
end
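One way to approach the "higher or lower than expected" question is the observed-to-expected (O/E) ratio used in indirect standardization. This technique is swapped in here and was not proposed in the thread. You fit the model without hospital dummies, add up each hospital's predicted probabilities to get its expected number of events, and compare that with its observed number. The sketch below uses the example data and assumes a pooled logit is an acceptable risk model. The names insample, phat, observed, expected, n, and oe_ratio are hypothetical, and the sketch provides no significance test.

```stata
* Sketch: observed vs expected events per hospital (indirect standardization)
clear
input hosp id dv indiv_var1 indiv_var2 hosp_var1 hosp_var2
1 1 1 3 34 88 9
1 2 1 7 24 88 9
1 3 0 6 12 88 9
1 4 0 6 12 88 9
1 5 0 9 12 88 9
1 6 0 9 13 88 9
2 1 0 4 66 77 8
2 2 0 . 67 77 8
2 3 1 9 68 77 8
2 4 0 3 67 77 8
2 5 1 2 6 77 8
2 6 0 9 56 77 8
3 1 0 1 34 11 1
3 2 0 1 3 11 1
3 3 1 2 2 11 1
3 4 0 4 1 11 1
3 5 0 1 2 11 1
3 6 0 . 1 11 1
end

* pooled logit, no hospital dummies
logit dv indiv_var1 indiv_var2 hosp_var1 hosp_var2

* mark the estimation sample (rows with no missing predictors)
generate byte insample = e(sample)

* predicted probability of dv for each individual
predict phat if insample, pr

* per hospital: observed events, expected events, number of individuals
collapse (sum) observed=dv expected=phat (count) n=phat if insample, by(hosp)

* ratio above 1: more events than predicted; below 1: fewer
generate oe_ratio = observed/expected
gsort -oe_ratio
list hosp n observed expected oe_ratio
```

On this three-hospital toy dataset, the constant plus hosp_var1 and hosp_var2 can reproduce each hospital's event count exactly, so every ratio should come out at (essentially) 1. Hospital-level variables are constant within each hospital, which is also why they are collinear with a full set of hospital dummies. The ratios only become informative when there are many more hospitals than hospital-level predictors, as in the real data.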
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*