Re: st: How low can the percentage of uncensored cases be in heckprob?
From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: How low can the percentage of uncensored cases be in heckprob?
Date: Tue, 11 Nov 2008 16:17:48 -0500
I am not familiar with -heckprob-, but I doubt if the *percent* of
uncensored observations matters much.
-heckprob- fits two probit models. I know of results related to
Margaret's question only for logit models. For a single logistic
regression model, the relevant sample size is the smaller of the
number of events or non-events. Peduzzi et al. (1996) found that the
ratio of this number to the number of predictors should be at least
10:1 to avoid bias from over-fitting.
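As a rough illustration of the rule of thumb above, the limiting sample size and its ratio to the number of predictors can be computed as follows. This is a minimal sketch in Python rather than Stata; the function name and the counts in the example are hypothetical and do not come from this thread, and the threshold to compare against is left to the reader.

```python
def events_per_variable(n_events, n_nonevents, n_predictors):
    """Ratio of the limiting sample size to the number of predictors.

    The limiting sample size is the smaller of the number of events
    and the number of non-events.
    """
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    limiting = min(n_events, n_nonevents)
    return limiting / n_predictors


# Hypothetical counts, chosen only for illustration
epv = events_per_variable(n_events=40, n_nonevents=150, n_predictors=4)
print(epv)  # 10.0
```

The point of taking the minimum is that a large total sample does not help if one of the two outcomes is rare.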
-Steve
Refs:
Peduzzi PN, Concato J, Holford TR, Feinstein AR. (1995) The
importance of events per independent variable in multivariable
analysis, II: accuracy and precision of regression estimates. J Clin
Epidemiol; 48: 1503–10.
Peduzzi PN, Concato J, Kemper E, Holford TR, Feinstein AR. (1996) A
simulation study of the number of events per variable in logistic
regression analysis. J Clin Epidemiol; 49: 1373–9.
Babyak MA. (2004) What you see may not be what you get: a brief,
nontechnical introduction to overfitting in regression-type models.
Psychosomatic Medicine; 66: 411–21. Full text:
http://www.psychosomaticmedicine.org/cgi/content-nw/full/66/3/411/
On Nov 11, 2008, at 12:15 PM, Maarten Buis wrote:
--- "Tyler, Margaret C D" <[email protected]> wrote:
In the example in the Stata reference -H heckprob, there are 95 total
and 59 uncensored observations, so 62% are uncensored. In my own
situation I have only about 19% uncensored. Is it still appropriate
to use heckprob for my analysis? I have run the equations and gotten
what seem to be valid results. rho is non-significant.
You are obviously pushing your luck with that many censored cases. It
is no longer very popular to make statements like "you need at least N
observations or p% uncensored cases for technique t to be appropriate"
(whatever appropriate may mean). So I don't think you will get the
answer you are looking for. However, what you can do is run some
simulations and see how well (or badly) your estimator behaves with a
small number of uncensored cases. At the last Summer North American
Stata Users' Group meeting I gave a talk on using Stata for this
type of simulation; you can get the materials from:
http://ideas.repec.org/p/boc/nsug08/14.html
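A first step in such a simulation is to generate data from a probit model with sample selection and see how many usable cases the outcome equation is left with. The sketch below is not taken from the talk materials. It is written in Python with only the standard library, it covers the data-generating step only (fitting -heckprob- to each simulated data set would still have to be done in Stata), and the sample size, coefficients, correlation and seed are all assumed values chosen for illustration.

```python
import math
import random
from statistics import NormalDist, mean


def simulate_counts(n, p_uncensored, rho, rng):
    """Draw one data set and return (uncensored, events, non-events).

    Events and non-events are counted among the uncensored cases only.
    """
    # The selection index g0 + z + u has variance 2 (z and u are standard
    # normal), so this intercept gives P(selected) = p_uncensored.
    g0 = math.sqrt(2.0) * NormalDist().inv_cdf(p_uncensored)
    uncensored = events = 0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)   # predictor in the selection equation
        x = rng.gauss(0.0, 1.0)   # predictor in the outcome equation
        u = rng.gauss(0.0, 1.0)   # selection error
        # Outcome error: unit variance, correlation rho with u
        e = rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        if g0 + z + u > 0:        # the case is uncensored
            uncensored += 1
            if 0.5 * x + e > 0:   # assumed outcome equation
                events += 1
    return uncensored, events, uncensored - events


rng = random.Random(12345)        # assumed seed, for reproducibility
draws = [simulate_counts(500, 0.19, 0.5, rng) for _ in range(200)]
print("mean uncensored cases:", mean(d[0] for d in draws))
print("mean limiting sample size:", mean(min(d[1], d[2]) for d in draws))
```

Varying n and p_uncensored shows how quickly the smaller of the events and non-events among the uncensored cases shrinks, which is the quantity the outcome equation has to work with.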
Hope this helps,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/