Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Best Logistic Regression Model
From: Austin Nichols <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Best Logistic Regression Model
Date: Wed, 19 Mar 2014 10:05:14 -0400
T A <[email protected]> :
You should begin your analysis plan by clarifying your goals.
First, are you pursuing a classification model or trying to describe
impacts of X on y?
You can choose significant predictors using univariate analysis first,
but you will introduce bias.
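To make the bias concrete, here is a small simulation sketch in Stata. The setup (200 observations, 20 pure-noise predictors, a 0.05 screening threshold, seed 2014, and the names y, x1-x20 and keep) is an illustrative assumption, not anything from T A's data. Because y is generated independently of every x, any predictor that survives univariate screening is a false positive, yet it will usually still look significant when refitted.

```stata
* Simulation (illustrative assumptions): 20 pure-noise predictors, so any
* predictor that passes univariate screening is a false positive
clear
set seed 2014
set obs 200
generate byte y = runiform() < 0.5
forvalues j = 1/20 {
    generate double x`j' = rnormal()
}

* Univariate screening at p < 0.05 (two-sided Wald test on each slope)
local keep
forvalues j = 1/20 {
    quietly logit y x`j'
    local p = 2*normal(-abs(_b[x`j']/_se[x`j']))
    if `p' < 0.05 local keep `keep' x`j'
}
display "Passed screening: `keep'"

* Refit the screened-in predictors together; their p-values can look
* persuasive even though y is unrelated to every x by construction
if "`keep'" != "" logit y `keep'
```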
You can pick a model that predicts best in sample, but there is no
guarantee it will work out of sample, or that it will measure any
causal connections between variables.
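A minimal sketch of an out-of-sample check in Stata, assuming the auto data used later in this thread, an arbitrary 50/50 random split, and illustrative names (train, phat). It compares the area under the ROC curve in the half used for fitting with the area in the held-out half; a large drop is a warning sign of overfitting.

```stata
* Hold out a random half to see whether in-sample fit carries over
sysuse auto, clear
set seed 12345
generate byte train = runiform() < 0.5

* Fit on the training half only
logit foreign weight mpg if train

* Predicted probabilities for all cars, including the held-out half
predict double phat, pr

* Area under the ROC curve: fitting half, then held-out half
roctab foreign phat if train
roctab foreign phat if !train
```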
On Wed, Mar 19, 2014 at 9:27 AM, Nick Cox <[email protected]> wrote:
> Thanks for the mention of -allpossible- (SSC), but some warnings are in order.
>
> That program really is limited to 6 predictors. As of 2014, I don't
> imagine ever revising it. In the OP's case 20 predictors mean 2^20
> possible models and that's a million and more to think about.
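The arithmetic behind those counts, as a quick Stata check: each predictor is either in or out of a model, so k candidate predictors give 2^k subsets, counting the intercept-only model.

```stata
* Every predictor is either in or out, so k predictors give 2^k subsets
* (including the intercept-only model)
* 6 predictors: 64 models
display 2^6
* 20 predictors: 1,048,576 models
display 2^20
* Models with exactly 3 of the 20 predictors: 1,140
display comb(20, 3)
```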
>
> A paragraph in the help file really does mean what it says:
>
> "Naturally, this command does not purport to replace the detailed
> scrutiny of individual models or to offer an unproblematic way of
> finding "best" models. Its main use may lie in demonstrating that several
> models exist within many projects possessing roughly equal merit as
> measured by omnibus statistics."
>
> 6 by the way was not an arbitrary choice for me as programmer. A
> former graduate student had 6 predictors, all on the same footing, and
> looking at _all_ the 64 possible models was reasonable and natural for
> that project. But 6 is an arbitrary limit for everyone else.
>
> For exploration of different predictor sets, -tuples- (SSC) may be of
> some help, but all it does is put tuples of variable names into local
> macros.
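A sketch of how the local macros from -tuples- might be used, assuming -tuples- has been installed from SSC and using three auto-data predictors chosen purely for illustration (S is an illustrative matrix name). It fits a logit for every non-empty subset and lists AIC and BIC, which -estat ic- returns in columns 5 and 6 of r(S). All subsets here use the same 74 observations, which matters because AIC and BIC are only comparable across models fitted to the same sample.

```stata
* Requires -tuples- from SSC:  ssc install tuples
sysuse auto, clear
tuples weight mpg length
* -tuples- defines local macros ntuples and tuple1, tuple2, ...
display %-25s "predictors" %10s "AIC" %10s "BIC"
forvalues i = 1/`ntuples' {
    quietly logit foreign `tuple`i''
    quietly estat ic
    * r(S) columns: N, ll(null), ll(model), df, AIC, BIC
    matrix S = r(S)
    display %-25s "`tuple`i''" %10.2f S[1,5] %10.2f S[1,6]
}
```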
>
>
> Nick
> [email protected]
>
>
> On 19 March 2014 13:47, Richard Williams <[email protected]> wrote:
>> Ideally you have some great theory which helps you pick predictors. You then
>> test whether the theory seems to be right. The -nestreg- command can let you
>> test a hierarchy of models.
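For example, a minimal -nestreg- sketch with the auto data, where the blocks are chosen only for illustration: weight enters first, then mpg and length are added as a second block, and the lrtable option asks for a likelihood-ratio test of each block instead of the default Wald tests.

```stata
sysuse auto, clear
* Block 1: weight; block 2 adds mpg and length; LR test for each block
nestreg, lrtable: logit foreign (weight) (mpg length)
```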
>>
>> But if you are going into this totally blind...
>>
>> Check out -help stepwise- for info on how to do stepwise regression. But
>> first, read this brief discussion of the problems with stepwise:
>>
>> http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/
>>
>> If you want to do stepwise anyway, you may want to do things like split the
>> sample randomly in two. Develop your model with one data set and then see if
>> you can confirm it with the other.
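A sketch of that develop-then-confirm idea in Stata, assuming the auto data, an arbitrary seed and 50/50 split, a removal threshold of pr(0.2), and illustrative names (train, kept). Backward stepwise selection runs on the training half only, and the selected predictors are then refitted on the other half, where their p-values were not used to choose them. With only 74 cars this is purely a demonstration; each half is far too small for serious model building.

```stata
* Develop on one random half, confirm on the other (illustrative choices)
sysuse auto, clear
set seed 2014
generate byte train = runiform() < 0.5

* Backward stepwise: drop terms with p >= 0.2, using the training half only
stepwise, pr(0.2): logit foreign weight mpg length turn if train

* Collect the predictors that survived, dropping the constant
local kept : colnames e(b)
local kept : subinstr local kept "_cons" "", word

* Refit the selected model on the held-out half
logit foreign `kept' if !train
```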
>>
>> If you want to mass produce models, check out Nick Cox's -allpossible-,
>> available from SSC.
>>
>> To get AIC and BIC statistics, you can use commands like
>>
>> sysuse auto, clear
>> logit foreign weight
>> estat ic
>> est store m1
>> logit foreign weight mpg
>> est store m2
>> * LR test of m1 nested in m2; the stats option also reports AIC and BIC
>> lrtest m1 m2, stats
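As a complement to the commands above, -estimates stats- lists AIC and BIC for several stored models in one table. A minimal sketch with the same two auto-data models; lower values indicate the preferred model on each criterion.

```stata
sysuse auto, clear
quietly logit foreign weight
estimates store m1
quietly logit foreign weight mpg
estimates store m2
* One row per stored model: N, ll(null), ll(model), df, AIC, BIC
estimates stats m1 m2
```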
>>
>> You might also check out this Stata tip:
>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0032
>>
>> As for searching previous questions, the search info appears at the end of
>> every email that gets posted to the list.
>>
>> At 05:51 AM 3/19/2014, T A wrote:
>>>
>>> Hi,
>>>
>>> I am writing an analysis plan for a very large dataset. My outcome is
>>> binary. I have data on 10,000 patients. I need to comment on which
>>> variable-selection method I would use for the logistic regression,
>>> e.g. forward selection, backward elimination, stepwise, etc. How do I
>>> go about choosing the best logistic regression model? I know I can
>>> choose significant predictors using univariate analysis first. Since
>>> the dataset is so large and there are only 20 variables to look at, I
>>> think all variables could have a significant p-value. Is there a more
>>> systematic and stringent way of choosing predictors for a multivariable
>>> logistic regression? How do I get AIC and BIC in Stata?
>>>
>>> Sorry if this is a silly question. I am a newbie to stats. Thank you
>>> so much for your help.
>>>
>>> How do I search all the previous questions that have been asked on this
>>> mailing list?
>>>
>>> Best Regards
>>> Ta
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/