Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Best Logistic Regression Model
From: T A <[email protected]>
To: [email protected]
Subject: Re: st: Best Logistic Regression Model
Date: Wed, 19 Mar 2014 16:08:31 +0000
Thank you everyone for your useful feedback. I am trying to describe
the impact of X on Y. I would rather not use stepwise (especially after
reading the link). Nested regression sounds like a good idea. I was
wondering if there is any model particularly suited to large
datasets? Or is it trial and error until we find the best model? I
need to state clearly, a priori, the method(s) I will use to analyse the
data in the analysis plan.
Is there any Stata package to split the data and do cross-validation?
Many thanks for your help.
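
On the split-sample question, a single random split probably needs no extra package. What follows is only a minimal sketch, not a method proposed by anyone in the thread. It assumes the auto data as a stand-in for the real dataset; the seed, the 50/50 split and the variable names train and phat are illustrative assumptions. The auto data are far too small for serious validation.

```stata
sysuse auto, clear

* Reproducible random split: roughly half the observations for model building
set seed 12345
generate byte train = runiform() < 0.5

* Fit the candidate model on the development half only
logit foreign weight mpg if train

* -predict- after -logit- fills in probabilities for all observations,
* including those held out of the estimation sample
predict double phat, pr

* Area under the ROC curve: development half vs held-out half
roctab foreign phat if train
roctab foreign phat if !train
```

A clearly lower area in the held-out half than in the development half would suggest the model is overfitted to the development data.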
On Wed, Mar 19, 2014 at 2:05 PM, Austin Nichols <[email protected]> wrote:
> T A <[email protected]> :
> You should begin your analysis plan by clarifying your goals.
> First, are you pursuing a classification model or trying to describe
> impacts of X on y?
> You can choose significant predictors using univariate analysis first,
> but you will introduce bias.
> You can pick a model that predicts best in sample, but there is no
> guarantee it will work out of sample, or that it will measure any
> causal connections between variables.
>
>
> On Wed, Mar 19, 2014 at 9:27 AM, Nick Cox <[email protected]> wrote:
>> Thanks for the mention of -allpossible- (SSC), but some warnings are in order.
>>
>> That program really is limited to 6 predictors. As of 2014, I don't
>> imagine ever revising it. In the OP's case 20 predictors mean 2^20
>> possible models and that's a million and more to think about.
>>
>> A paragraph in the help file really does mean what it says
>>
>> "Naturally, this command does not purport to replace the detailed
>> scrutiny of individual models or to offer an unproblematic way of
>> finding "best" models. Its main use may lie in demonstrating that several
>> models exist within many projects possessing roughly equal merit as
>> measured by omnibus statistics."
>>
>> 6 by the way was not an arbitrary choice for me as programmer. A
>> former graduate student had 6 predictors, all on the same footing, and
>> looking at _all_ the 64 possible models was reasonable and natural for
>> that project. But 6 is an arbitrary limit for everyone else.
>>
>> For exploration of different predictor sets, -tuples- (SSC) may be of
>> some help, but all it does is put tuples of variable names into local
>> macros.
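>>
>> A minimal sketch of that use of -tuples-, assuming it has been
>> installed from SSC. The three predictors and the stored-estimate
>> names t1, t2, ... are illustrative choices only; with k predictors the
>> loop fits 2^k - 1 models, so this is practical only for small k.
>>
>> ```stata
>> * ssc install tuples      // once, if not already installed
>> sysuse auto, clear
>>
>> * Puts every non-empty subset of the varlist into local macros
>> * tuple1, tuple2, ... and their number into local macro ntuples
>> tuples weight mpg length
>>
>> * Fit one logit per predictor subset and store each set of results
>> forvalues i = 1/`ntuples' {
>>     quietly logit foreign `tuple`i''
>>     estimates store t`i'
>> }
>>
>> * One table of log likelihood, AIC and BIC across all stored models
>> estimates stats t*
>> ```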
>>
>>
>> Nick
>> [email protected]
>>
>>
>> On 19 March 2014 13:47, Richard Williams <[email protected]> wrote:
>>> Ideally you have some great theory which helps you pick predictors. You then
>>> test whether the theory seems to be right. The -nestreg- command can let you
>>> test a hierarchy of models.
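>>>
>>> A minimal sketch of such a hierarchy with -nestreg-, using the auto
>>> data purely for illustration. The grouping and ordering of the blocks
>>> here are assumptions; in practice they should follow from the theory.
>>>
>>> ```stata
>>> sysuse auto, clear
>>>
>>> * Each parenthesized block is added in turn; lrtable requests
>>> * likelihood-ratio tests of each block given the blocks before it
>>> nestreg, lrtable: logit foreign (weight) (mpg) (length)
>>> ```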
>>>
>>> But if you are going into this totally blind...
>>>
>>> Check out -help stepwise- for info on how to do stepwise regression. But
>>> first, read this brief discussion of the problems with stepwise:
>>>
>>> http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/
>>>
>>> If you want to do stepwise anyway, you may want to do things like split the
>>> sample randomly in two. Develop your model with one data set and then see if
>>> you can confirm it with the other.
>>>
>>> If you want to mass produce models, check out Nick Cox's -allpossible-,
>>> available from SSC.
>>>
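>>> To get AIC and BIC statistics, you can use commands like
>>>
>>> sysuse auto
>>> logit foreign weight
>>> * AIC and BIC for the model just fitted
>>> estat ic
>>> est store m1
>>> logit foreign weight mpg
>>> est store m2
>>> * LR test of m1 nested in m2, with AIC and BIC for both models
>>> lrtest m1 m2, stats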
>>>
>>> You might also check out this Stata tip:
>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0032
>>>
>>> As for searching previous questions, the search info appears at the end of
>>> every email that gets posted to the list.
>>>
>>> At 05:51 AM 3/19/2014, T A wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am writing an analysis plan for a very large dataset. My outcome is
>>>> binary. I have data on 10,000 patients. I need to comment on which
>>>> logistic regression model I would use, i.e. forward selection,
>>>> backward elimination, stepwise etc. How do I go about choosing the
>>>> best logistic regression model? I know I can choose significant
>>>> predictors using univariate analysis first. Since the dataset is so
>>>> large and there are only 20 variables to look at, I think all
>>>> variables could have a significant p-value. Is there a more systematic
>>>> and stringent way of choosing predictors for a multivariable logistic
>>>> regression? How do I get AIC and BIC in Stata?
>>>>
>>>> Sorry if this is a silly question. I am a newbie to stats. Thank you
>>>> so much for your help.
>>>>
>>>> How do I search all the previous questions that have been asked on this
>>>> mailing list?
>>>>
>>>> Best Regards
>>>> Ta
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/