Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: tuples, stepwise and counting types of variables
From
Cameron McIntosh <[email protected]>
To
STATA LIST <[email protected]>
Subject
RE: st: tuples, stepwise and counting types of variables
Date
Mon, 13 Aug 2012 21:56:21 -0400
Nice to see that Nick is an active member of the "anti-stepwise regression club." In that regard, I would strongly suggest taking a look at:
Flom, P.L., & Cassell, D.L. (2007). Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use. NESUG 2007: Statistics and Data Analysis.
http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf
Huberty, C. J. (1989). Problems with stepwise methods—Better alternatives. In B. Thompson (Ed.), Advances in social science methodology (Vol. 1, pp. 43–70). Greenwich, CT: JAI Press.
http://education.gsu.edu/coshima/EPRS8550/Oshima%20Problem.pdf
Thompson, B. (2001). Significance, Effect Sizes, Stepwise Methods, and Other Issues: Strong Arguments Move the Field. The Journal of Experimental Education, 70(1), 80-93.
http://web.me.com/rsbalkin/Site/Research_Methods_and_Statistics_files/Strong%20arguments%20move%20the%20field--Thompson.pdf
Thompson, B. (1995). Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply here: A Guidelines Editorial. Educational and Psychological Measurement, 55(4), 525-534.
Thompson, B. (1989). Why won't stepwise methods die? Measurement and Evaluation in Counseling and Development, 21(4), 146-148.
http://web.me.com/rsbalkin/Site/Research_Methods_and_Statistics_files/why%20won't%20stepwise%20methods%20die.pdf
Some additional references are in the FAQ Nick mentioned. To be sure, I'm not against data mining in general.
Cam
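
On the question quoted below (how to recover the names of the variables that -stepwise- keeps), one possible route is to read the column names of e(b) after estimation, drop _cons, and intersect the result with each group using the macro list operators Nick describes. The following is only a minimal sketch under stated assumptions: it uses the auto dataset shipped with Stata as a stand-in for the real data, and the two groups gr1 and gr2 are hypothetical. If terms are dropped for collinearity, e(b) may carry prefixed names such as o.varname, which this sketch does not handle.

```stata
* Stand-in data and hypothetical groups (assumptions, not the poster's data)
sysuse auto, clear
local gr1 mpg weight length
local gr2 turn displacement gear_ratio

quietly stepwise, pr(0.05): regress price `gr1' `gr2'

* Column names of e(b) are the retained regressors plus _cons
local usedx : colnames e(b)
local cons _cons
local usedx : list usedx - cons

* Count how many retained regressors come from each group
forvalues g = 1/2 {
    local inter : list gr`g' & usedx
    local n`g' : word count `inter'
    display "gr`g': `n`g'' retained (`inter')"
}
```

Inside the -tuples- loop from the original post, the same lines could be placed after each -stepwise- call, with the counts stored per tuple (for example with -postfile-) to build the requested table.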
> Date: Mon, 13 Aug 2012 21:33:01 -0400
> Subject: Re: st: tuples, stepwise and counting types of variables
> From: [email protected]
> To: [email protected]
>
> Thanks Nick
>
> My question is how do i generate the "used" list after using stepwise
> regression? Stepwise (or another automated variable selection method)
> decides which variables stay in the model. I've counted the number of
> variables in e(df_m), but i believe i need to save the actual names of
> the variables that stay in the regression to use your suggested
> approach.
>
> thanks again
> Thomas
> On Mon, Aug 13, 2012 at 8:36 PM, Nick Cox <[email protected]> wrote:
>> I can't comment on analogues to MAXR as I am not familiar with SAS.
>>
>> For counting how many of a list are in another list, you can find the
>> intersection of two lists using
>>
>> : list a & b
>>
>> as documented at -help macrolists-, and then count them.
>>
>> For example,
>>
>> local availablex "x1 x2 x3"
>> local usedx "x2"
>> local inter : list availablex & usedx
>> di `: word count `inter''
>>
>> Nick
>>
>> On Tue, Aug 14, 2012 at 1:24 AM, Thomas Sohnesen <[email protected]> wrote:
>>> Thanks Nick
>>>
>>> For this exercise I'm not interested in the coefficients or their
>>> meaning; I'm looking to find a parsimonious model for predictions.
>>> Any advice on a better alternative than stepwise? Doing it manually
>>> is not really an option as we will be running a lot of different
>>> models. Further, though my data is organized in blocks, I would like to
>>> keep single variables if they are highly correlated with my dependent
>>> variable. I believe SAS has an alternative in MAXR. Do you know if
>>> Stata has a similar alternative?
>>>
>>> Finally, no matter which alternative we end up using, I still have the
>>> challenge of counting the number of variables from each block in the final
>>> model. Any insights on that?
>>>
>>> thanks and best
>>>
>>> Thomas
>>>
>>>
>>> On Mon, Aug 13, 2012 at 5:30 PM, Nick Cox <[email protected]> wrote:
>>>> I belong to a club which is dedicated to advising people against using
>>>> -stepwise-. A -search- will find an FAQ on this question.
>>>>
>>>> I'd look at -nestreg- instead.
>>>>
>>>> Nick
>>>>
>>>> On Mon, Aug 13, 2012 at 10:18 PM, Thomas Sohnesen <[email protected]> wrote:
>>>>
>>>>> I have a number of "groups" of variables as exemplified below.
>>>>>
>>>>>
>>>>> local gr1 x1 x2 x3 x4
>>>>>
>>>>> local gr2 x5 x6 x7 x8
>>>>>
>>>>> local gr3 x9 x10 x11 x12 x13 x14 x15
>>>>>
>>>>> local gr4 x16 x17
>>>>>
>>>>>
>>>>>
>>>>> I run stepwise regressions for all the combinations of these groups
>>>>> using tuples.
>>>>>
>>>>> tuples "`gr1'" "`gr2'" "`gr3'" "`gr4'" , display
>>>>>
>>>>> forval i = 1/`ntuples' {
>>>>>
>>>>> qui stepwise, pr(0.05): regress y `tuple`i''
>>>>>
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> Now I would like to count how many variables from each group
>>>>> stayed in the stepwise model.
>>>>>
>>>>>
>>>>>
>>>>> For instance, in the stepwise regression of gr1 and gr2 (i.e. x1 x2 x3
>>>>> x4 x5 x6 x7 x8) only x3 x4 x5 were included in the regression. I
>>>>> would then like an output along the lines of:
>>>>>
>>>>> Model     num_var_gr1  num_var_gr2  num_var_gr3  num_var_gr4
>>>>>
>>>>> gr2 gr3   1            2            0            0
>>>>>
>>>>> gr2 gr4
>>>>>
>>>>> gr1 gr2
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/