Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Looping over variables in more than one group
From: Joerg Luedicke <[email protected]>
To: [email protected]
Subject: Re: st: Looping over variables in more than one group
Date: Wed, 7 Mar 2012 08:00:52 -0800
It would probably be better to choose covariates according to what makes
the most sense for your theory and research question. Searching through
variables to produce good-looking p-values and then interpreting those
p-values in the usual way is a questionable endeavor, to say the least.
However, if you are interested in a prediction model rather than in
hypothesis testing, you could use data mining techniques directly, for
example boosted regression (-findit boost-).
J.
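
For the mechanics asked about in the quoted question (one variable from
group 1 paired with one from group 2), a minimal sketch with nested
-foreach- loops might look like the following. Everything specific here is
an assumption for illustration: the auto dataset stands in for the real
data, price stands in for the y variable, and the group1 and group2 lists
are made up. The cautions in this thread about interpreting p-values after
such a search still apply.

```stata
* Hypothetical setup: auto.dta stands in for the real data, and the two
* group lists are illustrative, not the original poster's variables.
sysuse auto, clear
local group1 weight length displacement
local group2 mpg headroom trunk

* Collect one row per model: the two regressors, their p-values,
* and the adjusted R-squared.
tempname results
tempfile models
postfile `results' str32 x1 str32 x2 double p1 double p2 double r2_a ///
    using `models'

foreach x1 of local group1 {
    foreach x2 of local group2 {
        quietly regress price `x1' `x2'
        * Two-sided p-values computed from the t statistics
        local p1 = 2 * ttail(e(df_r), abs(_b[`x1'] / _se[`x1']))
        local p2 = 2 * ttail(e(df_r), abs(_b[`x2'] / _se[`x2']))
        post `results' ("`x1'") ("`x2'") (`p1') (`p2') (e(r2_a))
    }
}
postclose `results'

* Load the collected results and flag models where both coefficients
* pass the 10% screen described in the question.
use `models', clear
generate byte both_sig = p1 < 0.10 & p2 < 0.10
list, clean noobs
```

Adding a variable from a third group would mean one more nested -foreach-
and one more pair of posted columns. Posting every model to a results file,
rather than dropping the nonsignificant ones as you go, keeps a record of
how many models were actually searched.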
On Wed, Mar 7, 2012 at 7:12 AM, jaweria seth <[email protected]> wrote:
> Thanks Nick,
> I understand this would result in a large number of models.
> However, I wouldn't be combining variables from the same category/group,
> as this would raise the issue of multicollinearity.
> For example, I know for sure that I need to add one variable each from
> groups 1 and 2. Group 1 contains variables that measure the
> size/production of a business, and I am wondering which of those
> variables would be most significant in a multivariate model. I am
> looking at t-stats in the regression output: if even one of the
> included variables is not significant at the 10% level, that model gets
> dropped (and as I'm running the regressions manually, I find that the
> majority of the combos are not significant).
>
> Does this make sense? If so, how can I implement it?
> The way I am doing it right now: holding one variable from group 2
> constant and looping through the group 1 (size) variables to find
> significance. However, this gets tricky when I try to include a third
> variable.
>
>
> Thanks,
>
> On Wed, Mar 7, 2012 at 2:34 AM, Nick Cox <[email protected]> wrote:
>> Before you even think of how to implement this, do the combinatorics
>> of how many models this implies.
>>
>> So, for example,
>>
>> . di 30^4
>> 810000
>>
>> . di 5^4
>> 625
>>
>> Then bump up those numbers adding in the null choices, i.e. no
>> variable from each group, as well.
>>
>> So you would need not only to do the looping but to ponder what it
>> implies in terms of gathering results from thousands of models,
>> finding the "best", whatever that means, including the implications
>> for how you think about the resulting P-values, etc.
>>
>> Nick
>>
>> On Tue, Mar 6, 2012 at 10:01 PM, jaweria seth <[email protected]> wrote:
>>
>>> I would like to run regressions with up to 4 different variables. My
>>> variables are separated into 4 groups with 5-30 variables in each
>>> group. I would like to run regressions on different combinations of
>>> variables to find the best model.
>>> How do I regress my y variable on 1 variable from group 1 and 1 from
>>> group 2 and loop through the different combos of each?
>>> For example:
>>> regress Yvariable Group1 Group2
>>>
>>> Then I would like to add a variable from group 3, and so on.
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/