Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Computing the proportion of significant variables after running numerous regressions
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Computing the proportion of significant variables after running numerous regressions
Date: Mon, 14 May 2012 11:43:25 +0100
I think it depends what George wants by way of standard errors. If you run -bootstrap: regress- the effect is not just to add the confidence intervals.
Nick
[email protected]
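A minimal sketch of the point about standard errors, assuming the auto dataset and 200 replications (both chosen here only for illustration; the scalar names are hypothetical). It runs -regress- with and without the -bootstrap- prefix, compares the constant and its standard error, and then lists what -bootstrap- leaves in memory:

```stata
sysuse auto, clear
set seed 1

* plain regression: conventional standard error of the constant
quietly regress price mpg
scalar b_ols  = _b[_cons]
scalar se_ols = _se[_cons]

* bootstrapped regression: same point estimate, bootstrap standard error
quietly bootstrap, reps(200): regress price mpg
display "b:  " scalar(b_ols)  "  vs  " _b[_cons]
display "se: " scalar(se_ols) "  vs  " _se[_cons]

* everything -bootstrap- leaves behind, including the interval matrices
ereturn list
matrix list e(ci_bc)      // bias-corrected limits, one column per coefficient
```

The two point estimates should agree, while the standard errors generally will not, which is why it matters which kind of standard error is wanted.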
Phil Clayton
I don't see the problem Nick - I think your code reports the correct values. -bootstrap- reports the same beta coefficients as -regress- since these are the best (least biased) point estimates, and otherwise the estimates that your code extracts seem to come from the bootstrapping as desired.
I completely agree that 10 repetitions is not enough - my example was only designed to demonstrate the use of -post- - but thanks for pointing it out.
Phil
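For the original question in this thread (the proportion of models with a constant significant at the 5% level, and the share of those that are positive), one possible approach is to post the bias-corrected limits and summarize indicator variables afterwards. This is only a sketch: it assumes that "significant" means the 95% bias-corrected interval excludes zero, and the names memhold, b_cons, cons_ll, cons_ul, sig and pos are made up for illustration.

```stata
* load dataset
sysuse auto, clear

* set up temporary file for results (variable names are illustrative)
tempfile results
tempname memhold
postfile `memhold' foreign b_cons cons_ll cons_ul using "`results'"

* one bootstrapped regression per level of foreign
set seed 1
levelsof foreign, local(levels)
foreach level of local levels {
    quietly bootstrap, reps(200): regress price mpg if foreign == `level'
    matrix ci = e(ci_bc)                 // row 1 = lower, row 2 = upper
    local j = colnumb(ci, "_cons")       // column holding the constant
    post `memhold' (`level') (_b[_cons]) (ci[1, `j']) (ci[2, `j'])
}
postclose `memhold'

use "`results'", clear

* assumption: "significant at 5%" = 95% BC interval excludes zero
generate byte sig = (cons_ll > 0 | cons_ul < 0)
generate byte pos = (b_cons > 0) if sig

summarize sig      // mean = proportion of models with a significant constant
summarize pos      // mean = proportion positive among the significant ones
```

With two levels of foreign this is only a toy; with ~1000 models the same two -summarize- calls apply unchanged to the larger results dataset.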
On 14/05/2012, at 7:15 PM, Nick Cox wrote:
> No, you (and I) need to be more circumspect. After -bootstrap:
> regress- the results in memory are a mix of results for -bootstrap-
> and for the last replication of -regress-. So, you need to separate
> that out in your code.
>
> On Mon, May 14, 2012 at 9:52 AM, Nick Cox <[email protected]> wrote:
>> You seem to be guessing that after -bootstrap: regress- there is a
>> quantity left in memory called -_ci_bc_cons-. Not so. Also, each
>> confidence interval is a pair of numbers, so you need to create two
>> variables to hold it, not one. The trick to these calculations is to
>> see what is left in memory after a command. By the way, 10
>> replications would not be enough for most serious work.
>>
>> * load dataset
>> sysuse auto, clear
>>
>> * set up temporary file for results
>> tempfile results
>> tempname postfile
>> postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg _cons_ll ///
>>     _cons_ul _b_ll _b_ul using "`results'"
>>
>> * run bootstrapped regression for each level of foreign
>> set seed 1 // so that you can repeat your analysis
>> levelsof foreign, local(levels)
>> foreach level of local levels {
>> bootstrap, rep(10): regress price mpg if foreign==`level'
>> mat ci = e(ci_bc)
>> post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) ///
>>     (_se[mpg]) (ci[1,2]) (ci[2,2]) (ci[1,1]) (ci[2,1])
>> }
>> postclose `postfile'
>>
>> * display results
>> use "`results'", clear
>> list
>>
>>
>> On Mon, May 14, 2012 at 9:30 AM, George Murray
>> <[email protected]> wrote:
>>> Phil,
>>>
>>> Thank you so much for your help, this worked perfectly.
>>>
>>> I have one more query, however.
>>>
>>> I also need a vector of the bias-corrected confidence intervals (which
>>> can be obtained with the -estat bootstrap- command). I replace two of
>>> the commands you suggested with these two commands as follows:
>>>
>>> -postfile `postfile' foreign _b_cons _se_cons _ci_bc_cons _b_mpg
>>> _se_mpg using "`results'"- .............(all I did was add
>>> "_ci_bc_cons")
>>>
>>> -post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_ci_bc[_cons])
>>> (_b[mpg]) (_se[mpg])- .............(all I did was add
>>> "(_ci_bc[_cons])")
>>>
>>> and I also wrote -estat bootstrap- after the bootstrap, rep(10)... command
>>>
>>> However, I get the following error:
>>>
>>> _ci_bc not found
>>> post: above message corresponds to expression 3, variable _ci_bc_cons
>>> r(111);
>>>
>>> Does anyone know how to solve this problem?
>>
>>
>> On Mon, May 14, 2012 at 12:05 AM, Phil Clayton
>>> <[email protected]> wrote:
>>>> George,
>>>>
>>>> There are various ways to do this. One is to use -post- after each bootstrapped regression to store the results of that regression in a "results" dataset, similar to a Monte Carlo simulation. You can then access the results dataset and manipulate it however you like.
>>>>
>>>> Here's a basic example that uses the auto dataset and loops over the levels of "foreign" (i.e., 0 and 1), runs a bootstrapped regression of price on mpg for each level, and displays the resulting coefficients and standard errors.
>>>>
>>>> --------- begin example ---------
>>>> * load dataset
>>>> sysuse auto, clear
>>>>
>>>> * set up temporary file for results
>>>> tempfile results
>>>> tempname postfile
>>>> postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg using "`results'"
>>>>
>>>> * run bootstrapped regression for each level of foreign
>>>> set seed 1 // so that you can repeat your analysis
>>>> levelsof foreign, local(levels)
>>>> foreach level of local levels {
>>>> bootstrap, rep(10): regress price mpg if foreign==`level'
>>>> post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) (_se[mpg])
>>>> }
>>>> postclose `postfile'
>>>>
>>>> * display results
>>>> use "`results'", clear
>>>> list
>>>> --------- end example ---------
>>>>
>>>> Since you're running ~1000 models you may wish to change "foreach" to "qui foreach", and monitor the iterations using the _dots command (see Harrison DA. Stata tip 41: Monitoring loop iterations. Stata Journal 2007;7(1):140, available at http://www.stata-journal.com/article.html?article=pr0030)
>>>>
>>>> Phil
>>>>
>>>>
>>>> On 13/05/2012, at 10:06 PM, George Murray wrote:
>>>>
>>>>> Dear Statalist,
>>>>>
>>>>> I am using the -foreach- command to run approximately 1000
>>>>> (bootstrapped) regression models, however I require an efficient way
>>>>> of calculating the proportion of the regression models which have a
>>>>> statistically significant constant at the 5% level; and of the
>>>>> constants which are statistically significant, the proportion which
>>>>> are positive. Below each of the 1000 regressions I run, a table is
>>>>> displayed with the following format:
>>>>>
>>>>> -------------------------------------------------------------------------------
>>>>>              |     Observed              Bootstrap
>>>>>           V0 |        Coef.       Bias   Std. Err.   [95% Conf. Interval]
>>>>> -------------+-----------------------------------------------------------------
>>>>>           V1 |    .00968169  -.0000537   .00057051     .008721   .0111218  (BC)
>>>>>           V2 |   -.00110469   .0000782     .000691   -.0023101    .000459  (BC)
>>>>>           V3 |    .00468313  -.0001562   .00084971    .0031954   .0064538  (BC)
>>>>>        _cons |   -.00076976   .0001811   .00176677   -.0044496   .0025584  (BC)
>>>>> -------------------------------------------------------------------------------
>>>>>
>>>>> I would be *very* grateful if someone knew the commands which would
>>>>> allow me calculate this. In the past, I have used (a highly tedious
>>>>> and embarrassing approach on) Excel where I filtered every Nth row,
>>>>> and wrote a command to display 1 if the coefficient lies within the
>>>>> confidence interval, and 0 if not. This time, however, I am running
>>>>> numerous models and require a quicker approach.
>>>>>
>>>>> One more question -- is there a way to create a new variable where the
>>>>> coefficients of V1 (for example) are saved, so I can calculate the
>>>>> mean, standard deviation etc. of V1?
>>>>>
>>>>> If someone could answer at least one of these two questions, it would
>>>>> be very much appreciated.