Re: st: AW: Simulating stepwise regression
Thanks; this is really excellent, and it is very kind of you to provide me
with such a detailed program. It works great.
Thank you, and thanks to Martin too. I now have a nice combination of
programs to play with.
John.
____________________________________________________
Prof. John Antonakis
Associate Dean Faculty of Business and Economics
University of Lausanne
Internef #618
CH-1015 Lausanne-Dorigny
Switzerland
Tel ++41 (0)21 692-3438
Fax ++41 (0)21 692-3305
Faculty page:
http://www.hec.unil.ch/people/jantonakis&cl=en
Personal page:
http://www.hec.unil.ch/jantonakis
____________________________________________________
On 07.08.2009 19:31, Tirthankar Chakravarty wrote:
Here is a way; maybe someone can suggest something less cumbersome. Note
that since sort order is not important, you can merge by an identifier
based on row number.
An "id" variable is created in each simulation file. Once the
simulations are done, the files are merged. Then a -reshape- and a
-collapse- get you where you want. I am assuming you want the mean
R^2 for each set of simulations. This can be changed.
****************************************
capture program drop sim
version 10
program define sim, rclass
    // one replication: pure-noise y and regressors,
    // backward stepwise selection, return the R^2
    drop _all
    syntax , nreg(integer ) nobs(integer )
    set obs `nobs'
    forv i=1/`nreg' {
        g x`i' = invnormal(uniform())
    }
    gen y = invnormal(uniform())
    stepwise, pr(.2): regress y x*
    return scalar r2d2 = e(r2)
end
foreach nobs of numlist 1000 1500 2000 {
    forv nreg = 1(1)10 {
        simulate r2d2=r(r2d2), reps(10000) ///
            saving(sw_r2_`nobs'_`nreg'.dta, every(1) ///
            replace) seed(123): sim, nreg(`nreg') ///
            nobs(`nobs')
        // reload the saved results, add a row id to merge on, and
        // give the R^2 variable a name unique to this combination
        use sw_r2_`nobs'_`nreg'.dta, clear
        g id=_n
        rename r2d2 r2_`nobs'_`nreg'
        sort id
        save sw_r2_`nobs'_`nreg'.dta, replace
    }
}
/* presenting the results */
// merge the files; the last simulation is the
// file in memory.
foreach nobs of numlist 1000 1500 2000 {
    forv nreg = 1(1)10 {
        if "`c(filename)'" != "sw_r2_`nobs'_`nreg'.dta" {
            // this checks if the last filename has been reached
            merge id using sw_r2_`nobs'_`nreg', ///
                _merge(identifier_`nobs'_`nreg') unique
            sort id
        }
        else {
            di in g "All done merging."
        }
    }
}
drop identifier*
// reshape the data to be cross-classified by no. of
// regressors and no. of observations
reshape long r2_1000_ r2_2000_ r2_1500_, i(id) j(numreg)
// get the means of the R^2 for each simulation
collapse (mean) r2*, by(numreg)
list, noobs
save simulations_merged_collapsed, replace
****************************************
T
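
A possibly less cumbersome route than merging by row number is sketched below: tag each set of results with its sample size and number of regressors, stack the sets with -append-, and let -tabulate, summarize()- print the mean R^2 with the number of regressors down the side and the sample size across the top. This is an illustration and not part of the original posts. It redefines -sim- compactly so that it runs on its own, uses only 100 replications per cell to keep it quick, and sets the seed once so that the cells use different draws. The names nobs, nreg, and stacked are made up for the example.

```stata
capture program drop sim
program define sim, rclass
    version 10
    syntax , nreg(integer) nobs(integer)
    drop _all
    set obs `nobs'
    forvalues i = 1/`nreg' {
        generate x`i' = invnormal(uniform())
    }
    generate y = invnormal(uniform())
    stepwise, pr(.2): regress y x*
    return scalar r2d2 = e(r2)
end

set seed 123
tempfile stacked          // hypothetical name for the stacked results
local first = 1
foreach nobs of numlist 1000 1500 2000 {
    forvalues nreg = 1/10 {
        simulate r2d2=r(r2d2), reps(100) nodots: sim, nreg(`nreg') nobs(`nobs')
        // label this set of replications with its design cell
        generate int nobs = `nobs'
        generate byte nreg = `nreg'
        // stack onto everything simulated so far
        if !`first' {
            append using `stacked'
        }
        quietly save `stacked', replace
        local first = 0
    }
}
// rows: number of regressors; columns: sample size; cells: mean R^2
tabulate nreg nobs, summarize(r2d2) means
```

Because every replication stays in one long dataset, other summaries (for example -collapse (mean) r2d2, by(nobs nreg)-) can be taken from the same file without a -reshape-.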
On Fri, Aug 7, 2009 at 5:44 PM, John Antonakis<[email protected]> wrote:
Thanks Tirthankar!
I see that separate files are stored for each simulation. How could one
combine those results in one file?
Also, how would one generate a table (sample size on the horizontal and
number of predictors on the vertical) with the simulated r-squares?
Best,
J.
On 07.08.2009 13:10, Tirthankar Chakravarty wrote:
You should probably use -simulate-. Here is what it might look like:
***********************************
capture program drop sim
version 10
program define sim, rclass
    drop _all
    syntax , nreg(integer ) nobs(integer )
    set obs `nobs'
    forv i=1/`nreg' {
        g x`i' = invnormal(uniform())
    }
    gen y = invnormal(uniform())
    stepwise, pr(.2): regress y x*
    qui indeplist
    return scalar r2d2 = e(r2)
end
/*
simulate for each of the regressor and
sample size combinations required.
10,000 replications.
*/
foreach nobs of numlist 1000 1500 2000 {
    forv nreg = 1(1)10 {
        simulate r2d2=r(r2d2), reps(10000) ///
            saving(sw_r2_`nobs'_`nreg'.dta, every(1) ///
            replace) seed(123): sim, nreg(`nreg') ///
            nobs(`nobs')
    }
}
use sw_r2_1000_5, clear
kdensity r2d2
***********************************************
On Fri, Aug 7, 2009 at 11:18 AM, John Antonakis<[email protected]> wrote:
That's very helpful; thanks Martin.
To extend the below, how would I simulate the r-square? That is, I want to
run the simulation say 100 times, and then obtain the mean r-square from
each simulation. Thus, I can show, at a specific sample size (n=100) and
number of independent variables (k=5), what the r-square would be just by
chance alone.
As an extension, is there a way to vary the sample size (n from 50 to 1000,
in increments of 50) and the number of independent variables (k=1 to k=100
in increments of 1) in the simulation?
Best,
J.
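
As a point of comparison for the "by chance alone" question, here is a sketch of the benchmark without any selection. When y and the k regressors are independent normal noise and all k regressors are kept, the expected R^2 is k/(n-1), which is 5/99, or about .0505, for n=100 and k=5. The program name r2null and the choice of 1,000 replications are assumptions made for this illustration; stepwise results will differ from this benchmark because selection changes which regressors remain.

```stata
capture program drop r2null
program define r2null, rclass
    version 10
    syntax , nreg(integer) nobs(integer)
    drop _all
    set obs `nobs'
    forvalues i = 1/`nreg' {
        generate x`i' = invnormal(uniform())
    }
    generate y = invnormal(uniform())
    // full regression on pure noise, no stepwise selection
    regress y x*
    return scalar r2 = e(r2)
end

set seed 123456
simulate r2=r(r2), reps(1000) nodots: r2null, nreg(5) nobs(100)
summarize r2
// compare the simulated mean with the theoretical value k/(n-1)
display "simulated mean R^2 = " %6.4f r(mean) "   k/(n-1) = " %6.4f 5/99
```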
On 07.08.2009 12:06, Martin Weiss wrote:
You could also -tokenize- the return from -indeplist- and have your
-program- return the regressors one by one...
*************
capt prog drop sim
version 10.1
program define sim, rclass
    drop _all
    set obs 100
    gen y = invnorm(uniform())
    gen x1 = invnorm(uniform())
    gen x2 = invnorm(uniform())
    gen x3 = invnorm(uniform())
    gen x4 = invnorm(uniform())
    gen x5 = invnorm(uniform())
    stepwise, pr(.2): regress y x1-x5
    qui indeplist
    tokenize "`r(X)'"
    ret loc one="`1'"
    ret loc two="`2'"
    ret loc three="`3'"
    ret loc four="`4'"
    ret loc five="`5'"
end
sim
ret li
*************
HTH
Martin
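
Since -simulate- is built around collecting numeric results, one way to carry the "which regressors were selected" information through a simulation is to return a 0/1 indicator per regressor instead of the names. The sketch below is an assumption-laden illustration, not code from the thread: the program name simsel and the scalars sel1 to sel5 are made up, and the retained regressors are read from the column names of e(b) as a stand-in for -indeplist-.

```stata
capture program drop simsel
program define simsel, rclass
    version 10
    drop _all
    set obs 100
    generate y = invnormal(uniform())
    forvalues i = 1/5 {
        generate x`i' = invnormal(uniform())
    }
    stepwise, pr(.2): regress y x1-x5
    // names of the coefficients left in the final model (plus _cons)
    local kept : colnames e(b)
    forvalues i = 1/5 {
        // position of x`i' in the list of kept names; 0 if it was removed
        local pos : list posof "x`i'" in kept
        return scalar sel`i' = (`pos' > 0)
    }
end

simulate sel1=r(sel1) sel2=r(sel2) sel3=r(sel3) sel4=r(sel4) sel5=r(sel5), ///
    reps(200) nodots seed(123): simsel
// the mean of each indicator is the share of replications in which
// that pure-noise regressor survived the selection
summarize sel*
```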
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of John
Antonakis
Sent: Friday, 7 August 2009 11:47
To: [email protected]
Subject: st: Simulating stepwise regression
Hi:
I would like to simulate the below. Note, I am no fan of stepwise; I just
want to demonstrate its evils.
However, I do not know:
1. what to put in the place of "??"; that is, I want the program to
capture only the variables that were selected in the model as being
significant;
2. how to simulate the r-square;
3. how to extend the simulation (a new program) such that I simulate from
n = 50 to n = 1000 (in increments of 50), crossed with independent
variables ranging from x1 to x100.
Regards,
John.
Here is the program:
set seed 123456
capture program drop sim
version 10.1
program define sim, eclass
    drop _all
    set obs 100
    gen y = invnorm(uniform())
    gen x1 = invnorm(uniform())
    gen x2 = invnorm(uniform())
    gen x3 = invnorm(uniform())
    gen x4 = invnorm(uniform())
    gen x5 = invnorm(uniform())
    stepwise, pr(.2): regress y x1-x5
end
simulate ??? , reps(20) seed (123) : sim,
foreach v in ?? {
    gen t_`v' = /*
    */_b_`v'/_se_`v'
    gen p_`v' =/*
    */ 2*(1-normal(abs(t_`v')))
}
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/