Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: do file: t-score, dfuller, to sw regress
From
Steven Samuels <[email protected]>
To
[email protected]
Subject
Re: st: RE: do file: t-score, dfuller, to sw regress
Date
Thu, 9 Dec 2010 22:24:53 -0500
I forgot to add Stata's own page:
http://www.stata.com/support/faqs/stat/stepwise.html
Screening the variables as you did just makes matters worse.
Steve
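A small simulation can make the point about screening concrete. The sketch below is an illustration only, not taken from the thread: the outcome and all 50 candidate predictors are independent noise, the sample size of 100 and the seed are arbitrary choices, and the names y, x1-x50 and keep are hypothetical. Any variable that survives the |t| >= 1.7 screen and then enters the stepwise model is therefore a false positive, yet it will be reported with a nominally significant p-value.

```stata
* Simulation sketch: y and x1-x50 are independent standard normal noise,
* so no predictor has a true relationship with the outcome.
clear
set seed 12345
set obs 100
gen y = rnormal()
forvalues j = 1/50 {
    gen x`j' = rnormal()
}

* Step 1: bivariate screen on |t|, using the same 1.7 threshold as the do-file
local keep
forvalues j = 1/50 {
    quietly regress y x`j'
    if abs(_b[x`j']/_se[x`j']) >= 1.7 {
        local keep `keep' x`j'
    }
}
display "survived the screen: `: word count `keep''"

* Step 2: forward stepwise selection among the survivors only
if "`keep'" != "" {
    stepwise, pe(0.05): regress y `keep'
}
```

With 98 residual degrees of freedom, roughly 9% of pure-noise candidates are expected to pass a two-sided |t| >= 1.7 screen, so the stepwise step usually starts from a pool that has already been selected for chance associations.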
On Dec 9, 2010, at 10:12 PM, Steven Samuels wrote:
Here are just a few references, containing others, culled from a quick
Google search for "stepwise selection problems bootstrap". If I
recall, Gail Gong studied a strategy very much like yours, although
for logistic regression. Frank Harrell's book "Regression Modeling
Strategies" is a good resource for alternative strategies.
Steve
B Efron and G Gong (1983) A leisurely look at the bootstrap, the
jackknife, and cross-validation. Am Stat 37, 36-48.
Gail Gong (1986) Cross-validation, the jackknife, and the bootstrap:
Excess error estimation in forward logistic regression. JASA 81, 108-113.
Peter C. Austin and Jack V. Tu (2004) Automated variable selection methods
for logistic regression produced unstable models for predicting acute
myocardial infarction mortality. Journal of Clinical Epidemiology 57,
1138-1146. http://uncwddas.googlecode.com/files/article2.pdf
S. Derksen and H. J. Keselman (1992) Backward, forward and stepwise
automated subset selection algorithms: Frequency of obtaining
authentic and noise variables. British Journal of Mathematical and
Statistical Psychology 45, 265-282.
Frank E. Harrell Jr., Kerry L. Lee and Daniel B. Mark (1996) Tutorial in
biostatistics. Multivariable prognostic models: Issues in developing
models, evaluating assumptions and adequacy, and measuring and
reducing errors. Statistics in Medicine 15, 361-387.
http://www.unt.edu/rss/class/Jon/MiscDocs/Harrell_1996.pdf
On Dec 9, 2010, at 3:13 PM, steven quattry wrote:
Thank you Nick for your comments, and apologies to all for being
unclear. I fully understand if this leads many to ignore my original
post. If I may try again to explain: I have a do-file, created with
the help of Statalist contributors, that performs bivariate
regressions, sorts the independent variables by t-score, and removes
those below a certain threshold. It then runs a dfuller test and
removes variables that do not pass the critical level, and finally
removes any variables that have blanks (missing values). I would like
to learn of a way to take this output, sort the remaining variables
by t-score, keep only the 72 variables with the highest t-scores, and
run sw regress with those variables. My current code is below. Again,
I sincerely apologize for being unclear. I would appreciate any
feedback but understand if I do not receive any.
Also, Nick, I assume you do not have the time to go into the
spuriousness of the above process, but if you could direct me to a
chapter in a well-known stats text, or even an online resource, I
would be quite thankful. I fully understand that it is not your role.
Thank you for your consideration,
-Steven
I am using Stata/SE 11.1 for Windows
* 2.1 T-test and Dickey-Fuller Filter
**************************************
drop if n<61
tsset n
tempname memhold
tempname memhold2
postfile `memhold' str20 var double t using t_score, replace
postfile `memhold2' str20 var2 double df_pvalue using df_pvalue, replace
foreach var of varlist swap1m-allocglobal uslib1m-infdify ///
        dswap1m-dallocglobal6 {
    qui reg dhealth `var'
    matrix e = e(b)
    matrix v = e(V)
    local t = abs(e[1,1]/sqrt(v[1,1]))
    if `t' < 1.7 {
        drop `var'
    }
    else {
        local mylist "`mylist' `var'"
        post `memhold' ("`var'") (`t')
    }
}
postclose `memhold'
foreach l of local mylist {
    qui dfuller `l', lag(1)
    if r(p) > .01 {
        drop `l'
    }
    else {
        local mylist2 "`mylist2' `l'"
        post `memhold2' ("`l'") (r(p))
    }
}
postclose `memhold2'
keep `mylist2'
log on
use t_score,clear
gsort -t
l
use df_pvalue, clear
l
log off
restore
* 2.2 Missing data Filter
**************************
preserve
drop if n<61
foreach x of varlist `mylist2' {
    qui sum `x'
    if r(N)<72 {
        di in red "`x'"
        drop `x'
    }
    else {
        local myvar "`myvar' `x'"
    }
}
sum date
keep if date==r(max)
foreach x of varlist `myvar' {
    if `x'==. {
        drop `x'
    }
    else {
        local myvar2 "`myvar2' `x'"
    }
}
log on
d `myvar2'
log off
restore
* 2.3 Stepwise Regressions
***************************
preserve
drop if n<61
*Simultaneous Model
local x "Here is where I paste in variables after sorting by t-score and keeping only 72 highest"
log on
sw reg dhealth `x', pe(0.05)
vif
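The manual paste step in section 2.3 could in principle be automated along the following lines. This is a minimal sketch rather than a tested solution. It assumes that t_score.dta (with the variables var and t posted in section 2.1) is on disk, that the local macro myvar2 from section 2.2 is still defined in the same do-file run, and that the block is placed before the preserve that opens section 2.3, since preserve cannot be nested. The macro name top is hypothetical. It only reproduces the requested mechanics and does not address the objections to the strategy raised in the replies above.

```stata
* Build a list of up to 72 variables with the largest |t| that also
* survived the dfuller and missing-data filters (i.e. appear in myvar2).
preserve
use t_score, clear
* largest |t| first
gsort -t
local top
local i = 1
while `i' <= _N & `: word count `top'' < 72 {
    * name stored in row i of t_score.dta
    local v = var[`i']
    * keep the name only if it passed every later filter
    if `: list v in myvar2' {
        local top `top' `v'
    }
    local ++i
}
restore
display "`: word count `top'' variables selected"
```

With this in place, `top' would take the place of `x' in the sw reg command of section 2.3.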
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*