Re: st: RE: do file: t-score, dfuller, to sw regress

From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: RE: do file: t-score, dfuller, to sw regress
Date   Thu, 9 Dec 2010 22:24:53 -0500

I forgot to add Stata's own page: . Screening the variables as you did just makes matters worse.


On Dec 9, 2010, at 10:12 PM, Steven Samuels wrote:

Here are just a few references, containing others, culled from a quick Google search for "stepwise selection problems bootstrap". If I recall, Gail Gong studied a strategy very much like yours, although for logistic regression. Frank Harrell's book "Regression Modeling Strategies" is a good resource for alternative strategies.


B. Efron and G. Gong (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Stat 37, 36-48.

G. Gong (1986). Cross-validation, the jackknife, and the bootstrap: Excess error estimation in forward logistic regression. JASA 81, 108-113.

P. C. Austin and J. V. Tu (2004). Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. Journal of Clinical Epidemiology 57, 1138-1146.

S. Derksen and H. J. Keselman (1992). Backward, forward and stepwise automated subset selection algorithms: Frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology 45, 265-282.

F. E. Harrell Jr., K. L. Lee and D. B. Mark (1996). Tutorial in Biostatistics. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 15, 361-387.
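
The instability these papers describe can be checked directly by rerunning the selection on bootstrap resamples and counting how often each candidate is picked. The following is only a rough sketch under stated assumptions: it uses Stata's shipped auto data rather than the poster's series, the candidate list, seed, and number of replications are arbitrary choices for illustration, and simple resampling of observations as done here would not be appropriate for time-series data.

```stata
* Sketch: selection frequency of each candidate across bootstrap resamples.
* Data, candidate list, seed, and reps are illustrative assumptions.
sysuse auto, clear
local cands mpg weight length turn displacement gear_ratio headroom trunk
set seed 12345
local reps 200

* one counter per candidate variable
foreach v of local cands {
    local n_`v' = 0
}

forvalues b = 1/`reps' {
    preserve
    bsample                       // resample observations with replacement
    quietly stepwise, pe(0.05): regress price `cands'
    local chosen : colnames e(b)  // names of the selected regressors (plus _cons)
    foreach v of local cands {
        if `: list v in chosen' local ++n_`v'
    }
    restore
}

* proportion of resamples in which each candidate was selected
foreach v of local cands {
    display %-14s "`v'" %6.2f `n_`v''/`reps'
}
```

Candidates whose selection proportion is far from both 0 and 1 are the ones whose presence in any single stepwise fit depends heavily on the particular sample.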

On Dec 9, 2010, at 3:13 PM, steven quattry wrote:

Thank you Nick for your comments, and apologies to all for being
unclear.  I fully understand if this leads many to ignore my original
post.  However if I may re-attempt to explain, essentially I have a
do-file created with the help of Statalist contributors that performs
bivariate regressions, sorts the independent variables by t-score
and removes those below a certain threshold.  It then runs a dfuller
test and further removes variables that do not pass the critical
level, and finally there is code that essentially removes any
variables that have blanks.  I would like to be able to learn of a way
to then take this output and sort the resulting variables by t-score,
then keep only the 72 variables with the highest t-score, and run a sw
regress with those variables.  My current code is below.  Again, I
sincerely apologize for being unclear and would appreciate any
feedback but understand if I do not receive any.

Also Nick, I assume you do not have the time to go into the
spuriousness of the above process, but if you were able to direct me
to a certain chapter in a well known stats text, or even an online
resource I would be quite thankful, however I fully understand it is
not your role.

Thank you for your consideration,

I am using Stata/SE 11.1 for Windows

* 2.1 T-test and Dickey-Fuller Filter

  drop if n<61

  tsset n
  tempname memhold
  tempname memhold2
  postfile `memhold' str20 var double t using t_score, replace
  postfile `memhold2' str20 var2 double df_pvalue using df_pvalue, replace

  * Regress dhealth on each candidate; keep those with |t| >= 1.7
  foreach var of varlist swap1m-allocglobal uslib1m-infdify dswap1m-dallocglobal6 {
      qui reg dhealth `var'
      matrix e = e(b)
      matrix v = e(V)
      local t = abs(e[1,1]/sqrt(v[1,1]))
      if `t' < 1.7 {
          drop `var'
      }
      else {
          local mylist "`mylist' `var'"
          post `memhold' ("`var'") (`t')
      }
  }
  postclose `memhold'

  * Drop variables whose Dickey-Fuller p-value exceeds .01
  foreach l of local mylist {
      qui dfuller `l', lag(1)
      if r(p) > .01 {
          drop `l'
      }
      else {
          local mylist2 "`mylist2' `l'"
          post `memhold2' ("`l'") (r(p))
      }
  }
  postclose `memhold2'
  keep `mylist2'
log on
  use t_score,clear
  gsort -t
  use df_pvalue, clear
log off

* 2.2 Missing data Filter
  drop if n<61

  * Drop variables with fewer than 72 nonmissing observations
  foreach x of varlist `mylist2' {
      qui sum `x'
      if r(N)<72 {
          di in red "`x'"
          drop `x'
      }
      else {
          local myvar "`myvar' `x'"
      }
  }

  sum date
  keep if date==r(max)

  * Drop variables that are missing on the latest date
  foreach x of varlist `myvar' {
      if `x'==. {
          drop `x'
      }
      else {
          local myvar2 "`myvar2' `x'"
      }
  }
log on
d `myvar2'
log off

* 2.3 Stepwise Regressions

  drop if n<61

*Simultaneous Model
  local x "Here is where I paste in variables after sorting by t-score and keeping only 72 highest"

log on
  sw reg dhealth `x', pe(0.05)
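
The manual paste step could in principle be automated with macro list functions. What follows is a minimal sketch, not a tested replacement: it assumes that t_score.dta (with variables var and t) was written by the postfile step, that the local myvar2 still holds the names that survived every filter (locals persist only within a single do-file run), and that the estimation data containing dhealth are in memory. The local name top72 and the counter k are introduced here for illustration. It automates the mechanics only and does nothing about the statistical objections raised in the replies.

```stata
* Sketch: pick the 72 surviving variables with the largest |t|, then run stepwise.
* Assumes t_score.dta (var, t) and local myvar2 exist as described above.
preserve
use t_score, clear
gsort -t                              // largest |t| first
local top72
local k = 0
forvalues i = 1/`=_N' {
    local v = var[`i']                // variable name stored in row i
    * keep it only if it survived the later filters and fewer than 72 are chosen
    if `k' < 72 & `: list v in myvar2' {
        local top72 `top72' `v'
        local ++k
    }
}
restore

stepwise, pe(0.05): regress dhealth `top72'
```

If fewer than 72 variables survive the filters, top72 simply contains all of them in descending order of |t|.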

*   For searches and help try:

