Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

RE: RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets

From	"Trelle Sven" <[email protected]>
To	<[email protected]>
Subject	RE: RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets
Date	Tue, 16 Nov 2010 16:43:27 +0100

Suggestion by Kit Baum:
> Not true. Statsby can execute any single command, and you can 
> easily create a 'wrapper' command that does both the regress 
> and the predict. See my interchange with Degas Wright re 
> 'rolling' on Statalist earlier this month, and the example 
> myregress.ado in the ITSP sample programs. That command does 
> a lincom after regress, but that is very similar to a predict 
> after regress.

Thanks for this. I finally managed to get a program running. As I
understand it, the difference between lincom and predict is that
predict writes its results into a new variable, so my program creates a
temporary variable for this. It took me a while to figure out how to
access the individual predictions in that temporary variable. An
example is given below, although there are probably much better
approaches.
I haven't yet tested how much faster statsby is compared with using an
if or in qualifier.

Thread is closed from my side.

Thank you all very much for your help 
Sven



Original problem:
Perform 3 univariate regressions plus prediction for a large number of
simulations (50,000). Looping over the simulations with an if qualifier
is not efficient (it takes hours). Solution: a wrapper program that
combines regress and predict, to be used with statsby.
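
For comparison, the slow approach referred to above looks roughly like
the sketch below. Each if qualifier makes Stata evaluate the condition
over every observation in memory, so the total work grows with (number
of runs) x (number of observations). The variable name xb1 and the
timer number are hypothetical choices for illustration, and only the
first of the three regressions is shown.

```stata
* Sketch of the slow approach: one regress/predict pass per run using if.
* Assumes the example variables run, logrr and logcox2ratio are in memory.
* xb1 is a hypothetical name for the collected linear predictions.
timer clear 1
timer on 1
quietly generate double xb1 = .
quietly levelsof run, local(runs)
foreach r of local runs {
	quietly regress logrr logcox2ratio if run == `r'
	tempvar p
	quietly predict double `p' if run == `r', xb
	quietly replace xb1 = `p' if run == `r'
	drop `p'
}
timer off 1
timer list 1
```

Wrapping the statsby call in the same timer on/timer off pair (with a
different timer number) would give the speed comparison that has not
been tested yet.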

* Example starts
* example dataset, only 3 simulations (variable run) included instead of 50000
clear
input run trt logrr logcox2ratio propcox1_80prozcox2 lnoddscox1_80prozcox2
1 2 2.408 .5 .89144216 2.1055574
1 3 .6371 .42857143 .88114105 2.0032802
1 4 .6164 -.64285714 .66957211 .70625041
1 5 -.0616 -.69047619 .38034865 -.48806864
1 6 1.296 -2.2 .10063391 -2.1902008
1 7 1.248 -2.7142857 .02931854 -3.4997782
1 8 1.286 -3.0238095 .01901743 -3.9431986
1 9 .09297 -3.6904762 .001 -6.9067548
2 2 2.608 .5 .89144216 2.1055574
2 3 1.295 .42857143 .88114105 2.0032802
2 4 .5836 -.64285714 .66957211 .70625041
2 5 -.3568 -.69047619 .38034865 -.48806864
2 6 1.683 -2.2 .10063391 -2.1902008
2 7 .1725 -2.7142857 .02931854 -3.4997782
2 8 1.045 -3.0238095 .01901743 -3.9431986
2 9 .2751 -3.6904762 .001 -6.9067548
3 2 2.065 .5 .89144216 2.1055574
3 3 1.07 .42857143 .88114105 2.0032802
3 4 .2483 -.64285714 .66957211 .70625041
3 5 -.2541 -.69047619 .38034865 -.48806864
3 6 1.21 -2.2 .10063391 -2.1902008
3 7 -.6535 -2.7142857 .02931854 -3.4997782
3 8 1.113 -3.0238095 .01901743 -3.9431986
3 9 .2599 -3.6904762 .001 -6.9067548
end


	capture program drop regpred
	program define regpred, rclass
		version 10.1
		syntax [if]
		marksample touse
		quietly count if `touse'
		if `r(N)' == 0 {
			error 2000
		}
		tempvar prediction1 prediction2 prediction3 obs
		local res coeff1 pred1_2 pred1_3 pred1_4 pred1_5 pred1_6 ///
			pred1_7 pred1_8 pred1_9 coeff2 pred2_2 pred2_3 pred2_4 ///
			pred2_5 pred2_6 pred2_7 pred2_8 pred2_9 coeff3 pred3_2 ///
			pred3_3 pred3_4 pred3_5 pred3_6 pred3_7 pred3_8 pred3_9
		tempname `res'
		
		sort run trt
		* next 5 lines needed to access results of prediction
		gen `obs' = _n
		qui sum `obs' if `touse'
		local start = r(min)
		local stop = r(max)
		local runlength = `stop' - `start' + 1
		
		if `runlength'!=8 {
			* program only works with exactly 8 observations per run;
			* local res would need to be generic otherwise
			dis as error "Error: run does not contain exactly 8 observations"
		}
		else {
			qui regress  logrr logcox2ratio if `touse'
			scalar `coeff1' = _b[logcox2ratio]
			qui predict `prediction1' if `touse', xb
			local trt = 2
			forval s=`start'/`stop' {
				scalar `pred1_`trt'' = `prediction1'[`s']
				local trt = `trt' + 1
			}
			
			qui regress  logrr propcox1_80prozcox2 if `touse'
			scalar `coeff2' = _b[propcox1_80prozcox2]
			qui predict `prediction2' if `touse', xb
			local trt = 2
			forval s=`start'/`stop' { 
				scalar `pred2_`trt'' = `prediction2'[`s']
				local trt = `trt' + 1
			}
			
			qui regress  logrr lnoddscox1_80prozcox2 if `touse'
			scalar `coeff3' = _b[lnoddscox1_80prozcox2]
			qui predict `prediction3' if `touse', xb
			local trt = 2
			forval s=`start'/`stop' { 
				scalar `pred3_`trt'' = `prediction3'[`s']
				local trt = `trt' + 1
			}
			
			foreach r of local res {
				return scalar `r' = ``r''
			}
		}
	end
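
	* Optional check (a sketch, not part of the original post): call the
	* wrapper directly on a single run and list the returned scalars
	* before handing it to statsby. Assumes the example data are in memory.
	regpred if run == 1
	return list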
	

statsby coeff1=r(coeff1) pred1_2=r(pred1_2) pred1_3=r(pred1_3) ///
	pred1_4=r(pred1_4) pred1_5=r(pred1_5) pred1_6=r(pred1_6) ///
	pred1_7=r(pred1_7) pred1_8=r(pred1_8) pred1_9=r(pred1_9) ///
	coeff2=r(coeff2) pred2_2=r(pred2_2) pred2_3=r(pred2_3) ///
	pred2_4=r(pred2_4) pred2_5=r(pred2_5) pred2_6=r(pred2_6) ///
	pred2_7=r(pred2_7) pred2_8=r(pred2_8) pred2_9=r(pred2_9) ///
	coeff3=r(coeff3) pred3_2=r(pred3_2) pred3_3=r(pred3_3) ///
	pred3_4=r(pred3_4) pred3_5=r(pred3_5) pred3_6=r(pred3_6) ///
	pred3_7=r(pred3_7) pred3_8=r(pred3_8) pred3_9=r(pred3_9), by(run) ///
	saving(C:\@_temp\statsby, replace): regpred
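
To check what statsby wrote, the saved file can be loaded and listed. A
minimal sketch, assuming the statsby call above ran without error and
that the folder C:\@_temp exists on your machine; the variable names
are those requested in the statsby call.

```stata
* Sketch only: inspect the results file without losing the data in memory.
preserve
use "C:\@_temp\statsby", clear
describe, short
* one row per run; show the first slope and two of its predictions
list run coeff1 pred1_2 pred1_9, noobs
restore
```

With the 3-run example dataset this file should hold one observation
per run and 28 variables (run plus the 27 requested statistics).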


References:
- re: RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets
  - From: Christopher F Baum <[email protected]>