Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets
From
"Trelle Sven" <[email protected]>
To
<[email protected]>
Subject
RE: RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets
Date
Tue, 16 Nov 2010 16:43:27 +0100
Kit Baum suggested:
> Not true. Statsby can execute any single command, and you can
> easily create a 'wrapper' command that does both the regress
> and the predict. See my interchange with Degas Wright re
> 'rolling' on Statalist earlier this month, and the example
> myregress.ado in the ITSP sample programs. That command does
> a lincom after regress, but that is very similar to a predict
> after regress.
Thanks for this. I finally managed to get a program running. As I
understand it, the difference between lincom and predict is that
predict writes its results to a new variable, so in my program I create
a temporary variable for this. It took me a while, however, to figure
out how to access the individual predictions in that temporary
variable. An example is given below, although there are probably much
better ways to do it.
I haven't yet tested how much faster statsby is compared with using an
if or in qualifier.
Thread is closed from my side.
Thank you all very much for your help
Sven
Original problem:
Perform 3 univariate regressions plus prediction for each of a large
number of simulations (50,000). Using an if-statement is not efficient
(takes hours). Solution: a wrapper program that combines regress and
predict, to be used with statsby.
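For contrast, the if-statement approach described above looks roughly like the loop below. This is only a sketch on simulated data: the variables y, x, pred and tmp and the three runs of eight observations are made up for illustration and are not the real simulation data. Each pass through the loop has to evaluate the if-condition over every observation in memory, so the work grows with (number of runs) x (total number of observations).

```stata
* Sketch only: simulated data, hypothetical variables y, x, pred, tmp
clear
set seed 12345
set obs 24
generate run = ceil(_n / 8)              // 3 runs of 8 observations each
generate x = rnormal()
generate y = 1 + 0.5 * x + rnormal()

generate double pred = .
forvalues r = 1/3 {
    * every regress/predict/replace scans all observations to apply the if
    quietly regress y x if run == `r'
    quietly predict double tmp if run == `r', xb
    quietly replace pred = tmp if run == `r'
    drop tmp
}
list run y x pred in 1/8
```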
* Example starts
* example dataset, only 3 simulations (variable run) included instead of 50000
clear
input run trt logrr logcox2ratio propcox1_80prozcox2 lnoddscox1_80prozcox2
1 2 2.408 .5 .89144216 2.1055574
1 3 .6371 .42857143 .88114105 2.0032802
1 4 .6164 -.64285714 .66957211 .70625041
1 5 -.0616 -.69047619 .38034865 -.48806864
1 6 1.296 -2.2 .10063391 -2.1902008
1 7 1.248 -2.7142857 .02931854 -3.4997782
1 8 1.286 -3.0238095 .01901743 -3.9431986
1 9 .09297 -3.6904762 .001 -6.9067548
2 2 2.608 .5 .89144216 2.1055574
2 3 1.295 .42857143 .88114105 2.0032802
2 4 .5836 -.64285714 .66957211 .70625041
2 5 -.3568 -.69047619 .38034865 -.48806864
2 6 1.683 -2.2 .10063391 -2.1902008
2 7 .1725 -2.7142857 .02931854 -3.4997782
2 8 1.045 -3.0238095 .01901743 -3.9431986
2 9 .2751 -3.6904762 .001 -6.9067548
3 2 2.065 .5 .89144216 2.1055574
3 3 1.07 .42857143 .88114105 2.0032802
3 4 .2483 -.64285714 .66957211 .70625041
3 5 -.2541 -.69047619 .38034865 -.48806864
3 6 1.21 -2.2 .10063391 -2.1902008
3 7 -.6535 -2.7142857 .02931854 -3.4997782
3 8 1.113 -3.0238095 .01901743 -3.9431986
3 9 .2599 -3.6904762 .001 -6.9067548
end
capture program drop regpred
program define regpred, rclass
    version 10.1
    syntax [if]
    marksample touse
    quietly count if `touse'
    if `r(N)' == 0 {
        error 2000
    }
    tempvar prediction1 prediction2 prediction3 obs
    * results to return: for each of the 3 models, the slope and the
    * 8 predictions (one per trt = 2, ..., 9)
    local res coeff1 pred1_2 pred1_3 pred1_4 pred1_5 pred1_6 pred1_7 pred1_8 pred1_9 ///
        coeff2 pred2_2 pred2_3 pred2_4 pred2_5 pred2_6 pred2_7 pred2_8 pred2_9 ///
        coeff3 pred3_2 pred3_3 pred3_4 pred3_5 pred3_6 pred3_7 pred3_8 pred3_9
    tempname `res'
    sort run trt
    * next 5 lines needed to access results of prediction
    gen `obs' = _n
    qui sum `obs' if `touse'
    local start = r(min)
    local stop = r(max)
    local runlength = `stop' - `start' + 1
    if `runlength'!=8 {
        * program only works for runs of 8 observations;
        * local res would otherwise need to be generic
        dis "Error, run!=8"
    }
    else {
        * model 1: logrr on logcox2ratio
        qui regress logrr logcox2ratio if `touse'
        scalar `coeff1' = _b[logcox2ratio]
        qui predict `prediction1' if `touse', xb
        local trt = 2
        forval s=`start'/`stop' {
            * copy the prediction in observation s into its own scalar
            scalar `pred1_`trt'' = `prediction1'[`s']
            local trt = `trt' + 1
        }
        * model 2: logrr on propcox1_80prozcox2
        qui regress logrr propcox1_80prozcox2 if `touse'
        scalar `coeff2' = _b[propcox1_80prozcox2]
        qui predict `prediction2' if `touse', xb
        local trt = 2
        forval s=`start'/`stop' {
            scalar `pred2_`trt'' = `prediction2'[`s']
            local trt = `trt' + 1
        }
        * model 3: logrr on lnoddscox1_80prozcox2
        qui regress logrr lnoddscox1_80prozcox2 if `touse'
        scalar `coeff3' = _b[lnoddscox1_80prozcox2]
        qui predict `prediction3' if `touse', xb
        local trt = 2
        forval s=`start'/`stop' {
            scalar `pred3_`trt'' = `prediction3'[`s']
            local trt = `trt' + 1
        }
        * hand all 27 scalars back to the caller in r()
        foreach r of local res {
            return scalar `r' = ``r''
        }
    }
end
statsby coeff1=r(coeff1) pred1_2=r(pred1_2) pred1_3=r(pred1_3) ///
    pred1_4=r(pred1_4) pred1_5=r(pred1_5) pred1_6=r(pred1_6) ///
    pred1_7=r(pred1_7) pred1_8=r(pred1_8) pred1_9=r(pred1_9) ///
    coeff2=r(coeff2) pred2_2=r(pred2_2) pred2_3=r(pred2_3) ///
    pred2_4=r(pred2_4) pred2_5=r(pred2_5) pred2_6=r(pred2_6) ///
    pred2_7=r(pred2_7) pred2_8=r(pred2_8) pred2_9=r(pred2_9) ///
    coeff3=r(coeff3) pred3_2=r(pred3_2) pred3_3=r(pred3_3) ///
    pred3_4=r(pred3_4) pred3_5=r(pred3_5) pred3_6=r(pred3_6) ///
    pred3_7=r(pred3_7) pred3_8=r(pred3_8) pred3_9=r(pred3_9), by(run) ///
    saving(C:\@_temp\statsby, replace): regpred
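The program above is hard-wired to runs of exactly 8 observations and to a fixed list of result names. One possible way around that, given here only as an untested sketch and not as the approach used above, is to return the scalars directly inside a loop over the levels of the identifying variable. The program name regpred2, its id() option, and the simulated variables y, x1 and x2 are all hypothetical. The sketch assumes the id variable takes non-negative integer values (they become part of the result names) and is unique within each by-group. Run it from a do-file, since /// continuation does not work interactively.

```stata
capture program drop regpred2
program define regpred2, rclass
    version 10.1
    * first variable = outcome, remaining variables = one regressor per model
    syntax varlist(min=2 numeric) [if], ID(varname numeric)
    marksample touse
    markout `touse' `id'
    gettoken depvar xvars : varlist
    quietly levelsof `id' if `touse', local(levels)
    tempvar xb
    local i = 0
    foreach x of local xvars {
        local ++i
        quietly regress `depvar' `x' if `touse'
        return scalar coeff`i' = _b[`x']
        quietly predict double `xb' if `touse', xb
        foreach t of local levels {
            * one observation per id level, so the mean is that prediction
            quietly summarize `xb' if `touse' & `id' == `t', meanonly
            return scalar pred`i'_`t' = r(mean)
        }
        drop `xb'
    }
end

* Simulated data for illustration only (hypothetical y, x1, x2)
clear
set seed 2010
set obs 24
generate run = ceil(_n / 8)
generate trt = mod(_n - 1, 8) + 2        // 2, ..., 9 within each run
generate x1 = rnormal()
generate x2 = rnormal()
generate y = 1 + 0.5 * x1 + rnormal()

statsby coeff1=r(coeff1) coeff2=r(coeff2) ///
    pred1_2=r(pred1_2) pred2_9=r(pred2_9), by(run) clear: ///
    regpred2 y x1 x2, id(trt)
list
```

With the example dataset from this post, the corresponding call would be regpred2 logrr logcox2ratio propcox1_80prozcox2 lnoddscox1_80prozcox2, id(trt). The statsby expression list still has to name every result that should be kept.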