Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | philippe van kerm <philippe.vankerm@ceps.lu> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: Speed of bsample and nested loops |
Date | Fri, 7 Oct 2011 15:07:16 +0000 |
I would suggest -post- instead of -file- for that sort of work. Not sure you would observe significant speed improvements, however.

Philippe

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Richard Herron
> Sent: Friday, October 07, 2011 4:26 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Speed of bsample and nested loops
>
> I don't know the inner workings of -file write-, but would you have
> any gain from replacing three calls with one?
>
> file write boot "`idb`b''" _tab "`gb`b''" _tab
> file write boot "`j'" _tab "`x'" _tab
> file write boot "`mu'" _n
>
> becomes
>
> file write boot "`idb`b''" _tab "`gb`b''" _tab ///
>     "`j'" _tab "`x'" _tab ///
>     "`mu'" _n
>
> It isn't clear to me from the help file if -file open- leaves the
> text connection open, or if it just performs some checks and assigns
> a handle.
>
> On Wed, Oct 5, 2011 at 15:51, Poliquin, Christopher <cpoliquin@hbs.edu> wrote:
> > Hi,
> >
> > I am trying to speed up my code for bootstrapping and suspect there are significant gains to be made because right now it is super slow.
> >
> > I am trying to draw samples of size 1-3 with replacement from a file with about 300,000 rows. It is a panel dataset of companies and their daily stock returns for two years.
> >
> > I have written a little program to loop over groups of companies and draw samples of size 1-3 from 5 different variables with returns data. The mean of the sample is then written to a file.
> >
> > Could someone please look at this code and suggest areas that could be modified to make this run at a reasonable speed? I have omitted the beginning because the real issue is probably the nested loops.
> >
> > program bootstrapping
> >     // Bootstrapping mean abnormal returns
> >     // Pass sample name as first argument for saving output
> >     // Pass replication number as second argument
> >
> >     egen boot_grp = group(id cl)
> >     *[Some omitted stuff that is fast already]
> >
> >     // Open a file to hold the bootstrapped results.
> >     file open boot using `1'_boots.txt, write text replace
> >     file write boot "id" _tab "cl" _tab "sampsize" _tab "ar" _tab "mean" _n
> >     forvalues k=1/`2' {
> >         * This is the number of draws to make for each sample size
> >         set seed `k'
> >         forvalues j=1/3 {
> >             *Draws of size 1-3
> >             capture drop w
> >             quietly gen w = .
> >             // Sample with replacement, fweight in w
> >             bsample `j', strata(id cl permno) weight(w)
> >             foreach b of local boots {
> >                 // Mean abnormal return for the sample
> >                 // within id and cl grouping.
> >                 forvalues x = 1/5 {
> >                     // Within each abnormal return measure...
> >                     quietly summarize ar`x' if boot_grp == `b' [fweight=w]
> >                     loc mu = r(mean) * 100
> >                     // Write bootstrapped means to the output file
> >                     file write boot "`idb`b''" _tab "`gb`b''" _tab
> >                     file write boot "`j'" _tab "`x'" _tab
> >                     file write boot "`mu'" _n
> >                 }
> >             }
> >         }
> >     }
> >     file close boot
> > end
> >
> > Best wishes,
> > Chris
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
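
The -post- suggestion at the top of this message could look roughly like the sketch below. It is an illustration only, not code from the thread: it runs on Stata's shipped auto dataset, and the handle name results, the output file boots_sketch, the variables price and foreign, and the 20 replications are all placeholder assumptions. Note that -post- writes a Stata dataset (.dta) rather than a tab-delimited text file.

```stata
* Hypothetical sketch: collect bootstrap means with -postfile-/-post-
* instead of -file write-. Dataset, names and loop bounds are placeholders.
sysuse auto, clear

tempname results
* Declare the variables of the results dataset (mean stored as double)
postfile `results' rep sampsize double mean using boots_sketch, replace

forvalues k = 1/20 {
    set seed `k'
    forvalues j = 1/3 {
        capture drop w
        quietly gen w = .
        // Sample with replacement within strata; frequency weights go in w
        bsample `j', strata(foreign) weight(w)
        quietly summarize price [fweight = w]
        // One -post- call writes a whole row of results
        post `results' (`k') (`j') (r(mean))
    }
}

postclose `results'

* Inspect the collected results
use boots_sketch, clear
list in 1/5
```

Each -post- call appends one complete observation, so it takes the place of the three -file write- calls per result. Whether it is noticeably faster would need to be timed on the real data.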