Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets
From: "Trelle Sven" <[email protected]>
To: <[email protected]>
Subject: RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets
Date: Mon, 15 Nov 2010 17:28:27 +0100
> From: Sergiy Radyakin
> Sent: Monday, November 15, 2010 4:54 PM
> 50000 regressions on 8-observations dataset of two variables
> should take about 30 seconds (see below).
See my reply at the end of this message.
> So don't generate the large dataset, but rather run the
> regressions right away when you generate your simulated data.
> You don't need to save the 50000x8 observations you
> generated, as [presumably] you are also doing it with Stata,
> so next time you simulate them with your do-file - they will
> be the same (don't forget to set the rnd seed)
No, the simulations were not done in Stata.
> On the other hand, since you need only one coefficient from
> this trivial regression, you may ask yourself if the
> -regress- artillery is really necessary here, or a trivial
> formula, such as the one here:
> http://en.wikipedia.org/wiki/Regression_analysis
> would suffice (and be faster).
Thanks, I will give it a try, although I am not sure the regression is
actually the problem (see my response at the end of this message).
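For reference, the slope of a one-predictor least-squares regression is
sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2), so it could be
computed for every simulation at once without calling -regress-. A minimal
sketch, assuming the data are in long format with a numeric simulation
identifier; the variable names sim, y and x are hypothetical and not taken
from the original post:

```stata
* Hypothetical names: sim = simulation id, y = outcome, x = predictor.
* Group means of x and y within each simulation
bysort sim: egen double mx = mean(x)
bysort sim: egen double my = mean(y)

* Cross-products and squared deviations, summed within each simulation
generate double dxy = (x - mx) * (y - my)
generate double dxx = (x - mx)^2
bysort sim: egen double sxy = total(dxy)
bysort sim: egen double sxx = total(dxx)

* Closed-form OLS slope of y on x, one value per simulation
generate double slope = sxy / sxx

* Reduce to one row per simulation
bysort sim: keep if _n == 1
keep sim slope
```

This gives only the coefficient; standard errors or other -regress- results
would need further formulas.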
> In any case, don't forget to specify -quietly-. I am almost
> sure you don't have any intention to review the output of the
> 50,000 regressions, and that speeds up the program a lot.
Yes, I run it quietly in my do-file but left that out of the example code.
> . do "R:\TEMP\STD04000000.tmp"
> . set rmsg on
> r; t=0.00 10:42:16
> . sysuse auto, clear
> (1978 Automobile Data)
> r; t=0.00 10:42:16
> . keep in 1/8
> (66 observations deleted)
> r; t=0.00 10:42:16
> .
> . forvalues i=1/50000 {
> 2. qui regress price weight
> 3. }
> r; t=26.53 10:42:42
> .
> end of do-file
> r; t=26.53 10:42:42
My dataset is large (400,000 observations, not 8), and each regression has
to run on a subset of it. Selecting that subset is probably the bottleneck,
not the regression itself or the loop.
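If selecting the subset is indeed the slow part, one possible approach is to
sort once and address each block of rows with -in- rather than -if-, since
-in- restricts the observation range directly instead of testing a condition
on all 400,000 rows in every iteration. A rough sketch, assuming exactly 8
observations per simulation and a numeric identifier; the names sim, y, x and
the output file slopes are hypothetical:

```stata
* Assumptions (not from the original post): sim identifies the simulation,
* y and x are the regression variables, every simulation has exactly 8 rows.
sort sim
local nper = 8
local nsim = _N / `nper'

* Collect one slope per simulation in a new dataset, slopes.dta
tempname results
postfile `results' long sim double slope using slopes, replace

forvalues i = 1/`nsim' {
    * First and last row of the i-th block of 8 observations
    local first = (`i' - 1) * `nper' + 1
    local last  = `i' * `nper'
    quietly regress y x in `first'/`last'
    post `results' (sim[`first']) (_b[x])
}
postclose `results'
```

If the number of observations per simulation varies, the fixed block size
no longer applies and the first and last row of each block would have to be
derived from the data instead.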
BW/Sven