Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: RE: Efficient way to predict values from regressions on subsets of the data?
From: "David Radwin" <[email protected]>
To: <[email protected]>
Subject: st: RE: Efficient way to predict values from regressions on subsets of the data?
Date: Fri, 15 Apr 2011 16:06:20 -0700 (PDT)
Apparently -in- is faster than -if-, but perhaps only twice as fast.
See Blasnik's Law in
http://www.stata.com/statalist/archive/2007-09/msg00361.html
and http://www.stata.com/statalist/archive/2007-08/msg00668.html
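So switching to -in- alone probably will not solve your problem: a
twofold gain is small next to the roughly 100-fold gap you report
relative to SAS.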
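
If you still want to try it, here is a minimal sketch of looping over firms with -in- rather than -if-. It assumes the data can be sorted by the panel variable, and it uses hypothetical variable names (firm, y, x) that are not from your post. It shows only one regression per firm and has not been timed on a panel of your size:

```stata
* Hypothetical names: firm (panel id), y (outcome), x (regressor).
* Sort so that each firm occupies one contiguous block of observations.
sort firm
by firm: generate long nobs = _N
generate double yhat = .

local start = 1
while `start' <= _N {
    * Last observation of the block that begins at `start'
    local end = `start' + nobs[`start'] - 1

    * Fit and predict on this block only, using -in- ranges
    quietly qreg y x in `start'/`end'
    tempvar fit
    quietly predict double `fit' in `start'/`end'
    quietly replace yhat = `fit' in `start'/`end'
    drop `fit'

    local start = `end' + 1
}
```

The loop stops with an error if -qreg- fails for any firm (for example, too few observations); prefixing the estimation with -capture- and checking _rc would be one way to skip such firms.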
David
--
David Radwin
Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794
www.mprinc.com
> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of [email protected]
> Sent: Friday, April 15, 2011 2:35 PM
> To: [email protected]
> Subject: st: Efficient way to predict values from regressions on subsets
> of the data?
>
> Hello all,
>
> I have a project that involves assembling a panel of data in long format
> and running (quantile) regressions for each institution. My basic problem
> involves running estimations on subsets of the data and keeping predicted
> values from each of the regressions. I can't use -by:- unless I write a
> wrapper, but this will be slow anyway because it uses if qualifiers (see
> below). I have implemented this in both SAS and Stata and my SAS code is
> about 100 times faster than my best Stata implementation.
>
> The panel is unbalanced, but to give you an idea the average number of
> time periods is 650 and the number of firms is over a thousand. For each
> firm I need to run three regressions, taking predicted values from two and
> a coefficient from the third, and combining these three items into a new
> variable. I have been having trouble finding a way to do this
> efficiently.
>
> One way would be to loop over all firms and use if qualifiers in the
> regressions and predictions. I have found this to be very slow, using if
> clauses on such a long dataset is very very slow, the procedure seems to
> take around 4 to 40 seconds per firm!
>
> My code now is a bit cumbersome but faster, but involves reshaping the
> data into wide format to avoid using if qualifiers. I split the data into
> 10 pieces by firm, then reshape each of these 10 pieces into wide
> format. I am splitting into 10 files because Stata's reshape command is
> quite slow (25-30 minutes for me) in reshaping my panel from long to wide,
> but splitting into 10 the reshape only takes a few seconds each. Then I
> have 2 layers of loops: one over the 10 files and then over the firms
> inside each file, running the estimation and generating new variables for
> each of the firms results. This method is much faster, there are no if
> qualifiers because the data is in wide format. It takes about 0.5-1.2
> seconds to run each firm. Overall, including the reshaping, this
> procedure takes maybe 20-30 minutes to run.
>
> Unfortunately for Stata fans (including myself), I was able to get this
> entire thing to run in about 50 seconds in SAS, or about 0.04 seconds per
> firm! The trick is that SAS can automatically run quantile regressions
> -by- a panel variable AND output predicted values at the same time. But,
> I would like to keep everything in Stata if I can. Does anyone have a
> suggestion on a more efficient method of implementing what I am doing?
> Would using the -in- qualifier instead of -if- be worth it?
>
> Thanks,
>
> Daniel
> _______________________________
> Daniel Green
> Research & Statistics Group
> Federal Reserve Bank of New York
> 212-720-6320
> [email protected]
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/