st: Efficient way to predict values from regressions on subsets of the data?
From: [email protected]
To: [email protected]
Subject: st: Efficient way to predict values from regressions on subsets of the data?
Date: Fri, 15 Apr 2011 17:35:23 -0400
Hello all,
I have a project that involves assembling a panel of data in long format
and running (quantile) regressions for each institution. My basic problem
involves running estimations on subsets of the data and keeping predicted
values from each of the regressions. I can't use -by:- unless I write a
wrapper, but this will be slow anyway because it uses if qualifiers (see
below). I have implemented this in both SAS and Stata and my SAS code is
about 100 times faster than my best Stata implementation.
The panel is unbalanced, but to give you an idea, the average number of
time periods is 650 and there are over a thousand firms. For each
firm I need to run three regressions, taking predicted values from two and
a coefficient from the third, and combining these three items into a new
variable. I have been having trouble finding a way to do this
efficiently.
One way would be to loop over all firms and use if qualifiers in the
regressions and predictions. I have found this to be very slow: with if
qualifiers on such a long dataset, the procedure seems to take around 4
to 40 seconds per firm!
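For concreteness, the loop looks roughly like the sketch below. The names
firm, y, x1 and x2 are placeholders rather than my actual variables, firm
is assumed to be a numeric identifier, and only one of the three
regressions is shown.

```stata
* Sketch only: firm, y, x1, x2 are hypothetical variable names
generate double yhat = .
levelsof firm, local(firms)

foreach f of local firms {
    * every -if- below is evaluated against all observations in the panel
    quietly qreg y x1 x2 if firm == `f'
    tempvar p
    quietly predict double `p' if firm == `f', xb
    quietly replace yhat = `p' if firm == `f'
    drop `p'
}
```

Each pass through the loop uses three if qualifiers, and each one scans
the whole dataset even though only about 650 observations belong to the
firm in question.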
My current code is cumbersome but faster: it reshapes the data into wide
format to avoid if qualifiers. I split the data into 10 pieces by firm
and reshape each piece into wide format. I split into 10 files because
Stata's -reshape- command is quite slow on my full panel (25-30 minutes
from long to wide), whereas each of the 10 pieces reshapes in a few
seconds. Then I have two layers of loops: one over the 10 files and one
over the firms inside each file, running the estimation and generating
new variables for each firm's results. This method is much faster
because, with the data in wide format, no if qualifiers are needed. It
takes about 0.5-1.2 seconds to run each firm. Overall, including the
reshaping, this procedure takes maybe 20-30 minutes to run.
Unfortunately for Stata fans (including myself), I was able to get this
entire thing to run in about 50 seconds in SAS, or about 0.04 seconds per
firm! The trick is that SAS can automatically run quantile regressions
-by- a panel variable AND output predicted values at the same time. But
I would like to keep everything in Stata if I can. Does anyone have a
suggestion on a more efficient method of implementing what I am doing?
Would using the -in- qualifier instead of -if- be worth it?
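To make the -in- idea concrete, here is a rough sketch of what I have in
mind. It assumes the data stay in long format and are sorted so that each
firm occupies one contiguous block of observations. The names firm, date,
y, x1, x2, obsno, last and yhat are placeholders, and again only one of
the three regressions is shown. Whether this is actually faster on my
data is exactly what I am unsure about.

```stata
* Sketch only: all variable names here are hypothetical
sort firm date
generate long obsno = _n
by firm: generate long last = obsno[_N]   // last observation number of each firm's block
generate double yhat = .

local N = _N
local i = 1
while `i' <= `N' {
    local j = last[`i']                   // this firm occupies observations i through j
    capture quietly qreg y x1 x2 in `i'/`j'
    if _rc == 0 {
        tempvar p
        quietly predict double `p' in `i'/`j', xb
        quietly replace yhat = `p' in `i'/`j'
        drop `p'
    }
    local i = `j' + 1                     // jump to the first observation of the next firm
}
```

The -capture- is there so that a firm whose regression fails (too few
observations, say) is skipped and its yhat left missing rather than
stopping the loop.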
Thanks,
Daniel
_______________________________
Daniel Green
Research & Statistics Group
Federal Reserve Bank of New York
212-720-6320
[email protected]