Re: st: reshape
From: Daniel Feenberg <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: reshape
Date: Tue, 7 Aug 2012 15:19:59 -0400 (EDT)
On Tue, 7 Aug 2012, Airey, David C wrote:
> When reshaping datasets from wide to long with very many variables and
> rows, is there any gain in speed of reshaping fewer variables or rows
> and then later combining versus letting reshape do its thing on the
> whole data set?
I don't know about that, but...
The reshape command is inexplicably slow. Take a wide dataset with variables
id and x2001-x2010. Then the command:

    reshape long x, i(id) j(year)

takes about 20 seconds per million observations. But you can write out a
separate file for each year of data, and then concatenate them into one
long dataset in about 2 seconds. For example:

    forvalues year = 2001/2010 {
        * load only the id and this year's column from the wide file
        use id x`year' using widedata, clear
        rename x`year' x
        * year is not stored in the wide file, so create it here
        generate year = `year'
        save "/tmp/reshape`year'", replace
    }
    clear
    forvalues year = 2001/2010 {
        append using "/tmp/reshape`year'"
    }
Obviously, the additional code isn't worthwhile unless you have
multi-millions of observations, or are reshaping many times, but
sometimes that is what you have.
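If you want to check the comparison on your own machine, something like the following sketch should work. It builds a synthetic wide dataset (the 100,000 ids, the seed, and the tempfile names are my assumptions, not anything from the original data), times both approaches with timer, and uses cf to confirm that the two long datasets agree. It uses tempfiles rather than fixed paths under /tmp, so nothing is left behind afterwards.

```stata
* Hypothetical test data: 100,000 ids with x2001-x2010 (size and seed are arbitrary)
clear
set seed 12345
set obs 100000
generate long id = _n
forvalues year = 2001/2010 {
    generate double x`year' = runiform()
}
tempfile wide viareshape
save `wide'

* Method 1: built-in reshape, timed with timer 1
timer clear
timer on 1
reshape long x, i(id) j(year)
timer off 1
sort id year
save `viareshape'

* Method 2: one file per year, then append, timed with timer 2
timer on 2
forvalues year = 2001/2010 {
    use id x`year' using `wide', clear
    rename x`year' x
    generate int year = `year'
    tempfile part`year'
    save `part`year''
}
clear
forvalues year = 2001/2010 {
    append using `part`year''
}
timer off 2
sort id year

* cf exits with an error if any value differs between the two results
cf id year x using `viareshape'
timer list
```

The sort before cf matters: the appended data come out ordered by year and then id, while reshape leaves them ordered by id and then year, and cf compares observation by observation. Timings will depend on your Stata version, disk, and data, so treat the numbers as indicative only.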
dan feenberg
NBER