Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: faster collapsing 2 or 3 variables by bins
From: László Sándor <[email protected]>
To: [email protected]
Subject: st: faster collapsing 2 or 3 variables by bins
Date: Fri, 7 Sep 2012 11:46:10 -0400
Hi again,
Let me also ask another question. This is again meant for Stata 11 and
above, on all platforms.
I would like to collapse 2 or 3 variables at the same time (and then
plot the collapsed values as efficiently as possible, but that's worth
a separate thread). However, as I have tens of millions of
observations, a simple -collapse- is too slow, especially once I add
preserving and restoring my data. I experimented with collapsing in
Mata, but even that is slower than using the optimized C code of
-tabulate- and then parsing the log. (This does not lose the data
either.) But I'd like to go even faster than that. Namely, I am hoping
some of you have a suggestion for what to do instead of issuing a
separate line like this one for each variable:
noisily tab `x_q' if `touse' `wt', sum(`y_r') means wrap nolabel noobs
Basically, as I am collapsing by `x_q' (bins), only the variable `y_r'
changes from line to line. But with the if conditions checked
separately and the collapsing redone each time, this is not as
efficient as "allowing for a varlist" in the sum() option would be. In
the generic case, missing value patterns might be an issue, but here I
have already made sure that I only have observations where all
variables to be collapsed are non-missing.
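
To make the redundancy concrete, here is a minimal sketch on the shipped auto dataset. The variable names (x_q, touse, price, mpg, length) are stand-ins I made up for illustration, and weights are left out. The first part repeats the -tabulate, summarize()- call once per variable, as described above. The second part is one possible single-pass alternative in Mata, using panelsetup() on data sorted by bin. It is not benchmarked, and the sort alone may be too costly on tens of millions of observations.

```stata
* Illustration only: auto.dta and all variable names are assumptions.
sysuse auto, clear
xtile x_q = weight, nquantiles(5)              // hypothetical bin variable
generate byte touse = !missing(x_q, price, mpg, length)

* Part 1: the per-variable approach; the if condition and the grouping
* by x_q are redone on every pass through the loop.
foreach y_r of varlist price mpg length {
    tabulate x_q if touse, summarize(`y_r') means nolabel noobs
}

* Part 2: a single-pass sketch in Mata (untested for speed).
* panelsetup() needs the data sorted by the bin variable.
sort x_q
mata:
    // view onto the selected observations; column 1 is the bin id
    st_view(X = ., ., ("x_q", "price", "mpg", "length"), "touse")
    info = panelsetup(X, 1)                    // one row per bin: first/last obs
    out = J(rows(info), cols(X), .)
    for (i = 1; i <= rows(info); i++) {
        // column means within bin i; the mean of x_q is the bin value itself
        out[i, .] = mean(panelsubmatrix(X, i, info))
    }
    out
end
```

The Mata part leaves the data in memory untouched and returns one row per bin, with the bin value in the first column and the means of the other variables after it.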
Please let me know if you can think of any trick to do this. I know
you cannot magically let me hack the built-in -tabulate, summarize-.
But still…
(Related issues did come up in earlier threads on this list, but not
the multiple tabs.)
Thanks,
Laszlo