Re: st: Is -collapse- the Stata's fastest routine to summarize data sets?
From
Eric Booth <[email protected]>
To
"<[email protected]>" <[email protected]>
Subject
Re: st: Is -collapse- the Stata's fastest routine to summarize data sets?
Date
Fri, 9 Jul 2010 14:26:15 +0000
Collapsing by several categorical variables is not as straightforward with -tabout- as it is with -collapse-. One workaround is to create a single variable that identifies every combination of the n categorical variables, and then -tabout- by that combined variable. For example,
******************!
clear
sysuse auto
cap which tabout
if _rc ssc install tabout
**create n categorical vars**
recode rep78 (.=0)
lab def rep78 1 "one" 2 "two" 3 "three" 4 "four" 5 "five" 0 "zero/miss", modify
lab val rep78 rep78
egen price2 = cut(price), group(4) label
drop price
// 1. collapse
ds make rep78 foreign price2, not
local vars `r(varlist)'
**
preserve
collapse (sum) `vars' , by(rep78 price2 foreign)
outsheet using collapsed.csv, comma replace
restore
// 2. tabout
local vars: subinstr local vars " " " sum ", all
di "`vars'"
**
tabout rep78 price2 foreign using taboutex.csv, replace sum c(sum `vars') style(csv) h2(THIS ISN'T WHAT YOU WANT |)
preserve
**decode your categorical vars**
foreach v in rep78 price2 foreign {
decode `v', g(`v'a)
drop `v'
rename `v'a `v'
}
**combine your categorical vars into one var**
g categories = price2 + " - " + rep78 + " - " + foreign
ta categories
**
tabout categories using taboutex.csv, append sum c(sum `vars') h2(THIS IS WHAT YOU WANT|) lines(double) style(csv)
restore
******************!
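A possible shortcut for the decode-and-concatenate step, offered only as a sketch: -egen- has a group() function that assigns one integer id per observed combination of several variables, and its label option builds a value label from the component values. The variable name categories simply mirrors the example above; whether the generated labels are readable enough for your table is an assumption worth checking with -tabulate- first.

```stata
* Sketch only: build the combined category with egen group()
* instead of decoding and concatenating strings
sysuse auto, clear
recode rep78 (.=0)
label define rep78 0 "zero/miss" 1 "one" 2 "two" 3 "three" 4 "four" 5 "five", modify
label values rep78 rep78
egen price2 = cut(price), group(4) label
* one labelled integer per observed combination of the three variables
egen categories = group(rep78 price2 foreign), label
tabulate categories
```

The labelled categories variable could then be passed to -tabout- in place of the string variable built above.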
~ Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
On Jul 8, 2010, at 6:47 PM, Tiago V. Pereira wrote:
> Many, many thanks Eric!
>
> Yes, -tabout- really seems to be much faster than -collapse-. However, I
> could not figure out how to make it work when one has n categorical
> variables, and wants to summarize continuous variables taking all possible
> combinations of the n categorical variables.
>
> -collapse- does that using the by() option.
>
> Thanks again!
>
> Tiago