Steven Stillman (LMPG) wrote:
> I am collapsing individual/quarter data down to yearly population counts
> for a number of variables (around 20) for various groups (this part isn't
> important). This is a large dataset: about 20,000 obs per quarter * 64
> quarters. My full dataset is around 330MB.
>
> Ideally, I would do this using the command:
> collapse (sum) varlist [pw=weight], by(group year) fast
>
> Unfortunately, even when I drop all variables from my dataset besides the
> ones being collapsed and allocate my full system memory of 500MB, I get an
> error message that not enough memory is available. I believe this occurs
> because collapse works internally in doubles and creates new temporary
> variables before deleting the old ones.
>
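If memory is the binding constraint, one way around it is to keep only the
variables the collapse needs, compress them to the smallest storage types,
and collapse one year at a time, appending the pieces. A rough, untested
sketch; the variable names v1-v20 and the year range 1985-2000 are made up
stand-ins for your data:

keep year group weight v1-v20        // only what the collapse needs
compress                             // smallest storage type per variable
tempfile all
forvalues y = 1985/2000 {
    preserve
    qui keep if year == `y'
    collapse (sum) v1-v20 [pw=weight], by(group year) fast
    if `y' == 1985 {
        qui save `all'               // first chunk starts the file
    }
    else {
        qui append using `all'       // add the earlier chunks
        qui save `all', replace
    }
    restore
}
use `all', clear                     // yearly totals for all years
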
> I have gotten around this using the following sequence of commands:
>
> for var varlist: egen float temp = sum(X*weight), by(group year) \
>     qui replace X = temp \
>     qui drop temp
> (forgive my use of -for-, old habits die hard; the backslashes chain it
> all into one command)
>
> bys year group: keep if _n==1
>
> This does exactly what I need, but it is tediously slow. My use of egen
> also means I am storing lots of unnecessary information (i.e., a duplicate
> group total on every record) that I have no need for.
This does the same, but faster. I would be curious how large the difference in
speed is.
sort year group
foreach var of varlist myvars* {
    by year group: replace `var' = sum(`var'*weight)
}
by year group: keep if _n==_N
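In case the trick is not obvious: sum() inside replace is a running sum
within each by-group, so the full group total lands in the last observation,
which is why _n==_N is kept rather than _n==1. A toy check, with made-up
numbers:

clear
input year group x weight
1990 1 2 1
1990 1 3 2
1991 1 4 1
1991 1 5 1
end
sort year group
by year group: replace x = sum(x*weight)   // running sum; last obs = total
by year group: keep if _n==_N              // one record per year/group
list year group x

Here x ends up as 8 for 1990 and 9 for 1991, matching sum(x*weight).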
regards
uli
--
[email protected]
+49 (030) 25491-361