mikhail bontch-osmolovski
> > This is a long question about functions in Stata.
> > I would like to know the best way to calculate a function of
> > the data without creating a variable, i.e. without using the
> > gen or egen function.
> > The situation is simple: I have a big dataset of 8 million
> > observations and I need to calculate weighted sums of
> > observations under certain conditions. I need: sum(wt) if b
> > Obvious way would be to:
> > 1. egen c=sum(wt) if b
> > 2. su c, or di c
> > 3. drop c.
> > However, this way is clumsy since I have to create an additional
> > 8 million observations of c, all equal to each other, so it takes
> > memory, which is very limited in my case, and extra time (it
> > takes a very long time).
> > As I understand it, egen sum works by first running gen sum and then
> > replacing all observations with the last one.
> > I could not find anything online, so I wrote a simple program which
> > calculates the sum of a variable called wt:
> >
> > scalar a=0
> > local n=1
> > while `n'<=_N {
> > scalar a=a+wt[`n']
> > local n=`n'+1
> > }
> > display a
> >
> > I could not believe my eyes: this simple program ran 3 times LONGER than
> > egen a=sum(wt).
> > Later I used scalar n instead of local n to save interpretation time,
> > but it still ran 3 times longer than egen.
> > So I was forced to go back to egen, but it is aesthetically unpleasant to
> > create 8 million observations when you need just a constant, and often low
> > memory does not even allow me to have an extra variable. I am using Windows
> > XP, Stata 8.1.
> >
> > So I wonder if you know a good way to calculate a sum of variables and,
> > in general, any function of variables under certain conditions, like you
> > can do in Excel, without creating an extra variable? Why is egen faster
> > than a plain sum? Is this a good case for having a plugin, which is
> > faster? It is hard to believe that Stata has no commands for such a
> > simple operation.
> >
> > ps. display, which is also called a hand calculator, does not allow an if
> > condition, so it does not work.
Scott Merryman
> -tabstat- is probably the easiest way to display a sum
>
> . use "C:\Stata8\auto.dta", clear
> (1978 Automobile Data)
>
> . tabstat price if mpg>20, stat(sum)
>
>     variable |       sum
> -------------+----------
>        price |    192611
> ------------------------
>
> Also, take a look at the saved results for -summarize-
To expand on Scott's comments, although
I can't comment on Excel:
Your program is going to be very slow
for the following reason, among others:
you are obliging Stata to interpret
several million lines of Stata code. Just
to stress one point: Stata doesn't have a built-in
compiler, so your program, although much shorter
to type than the code behind -egen, sum()-, is really
much longer to execute, because the body of the -while-
loop is interpreted afresh for every observation.
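A rough way to see the interpretation overhead for yourself is sketched
below. It is only an illustration under several assumptions: the dataset
is a made-up one of 100,000 observations with wt equal to 1 throughout,
and -timer- is used for the timings, which needs a Stata release newer
than 8.1 (on older releases -set rmsg on- is an alternative). The scalar
names a and s are arbitrary.

```stata
* Made-up data: 100,000 observations, wt = 1 everywhere (assumption)
clear
set obs 100000
generate double wt = 1

timer clear

* Timer 1: interpreted loop, one pass per observation
timer on 1
scalar a = 0
forvalues n = 1/`=_N' {
    scalar a = a + wt[`n']
}
timer off 1

* Timer 2: one call to -summarize-, which does its work in compiled code
timer on 2
summarize wt, meanonly
scalar s = r(sum)
timer off 2

* Both routes should give the same total; only the timings differ
display scalar(a)
display scalar(s)
timer list
```

The exact timings will depend on the machine and the Stata release, so
none are quoted here.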
I am a big fan of -egen- where it is appropriate,
but it has no advantages here over using
-summarize-.
su myvar ..., meanonly
di r(sum)
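As a concrete sketch of that pattern, here it is on the auto data that
Scott used. I am assuming -sysuse auto- is available as a shortcut for
his -use- command, and the scalar name s_price is just an illustration.
No new variable is created at any point.

```stata
* Load the example dataset shipped with Stata (assumes -sysuse- is available)
sysuse auto, clear

* -meanonly- suppresses the display and skips the variance calculation,
* but still leaves r(sum), r(N), r(mean), r(min) and r(max) behind
summarize price if mpg > 20, meanonly

* Show the sum, or keep it in a scalar for later use
display r(sum)
scalar s_price = r(sum)
```

The displayed value should agree with the sum that -tabstat- reported
for the same condition earlier in the thread.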
I think you'll find that reasonably fast.
By the way, you say you want weighted sums,
but none of your examples uses weights.
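If a weighted sum is what is wanted, -summarize- accepts weights, so the
same approach should carry over. The sketch below is an assumption about
what is meant by "weighted sum", namely the sum of weight times value
over the selected observations; it uses the auto data, with the variable
weight standing in, purely for illustration, for your wt.

```stata
sysuse auto, clear

* iweights are used as given (not rescaled), so r(sum) is the sum of
* weight*price over observations with mpg > 20, and r(sum_w) is the
* sum of the weights themselves over the same observations
summarize price [iweight = weight] if mpg > 20, meanonly

display r(sum)
display r(sum_w)
```

If you only need the sum of the weights under a condition, the
unweighted form, su wt if ..., meanonly, followed by di r(sum), is
enough.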
Also, I doubt that you can improve on that
very much with a plug-in, but it would be
an interesting challenge.
Nick