Stata The Stata listserver

st: RE: Re: how to calculate a function (sum ) of observations without creating a new variable ?


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Re: how to calculate a function (sum ) of observations without creating a new variable ?
Date   Sun, 17 Aug 2003 16:41:46 +0100

mikhail bontch-osmolovski

> > This is a long question about functions in Stata.
> > I would like to know the best way to calculate a function of
> > the data without creating a variable, i.e. without using gen or
> > egen.
> > The situation is simple: I have a big dataset of 8 million
> > observations and I need to calculate weighted sums of
> > observations under certain conditions. I need: sum(wt) if b
> > The obvious way would be to:
> > 1.  egen c=sum(wt) if b
> > 2.  su c, or di c
> > 3.  drop c.
> > However, this way is clumsy since I have to create an additional
> > 8 million observations of c, all equal to each other, so it
> > takes memory, which is very limited in my case, and extra time
> > (it takes a very long time).
> > As I understand it, egen sum works by first running gen sum and
> > then replacing all observations with the last one.
> > I could not find anything online, so I wrote a simple program
> > which calculates the sum of a variable called wt:
> >
> > scalar a=0
> > local n=1
> > while `n'<=_N {
> >     scalar a=a+wt[`n']
> >     local n=`n'+1
> >            }
> > display a
> >
> > I could not believe my eyes: this simple program ran 3 times
> > LONGER than egen a=sum(wt).
> > Later I used scalar n instead of local n to save interpretation
> > time, but it still ran 3 times longer than egen.
> > So I was forced to go back to egen, but it is aesthetically
> > unpleasant to create 8 million observations when you need just a
> > constant, and often low memory does not even allow me to have an
> > extra variable. I am using Windows XP, Stata 8.1.
> >
> > So I wonder if you know a good way to calculate a sum of a
> > variable and, in general, any function of variables under
> > certain conditions, like you can do in Excel, without creating
> > an extra variable? Why is egen faster than a plain sum? Is this
> > a good case for having a plugin, which is faster? It is hard to
> > believe that Stata has no commands for such a simple operation.
> >
> > P.S. display, which is also called a hand calculator, does not
> > allow an if condition, so it does not work.

Scott Merryman
 
> -tabstat- is probably the easiest way to display a sum
> 
> . use "C:\Stata8\auto.dta", clear
> (1978 Automobile Data)
> 
> . tabstat price if mpg>20, stat(sum)
> 
>     variable |       sum
> -------------+----------
>        price |    192611
> ------------------------
> 
> Also, take a look at the saved results for -summarize-

To expand on Scott's comments, although 
I can't comment on Excel: 

Your program is going to be very slow 
for the following reason, among others: 
You are obliging Stata to interpret 
several million lines of Stata code. Just 
to stress one point: Stata doesn't have a built-in 
compiler, so your program, although much shorter 
to type than the code behind -egen, sum()-, is in 
effect much longer, because the body of the -while- 
loop is interpreted afresh for every observation. 

I am a big fan of -egen- where it is appropriate, 
but it has no advantages here over using 
-summarize-. 

su myvar ..., meanonly 
di r(sum) 
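
As a concrete sketch of the pattern above, here is the same 
conditional sum that Scott obtained with -tabstat-. It assumes 
the auto.dta example dataset shipped with Stata, loaded via 
-sysuse-; the scalar name "total" is just an illustrative choice. 
The displayed value should agree with the -tabstat- output quoted 
above. 

```stata
* Assumption: the auto.dta example dataset that ships with Stata
sysuse auto, clear

* meanonly suppresses the output and skips the variance calculation
summarize price if mpg > 20, meanonly

* r(sum) holds the conditional sum; no new variable is created
display r(sum)

* Copy the result to a scalar before another r-class command
* overwrites r()
scalar total = r(sum)
display total
```

The copy into a scalar matters only if you need the sum later, 
as the next r-class command will replace the contents of r(). 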

By the way, you say you want weighted sums, 
but none of your examples uses weights. 
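
If the weights are integer frequency weights, one possible 
sketch of a weighted sum without a new variable is given below. 
This is an assumption about what you want, not something taken 
from your examples: rep78 in auto.dta merely stands in for a 
weight variable, and "wsum" is an illustrative name. With 
fweights, r(sum) is the sum of weight times value and r(sum_w) 
is the sum of the weights. 

```stata
* Assumption: auto.dta, with rep78 standing in for an integer
* frequency weight purely for illustration
sysuse auto, clear

* Observations with a missing weight are excluded automatically
summarize price [fweight = rep78] if mpg > 20, meanonly

* Weighted sum of price over the selected observations
display r(sum)

* Sum of the weights over the same observations
display r(sum_w)

scalar wsum = r(sum)
```

-summarize- also accepts aweights and iweights for non-integer 
weights, but check in your version how r(sum) is scaled for 
those before relying on it. 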

I think you'll find -summarize, meanonly- reasonably fast. 
Also, I doubt that you can improve on that 
very much with a plug-in, but it would be 
an interesting challenge. 

Nick 
[email protected] 



