This is a summary of a thread started under the heading
"cumulative average moving through time". I have changed the heading
in the hope that future generations may learn faster than I did.
The discussion moved to my ignorance of how -egen-, -generate- and
-by- work, with multiple voices explaining exactly why
bysort pid (ob): gen cave = sum(calc)/ob
is not the same as
egen cave=sum(calc)/ob, by(pid ob)
Thanks to Nick Cox, Michael Blasnik, Scott Merryman, and David Kantor
for their explanations.
***************************************************************
Nick Cox (as usual) wrote the bible on it:
<quote>
Be very careful here. You're confusing
some quite different beasts.
-egen-
======
-egen, sum()- fires up an -egen- function which
produces totals. Under -by:- or with
a -by()- option it produces group totals.
You can find the code in -_gsum.ado- (-which
_gsum- will show where it is on your machine).
In essence, -egen- only takes -egen- functions,
either as documented under -[R] egen-, or as
user-defined -egen- functions _always_
flagged as such.
Also, -egen- functions are _never, ever_
allowed anywhere else. They require -egen-
absolutely.
-egen- is really rather limited. There are
perhaps of the order of 100 -egen- functions written,
and that's a fixed menu, except insofar as
if you don't like them, you can indeed
write your own.
-sum()- and other functions
===========================
Anywhere else that it is legal, -sum()- fires up
the -sum()- function, which produces
cumulative sums. This is part of the
executable and has been so for a very
long time, perhaps even since Stata 1.0.
-generate- (and -replace-) can in effect
take very complicated expressions
as arguments, making use of constants,
variables, operators and functions
such as -sum()-. The scope of -generate-
is in no way indicated by the few token
examples in the help. By combining constants,
variables, operators and functions,
you have _much_ more flexibility than with
-egen-.
Why then bother with -egen-? Just
for convenience: some often
repeated sets of operations have been
rolled into -egen- functions.
by:
===
For -by:-, see "Speaking Stata: How to move step by: step", Stata Journal
2(1): 86-102 (2002),
which gathers the main ideas in one place. The obvious
alternative is to look up -by- in the Manual index and read
the several sections thus indicated. The article
just mentioned was written because the coverage
of -by:- in the manuals is a bit fragmented.
<end quote>
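
[Summary note] The distinction Nick draws can be seen on a toy dataset.
This is only a sketch: the five observations and the variable names
grptotal and runsum are made up for illustration, and it assumes a Stata
release in which the -egen- totals function is called -total()- (in
older releases it is the -sum()- function discussed above).

```stata
clear
input pid ob calc
1 1 10
1 2 20
1 3 30
2 1 5
2 2 15
end

* -egen- function: one total per pid, repeated on every row of the group
egen grptotal = total(calc), by(pid)

* built-in sum() under -by:-: running (cumulative) sum within pid,
* accumulated in ob order
bysort pid (ob): generate runsum = sum(calc)

list, sepby(pid)
```

With these data grptotal is constant within each pid, whereas runsum
grows row by row within each pid, which is what a cumulative average
needs as its numerator.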
*****************************************************************
On this note, Scott Merryman said:
<quote>
-bysort pid (ob)- sorts by pid and then by ob within pid, but it performs
-gen cave = sum(calc)/ob- separately for each pid only.
-bysort pid ob- would not work because
it would perform the calculation separately for each pid and ob pair.
I don't believe the -by()- option in -egen- is flexible enough to interpret
-egen cave=sum(calc)/ob, by(pid ob)- correctly. Also, -egen, sum()- does not
allow expressions such as sum(calc)/ob.
You might find Nick Cox's article "Speaking Stata: How to move step by: step"
SJ 2(1) helpful.
<end quote>
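
[Summary note] Scott's point about the parentheses can be checked with a
small sketch. The data are invented, cave2 is a hypothetical name used
only for comparison, and ob is assumed to be a counter running 1, 2, 3,
... within each pid.

```stata
clear
input pid ob calc
1 1 10
1 2 20
1 3 30
2 1 5
2 2 15
end

* Groups are defined by pid alone; (ob) only fixes the sort order within
* each pid, so sum(calc) accumulates across the rows of a pid.
bysort pid (ob): generate cave = sum(calc)/ob

* Groups are defined by each pid-ob pair. Here every group has a single
* row, so sum(calc) never accumulates and the result is just calc/ob.
bysort pid ob: generate cave2 = sum(calc)/ob

list, sepby(pid)
```

Under these assumptions cave is the cumulative average within pid, while
cave2 is simply calc divided by ob on each row.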
**********************************************
Dave Kantor noted:
<quote>
See -help mathfun- for details.
<end quote>
Dan