Be very careful here. You're confusing
some quite different beasts.
-egen-
======
-egen, sum()- fires up an -egen- function which
produces totals. Under -by:- or with
a -by()- option it produces group totals.
You can find the code in -_gsum.ado- (-which
_gsum- will show where it is on your machine).
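
As a minimal sketch of the group totals, with
made-up data (-pid-, -ob- and -calc- are borrowed
from the question quoted below; -gtotal- is just
an illustrative name):

    clear
    input pid ob calc
    1 1 10
    1 2 20
    1 3 30
    2 1  5
    2 2 15
    end
    * egen's sum(): one total per group, repeated on every observation
    * (later releases of Stata document this egen function as total())
    egen gtotal = sum(calc), by(pid)
    list, sepby(pid)

Every observation with -pid- 1 should get 60 and
every observation with -pid- 2 should get 20.
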
In essence, -egen- only takes -egen- functions,
either as documented under -[R] egen-, or as
user-defined -egen- functions _always_
flagged as such.
Also, -egen- functions are _never, ever_
allowed anywhere else. They require -egen-
absolutely.
-egen- is really rather limited. There are
perhaps of the order of 100 -egen- functions written,
and that's a fixed menu, except that,
if you don't like them, you can
write your own.
-sum()- and other functions
===========================
Anywhere else it is legal, -sum()- fires up
the -sum()- function, which produces
cumulative sums. This is part of the
executable and has been so for a very
long time, perhaps even since Stata 1.0.
-generate- (and -replace-) can in effect
take very complicated expressions
as arguments, making use of constants,
variables, operators and functions
such as -sum()-. The scope of -generate-
is in no way indicated by the few token
examples in the help. By combining constants,
variables, operators and functions,
you have _much_ more flexibility than with
-egen-.
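
For instance, a rough sketch with made-up values
(the names -csum- and -cmean- are purely
illustrative):

    clear
    input calc
    10
    20
    30
    5
    15
    end
    * the sum() function: running (cumulative) sum in the current sort order
    gen csum = sum(calc)
    * functions combine freely with operators and with _n, the observation number
    gen cmean = sum(calc) / _n
    list

Here -csum- should run 10, 30, 60, 65, 80 and
-cmean-, the running mean, 10, 15, 20, 16.25, 16.
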
Why then bother with -egen-? Just
for convenience: some often-repeated
sets of operations have been
rolled into -egen- functions.
Name conflict!
==============
If you find this confusing, or difficult
to defend, you are in
excellent company. Svend Juul gave
a very droll paper at the Berlin users'
meeting in which he underlined this
and a few other messes over names.
StataCorp are known to be taking the
issue seriously. At the same time,
the last thing they want to do is
to break any existing programs,
do-files or habits.
by:
===
One source of explanations is
    How to move step by: step. Stata Journal 2(1): 86-102 (2002)
which gathers the main ideas in one place. The obvious
alternative is to look up -by- in the Manual index and read
the several sections thus indicated. The article
just mentioned was written because the coverage
of -by:- in the manuals is a bit fragmented.
It's been said by a long-time Stata user
that wrapping your head around the possibilities
of -by:- is the biggest single step you can take
to real Stata fluency.
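
A minimal sketch of why -by:- gives the cumulative
calculation asked about below, again with made-up
data, and assuming that -ob- numbers the observations
1, 2, ... within each -pid-:

    clear
    input pid ob calc
    1 1 10
    1 2 20
    1 3 30
    2 1  5
    2 2 15
    end
    * under by:, sum() starts afresh in each group of pid;
    * (ob) only fixes the sort order within each group
    bysort pid (ob): gen cave = sum(calc) / ob
    list, sepby(pid)

-cave- should come out as 10, 15, 20 for -pid- 1
and 5, 10 for -pid- 2. By contrast, -by(pid ob)-
with -egen- makes each distinct combination of -pid-
and -ob- a group of its own, so with one observation
per combination any group total is just that
observation's own value.
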
Nick
Daniel Egan wrote:
>
> bysort pid (ob): gen cave = sum(calc)/ob
>
> This is so obvious as to be painful. So why didn't I think of it?
>
> 1) Where/when did -sum()- become an acceptable argument to
> -generate-!?!? I have only ever seen it in the context of -egen-.
> Looking at the help for -generate-, there are no arguments that are
> explicitly stated to be usable. It is only at the very bottom of the
> examples that one sees a function -uniform- and then -sum- used with
> -gen-. Are the others? I know that using many -egen- arguments with -gen-
> will return errors (e.g. count).
>
> 2) Why does -bys pid (ob):- do this correctly? I understand that
> it is equivalent to -sort pid ob-, but why does it result in the
> correct cumulative sum?
> Another way of putting this is: why doesn't -egen cave=sum(calc)/ob,
> by(pid ob)- work if this does?
>