Perhaps more information about the discrepancy you noticed would be helpful?
-egen- and -collapse- both use -generate-'s sum function.
The only difference I see is that -collapse- generates the new variable as
datatype double, whereas -egen- lets you choose, though the default is float.
If you choose a datatype that cannot handle the resulting sum, then you
would end up with missing values and thus a different result than you would
with -collapse-. If your sums are larger than the float maximum of
1.70141173319*10^38 then you would get varying results. Well below that
limit, a float also holds only about 7 significant digits, so a large sum
can be rounded. Try your program again choosing datatype double with -egen-
to see if this fixes the problem.
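As a rough sketch of that check, the two sums can be put side by side. It
assumes the variable names from the original post (aantal and opl114);
egen_sum, collapse_sum and diff are names made up here for illustration,
and total() is the name newer Stata releases use for -egen-'s sum(). The
last lines display the storage limits, using the system values c(maxfloat)
and c(maxdouble):

```stata
* Group totals from -egen-, stored as double rather than the default float
egen double egen_sum = total(aantal), by(opl114)

* Collapse a copy of the data and keep the -egen- total alongside it
preserve
collapse (sum) collapse_sum = aantal (first) egen_sum, by(opl114)
generate double diff = collapse_sum - egen_sum
list opl114 collapse_sum egen_sum diff, noobs
restore

* Largest values each storage type can hold
display c(maxfloat)
display c(maxdouble)

* A float keeps only about 7 digits: 2^24 + 1 is stored as 16777216
display %12.0f float(16777217)
```

If diff is zero in every group once -egen- stores a double, the storage
type was the likely cause of the difference.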
To view the ado files for these commands you can download -adoedit- from
the SSC or:
. findfile collapse.ado
. view "`r(fn)'"
. findfile egen.ado
. view "`r(fn)'"
or more specifically:
. findfile _gsum.ado
. view "`r(fn)'"
Dan Blanchette
Applications Analyst Programmer
Carolina Population Center UNC-CH
> While aggregating a dataset using collapse some strange results were
> obtained:
>
> collapse (sum) aantal, by(opl114)
>
> Did not give the same results as the same dataset gave in other programs.
>
> Checking with
>
> egen oplantal=sum(aantal),by(opl114)
>
> though gave exactly the same (correct) number that other programs gave me.
> Can somebody explain to me how the summation (could) differ between collapse
> and egen?
>
> Thanks.
> --------------------------------------------
>
> Ben Kriechel
>
> Research Centre for Education
> and the Labour Market
> <[email protected]>
>
>
>