The problem can be stated as follows:
Consider the panel data setting where the command
<<tsset pid time>>
was issued. Under these circumstances, the command:
<<by pid: egen double i`x' = mean(`x')>>
should produce exactly the same values as:
<<
* loop equivalent: compute each panel unit's mean explicitly
generate double I`x' = 0
qui levelsof pid, local(idlst)
foreach lvl of local idlst {
    qui summarize `x' if (pid == `lvl'), meanonly
    qui replace I`x' = r(mean) if (pid == `lvl')
}
>>
In my experience, the problem can appear when `x' is a float variable.
Worse, the discrepancy between the two command sequences appears to have
a "random" component: it differs from run to run. The loop-based sequence
always produces the same results, but the output of the 'egen' command
varies. Admittedly, these fluctuations are of the order of machine
precision. Nevertheless, they are worrying, because they constitute
unexpected and, as far as I can tell, undocumented behaviour, which can
lead to substantial differences, especially in iterative procedures.
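The post does not say why 'egen' behaves this way. One plausible mechanism, offered here only as an assumption, is summation order. If the observations within each pid are not always visited in the same order (for example, if the data are re-sorted internally without a stable sort), the sum behind each mean is accumulated in a different order on each run. Floating-point addition is not associative, so the result can change in its last bits. The sketch below uses Python purely to illustrate IEEE arithmetic. The helper names `to_float32`, `running_mean` and `means_under_reordering`, as well as the data, are hypothetical. It rounds values to single precision, as storing them in a float variable would, and then averages one group in many random orders.

```python
import random
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE single-precision value.

    This mimics storing a number in a 4-byte 'float' variable; the
    result comes back as an ordinary Python float (a double).
    """
    return struct.unpack("f", struct.pack("f", x))[0]


def running_mean(values):
    """Mean from a double-precision running sum taken left to right,
    so the result can depend on the order of `values`."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def means_under_reordering(values, n_runs=50, seed=None):
    """Return the set of distinct means obtained when the same group
    of observations is summed in n_runs random orders."""
    rng = random.Random(seed)
    data = list(values)
    distinct = set()
    for _ in range(n_runs):
        rng.shuffle(data)  # a different within-group order on each run
        distinct.add(running_mean(data))
    return distinct


if __name__ == "__main__":
    # Hypothetical data: one panel unit's values, stored as float.
    rng = random.Random(1)
    group = [to_float32(rng.uniform(0, 100)) for _ in range(30)]

    results = means_under_reordering(group, seed=2)
    spread = max(results) - min(results)
    print(f"{len(results)} distinct means, spread = {spread:.3e}")

    # Order dependence in its smallest form:
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

If this is the cause, the differences should stay confined to the last few bits of the double result, which would match the fluctuations "of the order of machine precision" described above. The explicit loop would plausibly be unaffected, because 'summarize' sees each group's observations in the same order every time.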
The problem does not occur for every `x', but I have a dataset and a
sequence of commands that reproduce the described behaviour.
Since I am not allowed to post attachments, please e-mail me for more
information: Rembert_at_DeBlander.eu
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/