Rembert De Blander <[email protected]>
If you want us to reproduce this result, find a handful of observations
that show it, then start your example with -input-, e.g.:
clear
set type double
input pid x
1 5
1 .
2 5
2 7
end
* replace the missing value with pi
replace x=_pi if mi(x)
loc x "x"
loc id "pid"
* method 1: group means via -egen-
egen double i`x' = mean(`x'), by(`id')
* method 2: group means via an explicit loop over the ids
generate double I`x' = 0
qui levelsof `id', local(idlst)
foreach lvl of local idlst {
    qui summarize `x' if (`id' == `lvl'), meanonly
    qui replace I`x' = r(mean) if (`id' == `lvl')
}
su
li
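
To check whether the two methods actually disagree, the two variables can
be compared directly. This is only a minimal sketch: it assumes the example
above has just been run, so that ix (from -egen-) and Ix (from the loop)
exist. The variable name diff and the 1e-15 tolerance are my own choices
for illustration, not part of the original problem.

```stata
* difference between the -egen- mean (ix) and the loop mean (Ix)
generate double diff = ix - Ix
summarize diff

* number of observations where the two means are not exactly equal
count if ix != Ix

* number of observations where they differ by more than a small relative
* tolerance (1e-15 is an arbitrary, assumed threshold)
count if reldif(ix, Ix) > 1e-15
```

If the first -count- reports 0 on the small example but not on your full
data, listing the observations with ix != Ix would show which ones to
include in a posted example.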
On Thu, Sep 25, 2008 at 10:26 PM, Rembert De Blander
<[email protected]> wrote:
> The problem can be stated as follows:
>
> Consider the panel data setting where the command
>
> <<tsset pid time>>
>
> was issued. Under these circumstances, the command:
>
> <<by pid: egen double i`x' = mean(`x')>>
>
> should be exactly identical to:
>
> <<
> generate double I`x' = 0
> qui levelsof `id', local(idlst)
> foreach lvl of local idlst {
> qui summarize `x' if (`id' == `lvl'), meanonly
> qui replace I`x' = r(mean) if (`id' == `lvl')
> }
>>>
>
> Now, the problem, as far as I have experienced it, can appear when `x' is a
> float variable. Worse, the discrepancy between the two command sequences
> seems to involve a "random" component, since it differs from run to run.
> The latter sequence of commands always produces identical results, but the
> 'egen' command's output varies. Of course these fluctuations are of the
> order of machine precision. Nevertheless they are worrying, since they
> constitute 'unexpected' and certainly undocumented behaviour, which can
> lead to substantial differences, especially in iterated procedures.
>
> The problem does not occur for every `x', but I have a dataset & sequence of
> commands that produce the described behaviour.
>
> Since I am not allowed to post attachments, please mail me for more info: