This thread is extraordinarily frustrating.
I still am not clear on what is desired and
on what is seen to be a problem.
Nikolaos stated at one point that he wanted
to eliminate duplicates. If this means -drop-
them from the data, then -duplicates drop-
is available in Stata, although writing your own code
would be instructive.
But it seems instead to mean "make them different",
and adding different small constants and adding noise
have both been seized upon as solutions. Are
they equally attractive or appropriate?
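On the first reading (dropping duplicates), here is a
minimal sketch. The toy values are made up, and I am
assuming that E and SE are numeric variables and that
duplicates are to be judged on those two variables alone.

```stata
* Toy data: E and SE stand in for the variables in this thread
clear
input E SE
1.2 0.3
1.2 0.3
0.8 0.1
1.2 0.4
0.8 0.1
end

* Report surplus copies before changing anything
duplicates report E SE

* Tag each observation with its number of surplus copies,
* so that duplicates can be inspected first
duplicates tag E SE, generate(ndup)
list if ndup > 0, clean

* Keep one observation per distinct (E, SE) pair;
* -force- is required when a varlist is specified
duplicates drop E SE, force
list, clean
```

Reporting and tagging first is deliberate: it shows what
would be lost before anything is dropped.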
At the risk of complicating an already convoluted
thread, I add further comments:
0. If `E' and `SE' are some kind of identifier, then some
coding as unique integers is likely to be optimal (and comments
below are irrelevant).
1. Changing the data needs to be justified.
2. Adding different constants and adding random
noise are not reproducible without further
constraints. The first depends on sort order
and the second on seed and time.
3. Adding even small amounts that are all positive
changes any location parameter for any variable.
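Points 0, 2 and 3 can be illustrated with a sketch.
Everything here is an assumption for illustration only:
the toy values, the size 1e-6 of the perturbation, the
seed, and the names id, obsno, E_shift and E_noise.

```stata
clear
input E SE
1.2 0.3
1.2 0.3
0.8 0.1
0.8 0.1
0.8 0.1
end

* Point 0: if E and SE jointly act as an identifier,
* map each distinct pair to an integer 1, 2, ...
egen long id = group(E SE)

* Point 2: record the current order, so that the result
* does not depend on whatever sort happens later
gen long obsno = _n

summarize E, meanonly
scalar mean_before = r(mean)

* Different positive constants within ties; the order inside
* each tie is pinned down by obsno, not left to chance
bysort E SE (obsno): gen double E_shift = E + 1e-6 * (_n - 1)

* Random noise is repeatable only if the seed is set first
set seed 2718
* runiform() - 0.5 is centred on zero, unlike the constants above
gen double E_noise = E + 1e-6 * (runiform() - 0.5)

* Point 3: all-positive additions move the mean upwards
summarize E_shift, meanonly
display "shift in mean of E: " r(mean) - mean_before
```

None of this answers point 1. It shows only what would
have to be pinned down (sort order, seed) and checked
(the shift in location) if the data were to be changed.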
I can't encourage any of the solutions offered
without knowing that there is an answer to 1 and
that 2 and 3 don't (won't) matter. But if 2 and 3
don't matter, why do all this in the first place?
Whatever the precise problem, I am confident,
with Austin Nichols, that _no_ looping should be required.
Nick