Radu Ban
> i'm cleaning a dataset and i encounter repeated ids. i want to keep them
> unique, but the problem is that for some repeated ids the variables differ.
> i want to keep just one of the repeated ids. so i'm using:
>
> bysort id: keep if _n == 1
>
> now i would like to know if this will keep the same id whenever the
> program is run. or does the ordering change?
The same -id-s will remain in the dataset. The real
issue, as you know, is what happens to values of other variables.
Suppose Stata -sort-s on -id-:
1
1
2
2
3
3
3
Suppose it did it a different way:
1
1
2
2
3
3
3
In terms of -id- alone, the answer is the same. Stata, and you, are both
indifferent to which of these solutions (of the 2! 2! 3! = 24 possibilities)
is preferred.
Can you tell the difference? The answer is clearly no. When you
bysort id : keep if _n == 1
the answer is, again, the same as far as you are concerned, in terms of -id-:
id
1
2
3
Now suppose you have other variables:
1 pat
1 jean marie
2 lisa
2 teresa
3 eva marie
3 monica
3 marsha
As you know, the result after -bysort id: keep if _n == 1- could be
1 pat
2 lisa
3 eva marie
or it could be
1 jean marie
2 teresa
3 monica
and indeed any other of the 2 x 2 x 3 = 12 distinct results that the
24 possible sort orders can produce.
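Before dropping anything, it can help to see which -id-s have conflicting values. What follows is only a minimal sketch: the variable name -name- and the toy data typed in with -input- are assumptions made for illustration, not the poster's actual dataset.

```stata
* Toy data matching the example above; -name- is a hypothetical variable name
clear
input byte id str12 name
1 "pat"
1 "jean marie"
2 "lisa"
2 "teresa"
3 "eva marie"
3 "monica"
3 "marsha"
end

* How many ids are repeated, and how often
duplicates report id

* Within each id, sort on name; if the first and last values differ,
* then the id has at least two different names
bysort id (name): gen byte differs = name[1] != name[_N]
list, sepby(id)
```

With this toy data every -id- is flagged, as each one carries at least two different names.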
Will the answer be the same? In general, I doubt it.
By default -sort- makes no promise about the order of observations
tied on the sort key, and that order can differ from run to run
unless the -stable- option is specified. I wouldn't depend on it.
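If a reproducible choice is wanted, one possibility is to break the ties explicitly, so that nothing is left to -sort-. The sketch below records the order in which observations arrive and keeps the first of each -id- in that order. The names -name- and -obsno- and the toy data are assumptions for illustration, and the result is only as reproducible as the order of the incoming dataset.

```stata
* Toy data as in the example; -name- and -obsno- are hypothetical names
clear
input byte id str12 name
1 "pat"
1 "jean marie"
2 "lisa"
2 "teresa"
3 "eva marie"
3 "monica"
3 "marsha"
end

* Record the current order so that ties on id are broken the same way every run
gen long obsno = _n

* Sort by id, and within id by obsno; then keep the first observation of each id
bysort id (obsno): keep if _n == 1
list
```

Here that keeps pat, lisa and eva marie, the first-listed name for each -id-. Sorting within -id- on a substantive variable, say -bysort id (name): keep if _n == 1-, is an alternative, but it is deterministic only if -id- and that variable together identify observations uniquely.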
Nick