Never say "One more question". What are you
going to say next time?
The -duplicates- command is designed to
deal with duplicates. Duplicates are observations that are
exactly equal to each other. I know this because
we wrote it.
What you have is quite different, and -duplicates- is
irrelevant to that.
You want
drop if missing(ethnicpop)
bysort country year ethnicity (ethnicpop) : drop if _n < _N
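Note again that this will not be robust to spelling errors:
"Caucasion" and "Caucasian" would be treated as different
groups, and one observation of each would be kept.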
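As a minimal sketch of what those two commands do, here is a toy
dataset built from the example quoted below. The second 1930
Caucasion row (1,000,050) is invented purely for illustration, and
the sketch assumes that ethnicpop is numeric and that the country
identifier is named country.

```stata
clear
input country year str10 ethnicity ethnicpop
10 1930 "Caucasion" 1000000
10 1930 "Caucasion" 1000050
10 1930 "Hispanic"  50000
10 1931 "Caucasion" 1000100
10 1931 "Hispanic"  51000
11 1931 "Asia"      10000
end

* numeric missings sort after all nonmissing values, so drop them
* first; otherwise a missing could be the "last" value in a group
drop if missing(ethnicpop)

* within each country-year-ethnicity group, sort on ethnicpop and
* drop everything except the last (largest) observation
bysort country year ethnicity (ethnicpop) : drop if _n < _N

* check: each country-year-ethnicity combination now occurs once
isid country year ethnicity
list, sepby(country year)
```

If ethnicpop is in fact a string variable holding commas, it would
need converting first, for example with
destring ethnicpop, replace ignore(",")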
Nick
Michael Horowitz
> One more question. Suppose you have a few duplicate observations that are
> errors. To take the example I used in my previous email, the data is set
> up as such:
>
> > > > Country Number year ethnicity ethnicpop
> > > > 10 1930 Caucasion 1,000,000
> > > > 10 1930 Hispanic 50,000
> > > > 10 1931 Caucasion 1,000,100
> > > > 10 1931 Hispanic 51,000
> > > > 11 1931 Asia 10,000
>
> Now suppose there are multiple observations for Caucasians for a given
> year but with slightly different ethnic population totals. I wish to
> systematically keep the one with the higher number and drop the other one.
> However, since the observations are not technical "duplicates" given that
> the ethnic population scores are different, I am having trouble using the
> "duplicate" command. Does anyone have any ideas?
>