
RE: st: data formatting question

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	RE: st: data formatting question
Date	Mon, 13 Feb 2006 22:10:39 -0000

Never say "One more question". What are you 
going to say next time? 

The -duplicates- command is designed to 
deal with duplicates. Duplicates are 
exactly equal to each other. I know this because 
we wrote it. 

What you have is quite different, and -duplicates- is
irrelevant to that. 

You want 

drop if missing(ethnicpop) 
bysort country year ethnicity (ethnicpop) : drop if _n < _N 
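A minimal worked sketch of the two commands above. It uses made-up data in the layout from the original question; the values, including the second Caucasian record and the missing record, are hypothetical. -input- does not accept commas inside numbers, so the populations are typed without them. The missing values are dropped first because Stata sorts missing (.) above every number. Without that step, a missing ethnicpop would be the observation kept.

```stata
* Hypothetical example data: two Caucasian records for country 10 in 1930,
* plus one record with a missing population
clear
input country year str10 ethnicity ethnicpop
10 1930 "Caucasion" 1000000
10 1930 "Caucasion" 999500
10 1930 "Hispanic"  50000
10 1931 "Caucasion" 1000100
10 1931 "Hispanic"  51000
11 1931 "Asia"      10000
11 1931 "Asia"      .
end

* Missing sorts as larger than any number, so remove it before keeping the last
drop if missing(ethnicpop)

* Within each country-year-ethnicity group, sort by ethnicpop (ascending)
* and drop all but the last, i.e. largest, observation
bysort country year ethnicity (ethnicpop) : drop if _n < _N

list, sepby(country year)
```

After this, each country-year-ethnicity combination should appear once and hold its largest ethnicpop.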

Note again that this will not be robust to spelling errors: "Caucasian" and "Caucasion", for example, would be treated as different ethnicities.
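One way to guard against the spelling problem is to standardise the ethnicity labels before dropping. The following sketch assumes the variants differ only in case, surrounding blanks, or known misspellings. The mapping of "caucasion" to "caucasian" is illustrative only. -tab ethnicity- is a quick way to see which variants actually occur in your data.

```stata
* Hypothetical data with inconsistent case and spelling
clear
input country year str12 ethnicity ethnicpop
10 1930 "Caucasion" 1000000
10 1930 "CAUCASIAN" 999500
10 1930 "Hispanic"  50000
end

* Standardise case and strip leading/trailing blanks
replace ethnicity = lower(trim(ethnicity))

* Map known misspellings to a single form (this mapping is an assumption)
replace ethnicity = "caucasian" if ethnicity == "caucasion"

* Inspect the remaining spellings before relying on the grouping
tab ethnicity

* Now keep the largest ethnicpop within each group, as before
drop if missing(ethnicpop)
bysort country year ethnicity (ethnicpop) : drop if _n < _N
```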

Nick 
[email protected] 

Michael Horowitz
 
> One more question. Suppose you have a few duplicate observations that
> are errors. To take the example I used in my previous email, the data
> is set up as such:
> 
> > > > Country Number  year    ethnicity       ethnicpop
> > > > 10              1930    Caucasion       1,000,000
> > > > 10              1930    Hispanic        50,000
> > > > 10              1931    Caucasion       1,000,100
> > > > 10              1931    Hispanic        51,000
> > > > 11              1931    Asia            10,000
> 
> Now suppose there are multiple observations for Caucasians for a given
> year but with slightly different ethnic population totals. I wish to
> systematically keep the one with the higher number and drop the other
> one. However, since the observations are not technically "duplicates",
> given that the ethnic population scores are different, I am having
> trouble using the -duplicates- command. Does anyone have any ideas?
> 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
