One more question. Suppose you have a few duplicate observations that are
errors. To take the example I used in my previous email, the data is set
up as such:
> > > Country Number   year   ethnicity   ethnicpop
> > > 10               1930   Caucasian   1,000,000
> > > 10               1930   Hispanic       50,000
> > > 10               1931   Caucasian   1,000,100
> > > 10               1931   Hispanic       51,000
> > > 11               1931   Asia           10,000
Now suppose there are multiple observations for Caucasians for a given
year but with slightly different ethnic population totals. I wish to
systematically keep the one with the higher number and drop the other one.
However, since the observations are not technically "duplicates", given that
the ethnic population values differ, I am having trouble using the
-duplicates- command. Does anyone have any ideas?
Thank you again for your help.
Michael
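
For what it is worth, one way the "keep the larger value" rule could be
sketched is below. This is only a sketch under assumptions: the variable
names (number, year, ethnicity, ethnicpop) are guessed from the listing and
from the reshape call quoted below, and ethnicpop may need converting to
numeric if it is stored as a string with commas.

```stata
* Sketch only. Assumed variable names: number, year, ethnicity, ethnicpop.

* If ethnicpop is stored as a string such as "1,000,000", make it numeric.
capture confirm string variable ethnicpop
if !_rc {
    destring ethnicpop, replace ignore(",")
}

* Missing values sort last in Stata, so a missing ethnicpop would otherwise
* be treated as the "largest". Dropping them is an assumption about what is
* wanted; a group whose values are all missing disappears entirely.
drop if missing(ethnicpop)

* Within each country-year-ethnicity group, sort by ethnicpop and keep only
* the last observation, which is the one with the highest value.
bysort number year ethnicity (ethnicpop): keep if _n == _N
```

After this, each combination of number, year and ethnicity should appear
only once, which is also what -reshape wide- needs in order to run.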
On Mon, 13 Feb 2006, Nick Cox wrote:
> No such index is needed. In fact as recommended here
> it will almost always give you an incorrect answer.
> To see why, note that after
>
> . sort country year
>
> different values of -ethnicity- will
> be sorted arbitrarily. Thus the same
> ethnicity index, as defined here, will often be
> assigned to different values of ethnicity,
> and vice versa.
>
> It is not what you ask for, quite, but a
> reshape using
>
> reshape wide ethnicpop , i(number year) j(ethnicity) string
>
> is one possibility. My guess is that it will be
> more manageable than what you ask for.
>
> Nick
> [email protected]
>
> Radu Ban
>
> > see -help reshape-. you need first to generate an index at
> > country-year level
> >
> > bys country year: gen ethnic_index = _n
> >
> > reshape wide ethnicity ethnicpop, i(country year) j(ethnic_index)
> >
> > cheers,
> > -radu
> >
> > 2006/2/13, Michael Horowitz <[email protected]>:
> > > To whom it may concern:
> > >
> > > I have a dataset I was wondering if people might have a fix for.
> > >
> > > My data measures various information (ethnicity especially) of
> > > countries. The way the data is currently set up it has multiple entries
> > > per country per year depending on the background of the country. This
> > > means that if there are 2 ethnic groups in a country with significant
> > > populations, there are 2 entries per year as follows (these numbers are
> > > made up to illustrate the situation). There can also be more than 2,
> > > etc., and it can change depending on the population in a given year:
> > >
> > > Country Number   year   ethnicity   ethnicpop
> > > 10               1930   Caucasian   1,000,000
> > > 10               1930   Hispanic       50,000
> > > 10               1931   Caucasian   1,000,100
> > > 10               1931   Hispanic       51,000
> > > 11               1931   Asia           10,000
> > >
> > >
> > > I want to set up the data so there is only one entry per country per year,
> > > as follows:
> > >
> > > Country Number   year   ethnic1     ethnic2    ethpop1     ethpop2
> > > 10               1930   Caucasian   Hispanic   1,000,000   50,000
> > >
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*******************************************************************************
Michael Horowitz
83 Beacon St., Apt. 3
Somerville, MA 02143