Kripa Freitas
> I'm working with the SIPP data. I have multiple observations
> per person per wave which are all identical. So to eliminate
> this I create the variable x in the following way:
>
> . sort ssuid epppnum eentaid wave
> . qui by ssuid epppnum eentaid wave: gen x=_N
> . drop if x>1
>
> what i'm left with is:
> . sort ssuid epppnum eentaid wave
>
> . tab x
>
>           x |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           0 |         45        0.01        0.01
>           1 |    327,053       99.99      100.00
> ------------+-----------------------------------
>       Total |    327,098      100.00
>
> To recheck I create variable y in the exact same way:
> . sort ssuid epppnum eentaid wave
>
> . by ssuid epppnum eentaid wave: gen y=_N
>
> . tab y
>
>           y |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |    327,047       99.98       99.98
>          51 |         51        0.02      100.00
> ------------+-----------------------------------
>       Total |    327,098      100.00
>
> Would someone be able to help me with a reason why it gives
> me different results?
First note that your first block of code
. sort ssuid epppnum eentaid wave
. qui by ssuid epppnum eentaid wave: gen x=_N
. drop if x>1
could be telescoped to
. bysort ssuid epppnum eentaid wave : gen x = _N
. drop if x > 1
However, I don't understand how -x- can ever be
created as 0. _N, as I understand it, can only
be a positive integer. Are you sure you did nothing
else to -x-?
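One way to check, as a rough sketch only: the variable name -check-
is my invention, and this assumes that -x- is still in memory.
. bysort ssuid epppnum eentaid wave : gen long check = _N
. assert check >= 1
. list ssuid epppnum eentaid wave x check if x == 0
If the -assert- passes silently, then _N is behaving as expected
and the zeros in -x- must have come from some other step.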
Your second block of code is credible.
That said, it seems that your code will
not do what you really want, as it drops
_all_ the observations in each set of repeated
copies, not just the surplus ones. I guess that you
would prefer to keep one from each set of repeated copies.
For that a first principles solution could be
. bysort ssuid epppnum eentaid wave : keep if _n == 1
or
. bysort ssuid epppnum eentaid wave : drop if _n > 1
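To see the difference on a toy dataset (the variables -id- and
-wave- and their values are made up for illustration; they are
not SIPP data):
. clear
. input id wave
  1 1
  1 1
  1 2
  2 1
  end
. bysort id wave : gen x = _N
. preserve
. drop if x > 1
. count
. restore
. bysort id wave : keep if _n == 1
. count
The first -count- should report 2, as both copies of (1, 1) are
gone; the second should report 3, as one copy of (1, 1) survives.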
In addition, there are various programs dedicated
to this problem. Official Stata (from version 8)
now has a -duplicates- command.
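As a sketch of how that might look here, assuming that the
repeated observations really are identical on every variable:
. duplicates report
. duplicates drop
With no variable list, -duplicates- compares all variables.
To define duplicates on the identifiers alone, something like
. duplicates report ssuid epppnum eentaid wave
. duplicates drop ssuid epppnum eentaid wave, force
should work. The -force- option is required with a variable list,
because the dropped observations might differ on other variables.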
Nick