Hi,
I'm working with the SIPP data. I have multiple observations per person
per wave, and they are all identical. To eliminate the duplicates, I create
the variable x as follows:
. sort ssuid epppnum eentaid wave
. qui by ssuid epppnum eentaid wave: gen x=_N
. drop if x>1
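One thing to check in the step above: because x is the size of each ssuid-epppnum-eentaid-wave group, `drop if x>1` removes every copy of a duplicated record, not all copies but one. A person-wave stored three times disappears entirely. A minimal sketch on made-up toy data (the values are hypothetical, not SIPP records) that keeps exactly one observation per combination:

```stata
* Toy data (hypothetical values): one person-wave appears three times
clear
input ssuid epppnum eentaid wave
1 101 1 1
1 101 1 1
1 101 1 1
1 101 1 2
2 201 1 1
end

* Keep only the first observation within each id-wave group
sort ssuid epppnum eentaid wave
by ssuid epppnum eentaid wave: keep if _n == 1

* isid exits with an error if the key still does not identify observations
isid ssuid epppnum eentaid wave
```

This leaves 3 observations, whereas `drop if x>1` on the same data would leave 2. Since the duplicated rows are identical on every variable, `duplicates drop` with no varlist should do the same job while also checking that the copies really match on all variables.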
What I'm left with is:
. sort ssuid epppnum eentaid wave
. tab x
          x |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         45        0.01        0.01
          1 |    327,053       99.99      100.00
------------+-----------------------------------
      Total |    327,098      100.00
To check the result, I create a variable y in exactly the same way:
. sort ssuid epppnum eentaid wave
. by ssuid epppnum eentaid wave: gen y=_N
. tab y
          y |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    327,047       99.98       99.98
         51 |         51        0.02      100.00
------------+-----------------------------------
      Total |    327,098      100.00
Could someone help me understand why the two variables give different
results?
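Under `by`, `_N` is the number of observations in the current group, so it is never 0. The 45 observations with x = 0 therefore could not have come from the commands shown, which suggests x was changed somewhere else. To see which records make up the group of 51 in the y tabulation, one option is to tag and list the observations that still share a key. A minimal sketch on hypothetical toy data (not SIPP records); on the real file, the last three commands would be run on the same key:

```stata
* Toy data (hypothetical values): one id-wave combination appears twice
clear
input ssuid epppnum eentaid wave
1 101 1 1
1 101 1 1
2 201 1 1
end

* Summarize how many observations share each id-wave combination
duplicates report ssuid epppnum eentaid wave

* dup = number of surplus copies (0 for unique observations)
duplicates tag ssuid epppnum eentaid wave, generate(dup)

* List the observations that still share a key
list ssuid epppnum eentaid wave if dup > 0
```

Listing those observations together with x would show whether the x = 0 records belong to that group and what else differs between them.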
(I don't know if anyone else responded.)