Elsewhere Fred Wolfe and Joe J. discussed how
to use Fred's -finddup- command to look
for duplicates.
An alternative is official Stata's -duplicates-
command.
A typical sequence might run
. duplicates report id year
to see whether there are duplicates w.r.t. -id year-;
. duplicates examples id year
or
. duplicates list id year
to see what they are; and
. duplicates examples id year y
to see whether there are also ties on -y- for those
duplicates; and so forth.
As in the case of -finddup-, and of other duplicates
commands I can think of, multiple variables may
be specified to -duplicates-.
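
For the question quoted below, a minimal sketch of handling all years
at once, assuming the variable names -id-, -year- and -y- from that
post. -dup- is a hypothetical new variable name, and whether dropping
surplus copies is right depends on what the data-entry errors
actually are.

```stata
* Sketch only: id, year and y are the variables in the quoted post;
* dup is a hypothetical name for the new tag variable.

* dup = number of surplus copies of each id-year combination
* (0 for observations that are unique on id and year)
duplicates tag id year, generate(dup)

* Inspect the problem observations before deleting anything
list id year y if dup > 0, sepby(id)

* Keep the first observation in each id-year group; -force- is
* required whenever a varlist is specified
duplicates drop id year, force

* With id-year now unique, the steps from the quoted post can proceed
tsset id year
bysort id (year): cipolate y year, gen(yci)
```

Note that -duplicates drop- keeps whichever observation happens to
come first within each group, so if the duplicates differ on -y-
it is worth checking them by eye first.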
Nick
Joe J. wrote:
>
> Here is what I meant.
>
> Panel variable: id, time variable: year
>
> There is a variable y which has missing values, and I want to use
> -cipolate- --the Stata code available at SSC-- to interpolate the
> missing values. I do the interpolation the following way.
>
> tsset id year, yearly
> by id : cipolate y year, gen(yci)
>
> It does not run because id has some duplicates, which resulted from
> data-entry errors. Therefore I want to remove duplicates for each year
> and run -cipolate- (the cubic interpolation code at SSC) on the
> resulting data set with unique ids.
>
> I remove duplicates the following way for each year.
>
> use "C:\data75.dta", clear
> finddup id if year==1975, nol k /*finddup is also downloadable from ssc*/
> save "C:\data75a.dta", replace
>
> drop if dupval>=2/*removing duplicates*/
> save "C:\data75b.dta", replace/*data with unique ids*/
>
> by id : cipolate y year, gen(yci)/*cubic interpolation*/
> save "C:\data75c.dta", replace
>
>
> use "C:\data75a.dta", clear
> keep if dupval>=2/*collecting duplicates*/
> save "C:\data75d.dta", replace
>
> I repeat the above steps for other years and at the end append the
> interpolated and duplicate files for each year.
>
> use "C:\data75c.dta", clear
> append using "C:\data75d.dta"
> append using "C:\data76c.dta"
> append using "C:\data76d.dta"
> etc.
>
> My question is: is there any way of detecting duplicate ids for all
> years simultaneously instead of doing it for each year separately?
> (I wish I could do it the following way:
> by year: finddup id , nol k).