The closest thing I know of is the "by" prefix.
If you list the variables that define a duplicate
in the "by", then every observation within a by-group is
guaranteed to match on those variables.
. bysort id a b: keep if _n==1
will keep the first observation when there are multiple
observations with the same combination of id, a and b.
Data:
id a b c
100 1 3 4
100 1 3 2
100 1 3 1
102 3 4 8
102 3 4 9
102 3 4 1
102 3 4 2
. bysort id a b: keep if _n==1
. list
returns:
id a b c
100 1 3 4
102 3 4 8
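One caveat: "by" needs the data sorted on id a b, and a plain sort does not promise to leave tied observations in their original order, so "first" may not mean first in the file. Here is a minimal sketch that pins down the original order before sorting. The variable name "order" is made up for this example, and the sketch assumes no variable of that name already exists in your data:

```stata
* remember where each observation sat in the original file
* ("order" is a hypothetical name chosen for this sketch)
gen long order = _n

* sort on id a b, and on order within each group;
* the groups are still defined by id a b only
bysort id a b (order): keep if _n == 1

* the helper variable is no longer needed
drop order
```

If your release of Stata has the "duplicates" command, "duplicates drop id a b, force" should do much the same job in one line; check "help duplicates" before relying on it.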
Here's a webpage for a longer explanation:
http://www.cpc.unc.edu/services/computer/presentations/statatutorial/example21.html
dan
carolina population center, unc-ch
[email protected]
> Dear Statalisters,
>
> I am looking for a Stata analogue of a SAS procedure for a certain type of
> duplicate removal. Suppose a dataset has fields A-J. For all subsets of
> records for which fields A-C are identical, I wish to keep only the first
> record and discard the rest, keeping all fields of the retained records.
> What is the simplest way to do this with Stata commands?
>
> Thanks very much in advance.
>
> Howard Burkom