Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: finding duplicate data
From: Mahbubeh Parsaeian <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: finding duplicate data
Date: Sat, 17 Nov 2012 10:08:59 -0800 (PST)
Hi everybody,
I have a problem with finding duplicate data.
I am working with a dataset that consists of household information. In every
household, interviewers were supposed to interview only one family member.
Unfortunately, in some households they interviewed two or more people, and I
need to delete the extra observations.
To clarify the problem, suppose the id number identifies the household.
Within a cluster the id numbers should be distinct, for example 1, 2, 3. When I
explored the data, I found that some id numbers are repeated within the same
cluster (for example 1, 2, 2), which means that two or more people were
interviewed in the same family. I want to keep one person per family and delete
the data for the others.
My idea is to number the observations for each id within each cluster and then
delete the observations whose id is a repeat. I know Stata has something like
_n that numbers observations, but I don’t know how to use it to count the
repetitions of the id variable within clusters.
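
A minimal sketch of this idea, assuming the two variables are named cluster and id (hypothetical names; the actual variable names are not given here). It relies on the by prefix restarting _n at 1 within each group:

```stata
* Hypothetical variable names: cluster = cluster identifier, id = household number
* Number the observations within each cluster-id pair: 1, 2, ...
bysort cluster id: generate long copy = _n

* Check how many households have extra interviews before deleting anything
tabulate copy

* Keep one person per household and drop the rest
drop if copy > 1
drop copy
```

Which person ends up as copy 1 is arbitrary unless a further sort variable is given in parentheses after id in the bysort prefix. Running duplicates report cluster id beforehand would give a count of the repeated households without changing the data.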
If possible, please help me find a good solution.