Re: st: RE: Replacing duplicate values
From: "Pavlos C. Symeou" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: RE: Replacing duplicate values
Date: Thu, 01 Apr 2010 17:20:35 +0200
Dear Nick and Abdel,
Thank you for your replies. I should clarify that I don't wish to drop
any duplicate observations. Rather, I want to delete duplicate values
across the four ipc variables and then move all the distinct values to
the left. Reshaping the data to long format would be one option, but
the complete dataset is too complex and I would prefer to avoid this for now.
Regards,
Pavlos
AbdelRahmen wrote:
"type help duplicates drop under Stata and you will find what you are looking for"
On 01/04/2010 17:00, Nick Cox wrote:
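It's a Stata two-step: reshape, drop duplicates, reshape back. Something like:
* warning: untested code
reshape long ipc_, i(id) j(j)
* empty slots are not IPCs, so discard them before looking for duplicates
drop if ipc_ == ""
* keep only the first occurrence of each value within a patent
bysort id ipc_ (j): keep if _n == 1
* renumber the surviving values in their original left-to-right order
bysort id (j): replace j = _n
reshape wide ipc_, i(id) j(j)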
Actually, the last -reshape- might not be a good idea. The long structure might be more useful.
Nick
[email protected]
Pavlos C. Symeou wrote:
I have a dataset of patents. Every patent is assigned a
number of International Patent Classifications (IPCs). However, there
are mistakes in the database and certain IPCs appear more than once for
a single patent, which is meaningless. Examples are the patents with id 6
and id 7 (ipc_1, ipc_2, etc. hold the IPCs assigned to a single
patent). For the patent with id 6 we can see that ipc_2 and ipc_3 are
the same. Id 7 illustrates a more general issue: duplicate values may
not be adjacent and may appear more than twice.
id  ipc_1  ipc_2  ipc_3  ipc_4
1   A44B   G09F   H04N
2   A47B   G06F   H05K   E05D
3   A47B   G06F
4   A47B   H04N   H05K
5   A47B
6   A47B   F16M   F16M   H05K
7   A47B   A47B   F16M   A47B
Can you suggest a way to delete the duplicate values, of which there can be
more than two, and move the remaining values to the left? For example, the
patents with id 6 and id 7 would look like this:
id  ipc_1  ipc_2  ipc_3  ipc_4
6   A47B   F16M   H05K
7   A47B   F16M
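For readers who, like the original poster, would rather stay in wide format, the following is a minimal sketch of one possible approach and not code posted in this thread. It assumes that the four variables are strings named ipc_1 to ipc_4 and that an empty string means "no IPC"; the str4 width and the small example dataset (rows 3, 6 and 7 from the table above) are also assumptions made for illustration. The first loop blanks any value that already appears in an earlier column; the second loop shifts the surviving values left.

```stata
* Sketch only: assumes string variables ipc_1-ipc_4, with "" meaning "no IPC"
clear
input id str4(ipc_1 ipc_2 ipc_3 ipc_4)
3 "A47B" "G06F" ""     ""
6 "A47B" "F16M" "F16M" "H05K"
7 "A47B" "A47B" "F16M" "A47B"
end

* Step 1: blank any value that already appears in an earlier column
forvalues j = 2/4 {
    local jm1 = `j' - 1
    forvalues k = 1/`jm1' {
        replace ipc_`j' = "" if ipc_`j' == ipc_`k'
    }
}

* Step 2: shift values left into empty slots
* (three passes are enough to close every gap across four columns)
forvalues pass = 1/3 {
    forvalues j = 1/3 {
        local next = `j' + 1
        replace ipc_`j' = ipc_`next' if ipc_`j' == ""
        * non-empty values are distinct after step 1,
        * so equality here means the value was just copied left
        replace ipc_`next' = "" if ipc_`next' == ipc_`j'
    }
}

list, noobs
```

With more than four ipc variables, the loop bounds (4 and 3) would need to change accordingly. Unlike sorting on the IPC value, this keeps the surviving values in their original left-to-right order.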
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/