
Re: st: reshape and duplicates

From	"Joseph Coveney"
To	"Statalist"
Subject	Re: st: reshape and duplicates
Date	Mon, 31 Mar 2008 01:24:56 +0900

Jennifer Nicoll Victor wrote:

Thank you, Nick, for recommending the -reshape- command to me last week.  I
have now converted my UCINET relational dataset into dyads in Stata.  However,
I now have a problem with duplicate observations.  My data are non-directional,
so the pair A-B is the same as the pair B-A.  I need an efficient way to delete
the duplicates and keep only the unique observations, where the unit of
analysis is a pair.  Can someone help?

Essentially, I have:
ID1  ID2  name1         name2         ...
1    2    Smith, John   Jones, Abby
1    3    Smith, John   White, Rich
1    4    Smith, John   Black, Kelly
2    1    Jones, Abby   Smith, John
2    3    Jones, Abby   White, Rich
2    4    Jones, Abby   Black, Kelly
3    1    White, Rich   Smith, John
3    2    White, Rich   Jones, Abby
3    4    White, Rich   Black, Kelly
4    1    Black, Kelly  Smith, John
4    2    Black, Kelly  Jones, Abby
4    3    Black, Kelly  White, Rich

And I need to have:
ID1  ID2  name1         name2         ...
1    2    Smith, John   Jones, Abby
1    3    Smith, John   White, Rich
1    4    Smith, John   Black, Kelly
2    3    Jones, Abby   White, Rich
2    4    Jones, Abby   Black, Kelly
3    4    White, Rich   Black, Kelly

But I have 191,406 pairs.
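Since 191,406 = 438 x 437, the count would be consistent with every ordered pair of 438 actors appearing exactly once, i.e. each unordered pair stored twice, once in each orientation. Under that assumption a pair key is not strictly needed: keeping only the rows whose smaller ID comes first reproduces the desired listing above. The sketch below assumes the data are complete and symmetric in exactly this way (no self-pairs, no pair stored in only one orientation); the -assert- is a rough check that stops the do-file if exactly half of the observations are not kept.

```stata
* Sketch: drop mirror-image duplicates by keeping one orientation.
* ASSUMPTION: every unordered pair appears as both A-B and B-A.
* A pair stored only as, say, 3-1 would be lost by this approach.
clear
input int ID1 int ID2 str12 name1 str12 name2
1 2 "Smith, John" "Jones, Abby"
1 3 "Smith, John" "White, Rich"
1 4 "Smith, John" "Black, Kelly"
2 1 "Jones, Abby" "Smith, John"
2 3 "Jones, Abby" "White, Rich"
2 4 "Jones, Abby" "Black, Kelly"
3 1 "White, Rich" "Smith, John"
3 2 "White, Rich" "Jones, Abby"
3 4 "White, Rich" "Black, Kelly"
4 1 "Black, Kelly" "Smith, John"
4 2 "Black, Kelly" "Jones, Abby"
4 3 "Black, Kelly" "White, Rich"
end

* Remember how many observations we started with
local before = _N

* Keep the orientation with the smaller ID first
keep if ID1 < ID2

* Rough check of the assumption: exactly half should remain
assert _N == `before'/2

sort ID1 ID2
list, noobs separator(0)
```

If the check fails, the data are not fully symmetric, and an orientation-free pair key such as the -min()-/-max()- approach in the reply is the safer choice.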

--------------------------------------------------------------------------------

The do-file below gets what you want.  Sorting 200,000 observations took
1.01 seconds on my laptop, so if the approach below takes a few moments on
your dataset, the time is probably going into the -min()- and -max()- calls
rather than into the sort.  You might also be able to avoid the duplicates
altogether by doing something pre-emptively upstream.

Joseph Coveney

clear *
set more off
input byte ID1 byte ID2 str10 name1 str1 comma1 str10 name2 str10 name3 str1 comma2 str10 name4
1       2       Smith, John     Jones, Abby
1       3       Smith, John     White, Rich
1       4       Smith, John     Black, Kelly
2       1       Jones, Abby     Smith, John
2       3       Jones, Abby     White, Rich
2       4       Jones, Abby     Black, Kelly
3       1       White, Rich     Smith, John
3       2       White, Rich     Jones, Abby
3       4       White, Rich     Black, Kelly
4       1       Black, Kelly    Smith, John
4       2       Black, Kelly    Jones, Abby
4       3       Black, Kelly    White, Rich
end
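* Reassemble the "Last, First" names from the pieces read by -input-
replace name1 = name1 + ", " + name2
replace name2 = name3 + ", " + name4
keep ID* name1 name2
* Left-justify the names for -list-
format name* %-`=max(length(name1), length(name2))'s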
*
* Begin here
*
* Order-free pair key: the 1-2 and 2-1 rows both get "1-2"
generate str dyad_id = string(min(ID1, ID2)) + "-" + string(max(ID1, ID2))
* Keep one observation per pair
bysort dyad_id: keep if _n == 1
list, noobs separator(0)
exit
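The string key above works because the "-" separator keeps keys such as 1-12 and 11-2 distinct. If building strings for roughly 190,000 observations proves slow, a numeric variant of the same -min()-/-max()- idea stores the pair in two numeric variables and lets -duplicates drop- remove the extra copies. This is a sketch only: the variable names lo and hi are introduced here for illustration, and whether it is actually faster on the full dataset is untested. As with -bysort ... keep if _n == 1-, which orientation of each pair survives should not be relied on, so the kept row may list its names in either order.

```stata
* Sketch: numeric pair key in place of the string dyad_id.
* lo and hi are illustrative names, not from the original post.
clear
input int ID1 int ID2 str12 name1 str12 name2
1 2 "Smith, John" "Jones, Abby"
1 3 "Smith, John" "White, Rich"
1 4 "Smith, John" "Black, Kelly"
2 1 "Jones, Abby" "Smith, John"
2 3 "Jones, Abby" "White, Rich"
2 4 "Jones, Abby" "Black, Kelly"
3 1 "White, Rich" "Smith, John"
3 2 "White, Rich" "Jones, Abby"
3 4 "White, Rich" "Black, Kelly"
4 1 "Black, Kelly" "Smith, John"
4 2 "Black, Kelly" "Jones, Abby"
4 3 "Black, Kelly" "White, Rich"
end

* (lo, hi) is identical for the A-B and B-A rows of a pair
generate int lo = min(ID1, ID2)
generate int hi = max(ID1, ID2)

* Keep one observation per (lo, hi); -force- is required with a varlist
duplicates drop lo hi, force

sort lo hi
list lo hi name1 name2, noobs separator(0)
```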
