Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Joe Canner <jcanner1@jhmi.edu> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | st: RE: Finding duplicate values across different variables |
Date | Mon, 10 Mar 2014 15:24:05 +0000 |
Michael,

Nick Cox answered a very similar question here last week:
http://www.stata.com/statalist/archive/2014-03/msg00067.html

Let us know if you can't get his solution to work or if it doesn't apply.

Regards,
Joe Canner

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael Goodwin
Sent: Monday, March 10, 2014 11:04 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Finding duplicate values across different variables

I have a social network dataset consisting of two ID variables (source and target) and a number of indicators (ind1, ind2, ind3). The data looks like this:

source    target    ind1  ind2  ind3
company1  company2  1     0     0
company3  company5  0     1     0
company2  company1  1     1     0
company5  company3  1     1     1

My goal is to

1) consolidate any observations where the combination of source and target is equal (even where they aren't duplicates in the traditional Stata sense, such as observations 1 and 3 or 2 and 4 above); and

2) make the source and target of the consolidated observation equal to the source and target of whichever observation had a higher rowtotal of the indicators (so observations 3 and 4 would remain).

Thus far, my approach has been to create a concatenation of source and target and, in a loop, flag all instances where source+target==target+source elsewhere in the dataset.

gen orig = source+target;
gen new = target+source;
gen temp = .;
local max = _N;
egen count = rowtotal(ind*);
forv num = 1/`max' {;
replace temp = 1 if orig==new[`num'];
};

I still haven't been able to figure out how to sort the resulting dataset in such a way that I can easily consolidate the observations based on the count variable.

Any thoughts you have would be much appreciated. Thanks in advance.
Best,
Mike

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
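For readers following the thread, here is a minimal sketch of one way the goal described in the question could be approached without a loop. It is not necessarily the solution in the linked post. It assumes source and target are string variables, that the sample rows from the question are representative, and that "consolidate" means keeping the single observation with the highest indicator total per pair. The variable names first and second are illustrative, not from the original.

```stata
* Recreate the four example observations from the question
clear
input str8 source str8 target ind1 ind2 ind3
company1 company2 1 0 0
company3 company5 0 1 0
company2 company1 1 1 0
company5 company3 1 1 1
end

* Order-independent pair key: the alphabetically smaller ID goes in
* first, the larger in second (hypothetical helper variables)
gen first  = cond(source < target, source, target)
gen second = cond(source < target, target, source)

* Row total of the indicators, as in the question
egen count = rowtotal(ind1 ind2 ind3)

* Within each unordered pair, sort by count and keep the last
* (highest-count) observation; ties are broken arbitrarily
bysort first second (count): keep if _n == _N

list source target ind1 ind2 ind3 count
```

With the example data, the rows originally listed as observations 3 and 4 are the ones that remain. Sorting each pair into a fixed order avoids comparing every observation against every other, and unlike the concatenation source+target it cannot confuse distinct pairs whose joined strings happen to coincide.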