Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: RE: Finding duplicate values across different variables
From: Joe Canner <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: RE: Finding duplicate values across different variables
Date: Mon, 10 Mar 2014 23:15:17 +0000
Credit to Nick for the solution; I just have a good memory. It would have taken me a while to come up with that (if at all).
And, yes, it is sometimes hard to formulate a help search in such a way that your terminology matches that used in previous solutions.
________________________________________
From: [email protected] [[email protected]] on behalf of Michael Goodwin [[email protected]]
Sent: Monday, March 10, 2014 6:41 PM
To: [email protected]
Subject: Re: st: RE: Finding duplicate values across different variables
Hi Joe,
Thanks, this is extremely helpful. Sometimes you just have to know how
to ask the right question!
Best,
Mike
On Mon, Mar 10, 2014 at 11:24 AM, Joe Canner <[email protected]> wrote:
> Michael,
>
> Nick Cox answered a very similar question here last week: http://www.stata.com/statalist/archive/2014-03/msg00067.html
>
> Let us know if you can't get his solution to work or if it doesn't apply.
>
> Regards,
> Joe Canner
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Michael Goodwin
> Sent: Monday, March 10, 2014 11:04 AM
> To: [email protected]
> Subject: st: Finding duplicate values across different variables
>
> I have a social network dataset consisting of two ID variables (source and
> target) and a number of indicators (ind1, ind2, ind3). The data looks like
> this:
>
> source    target    ind1  ind2  ind3
> company1  company2  1     0     0
> company3  company5  0     1     0
> company2  company1  1     1     0
> company5  company3  1     1     1
>
> My goal is to 1) consolidate any observations where the combination of
> source and target is equal (even where they aren't duplicates in the
> traditional Stata sense, such as observations 1 and 3 or 2 and 4 above);
> and 2) make the source and target of the consolidated observation equal to
> the source and target of whichever observation had a higher rowtotal of the
> indicators (so observations 3 and 4 would remain).
>
> Thus far, my approach has been to create a concatenation of source and
> target and, in a loop, flag all instances where
> source+target==target+source elsewhere in the dataset.
>
> #delimit ;
> gen orig = source+target;
> gen new = target+source;
> gen temp = .;
> local max = _N;
> egen count = rowtotal(ind*);
> forv num = 1/`max' {;
>     replace temp = 1 if orig==new[`num'];
> };
> #delimit cr
>
> I still haven't been able to figure out how to sort the resulting dataset
> in such a way that I can easily consolidate the observations based on the
> count variable. Any thoughts you have would be much appreciated. Thanks in
> advance.
>
> Best,
>
> Mike
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
--
MIKE GOODWIN
Project Leader, Endeavor Insight
900 Broadway, Suite 301
New York, NY 10003
www.endeavor.org
Tel: 646-368-6354
Skype: michael.p.goodwin
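
For readers who cannot follow the link above, here is a minimal sketch of one common way to handle the problem described in the original question. It is an illustration only and is not necessarily the solution given in the linked post. The variable names first and second are hypothetical helpers, and the sketch assumes that "consolidate" means keeping the single observation with the highest indicator total for each unordered source/target pair. Ties on the total are broken arbitrarily.

```stata
* Example data copied from the original question
clear
input str8 source str8 target ind1 ind2 ind3
company1 company2 1 0 0
company3 company5 0 1 0
company2 company1 1 1 0
company5 company3 1 1 1
end

* Row total of the indicators, as in the original attempt
egen count = rowtotal(ind1 ind2 ind3)

* Order-independent key for each pair: the alphabetically smaller ID
* goes in first, the larger in second (hypothetical helper variables)
gen first  = cond(source < target, source, target)
gen second = cond(source < target, target, source)

* Within each unordered pair, sort by count and keep the last
* observation, which has the highest total
bysort first second (count): keep if _n == _N

drop first second
list source target ind1 ind2 ind3 count
```

Because the key is built by sorting the two IDs within each observation, no loop over observations or string concatenation is needed, and pairs such as company1/company2 and company2/company1 fall into the same group.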