Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Robert Picard <picard@netbox.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: matching cases by a transitive relation
Date: Sun, 13 Jan 2013 18:15:59 -0500

If I understand the problem correctly, I think that this can be solved easily using -group_id- (available from SSC). Here's an example of how I would proceed:

*------------------------------ sample code -------------------
clear
input sibling1 sibling2
1 2
2 1
2 3
4 5
5 4
4 8
7 9
9 7
10 3
end

gen pairid = _n

* convert the identifiers from wide to long
expand 2
sort pairid
by pairid: gen id = sibling1 if _n == 1
by pairid: replace id = sibling2 if _n == 2

* start with one group per pair, then merge groups whenever the ids match
gen sibling_group = pairid
group_id sibling_group, matchby(id)

* pick one record per id within a sibling_group
sort sibling_group id pairid
by sibling_group id: gen pick = _n == 1
list sibling_group id if pick, noobs sepby(sibling_group)
*------------------------------ end sample code ---------------

On Fri, Jan 11, 2013 at 7:03 AM, Robert De Vries <robert.devries@sociology.ox.ac.uk> wrote:
> Dear Statalisters,
>
> I have a problem with attempting to match cases by a transitive relation (A is related to B, B is related to C, so C must be related to A).
>
> Specifically, I am working with the longitudinal British Household Panel Study (BHPS), and I am attempting to match siblings across time. I can straightforwardly create a dataset which includes the ID number of all sibling pairs in the dataset in the following format:
>
> ID | SIBLING ID
> A  | B
> B  | A
> B  | C
>
> However, this dataset does not reflect the additional relationship A-C. This occurs when A and C are siblings but have never actually lived together. For example, in Wave 1, A and B are siblings living together. By Wave 2, A has moved out, and B has gained a new sibling, C (this might be a step-sibling, for example, or a new birth). My dataset reflects the fact that A and B are siblings, and that B and C are siblings, but because A and C have never been coded as siblings, my dataset does not reflect that they are.
>
> By their transitive relation through B, we know that A and C are siblings. My question is: what code could I write to get the dataset to reflect this? I need to somehow tell Stata that if A is related to B AND B is related to C, you need to create a new case which reflects that A is related to C.
>
> Hope you can help!
>
> Robert de Vries
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
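
For anyone who cannot install -group_id-, the same grouping can be sketched with built-in commands only. This is a rough illustration, not a description of how -group_id- works internally: it repeatedly pushes the smallest group label across records that share an id or a pair until nothing changes. The names grp, newgrp, sibling_id and the tempfile members are made up for this example. The last step uses -joinby- to create the implied pairs (such as A-C) that the original question asked for.

```stata
clear
input sibling1 sibling2
1 2
2 1
2 3
4 5
5 4
4 8
7 9
9 7
10 3
end

* one record per person per pair (wide to long)
gen long pairid = _n
expand 2
bysort pairid: gen long id = cond(_n == 1, sibling1, sibling2)

* start with one group per pair (hypothetical variable name grp)
gen long grp = pairid

* spread the smallest label across records sharing an id,
* then across the two records of each pair, until stable
local changed = 1
while `changed' {
    local changed = 0
    foreach key in id pairid {
        bysort `key': egen long newgrp = min(grp)
        quietly count if newgrp != grp
        if r(N) > 0 local changed = 1
        quietly replace grp = newgrp
        drop newgrp
    }
}

* keep one record per person within each group
bysort grp id: keep if _n == 1
keep grp id
list, noobs sepby(grp)

* build every ordered pair of distinct siblings within a group
tempfile members
rename id sibling_id
save `members'
rename sibling_id id
joinby grp using `members'
drop if id == sibling_id
sort grp id sibling_id
list, noobs sepby(grp)
```

With the sample data this leaves three groups (ids 1, 2, 3, 10; ids 4, 5, 8; ids 7, 9) and 20 ordered pairs. The loop is simple rather than fast, so for large files -group_id- is likely the better choice.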