Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Nick Cox" <n.j.cox@durham.ac.uk> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: RE: AW: RE: AW: AW: Creating a Group Pair ID (where the generating variables order shouldn't matter) |
Date | Fri, 23 Jul 2010 13:30:16 +0100 |
Good question. Your identifier would lead to an integer variable with labels if -group()- were used with the -label- option. A good identifier should be informative as well as distinct, so I regard using -label- as very good practice. I didn't spot that you weren't following that very good practice. My mistake.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

"Which could in turn be made simpler:"

Though the two approaches hardly lead to the same result. My notion of an "ID", as originally requested, would not be a string such as "England France", but a numeric variable running from 1 to N, with N the number of distinct groups.

*************
clear*
inp str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

gen newid = cond(c1id < c2id, c1id, c2id) /*
*/ + " " + cond(c2id < c1id, c1id, c2id)
sort newid
l, sepby(newid) noo
*************

What makes you think that my approach returns an "...integer variable with labels"? All I can find is a -varlabel- attached to my newid.

. d newid

              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
newid           float  %9.0g                  group(first second)

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Friday, 23 July 2010 13:59
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: AW: AW: Creating a Group Pair ID (where the generating variables order shouldn't matter)

Which could in turn be made simpler:

gen first = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)
egen newid = group(first second)
drop first second
sort newid

could become

gen newid = cond(c1id < c2id, c1id, c2id) + " " + cond(c2id < c1id, c1id, c2id)
sort newid

The cost is greater storage, which may or may not bite: that is, -newid- is a string variable rather than an integer variable with labels. But if you have enough space to create -first- and -second- as string variables, even fleetingly, you presumably have enough space for a string -newid-.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

Essentially, the technique advocated in NJC's tip boils down to a simple trick:

*************
clear*
inp str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

gen first = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)
egen newid = group(first second)
drop first second
sort newid
l, sepby(newid) noo
*************

Martin Weiss

Try NJC's
http://www.stata-journal.com/article.html?article=dm0043

J Taylor

I am trying to create an ID corresponding to numbers from two lists. For example, if the two lists were of countries, one would have

clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end
egen newid = group(c1id c2id)

I would like newid to create an ID pair for each country pair. My first instinct was to use the egen group command. However, the problem is that egen group takes into account which id comes first. For example, (c1id,c2id)=(United States,Canada) and (c1id,c2id)=(Canada,United States) have different IDs. I would like them to be able to have the same ID.
That is, I would like to create newid as a group pair ID, reflecting which two countries are in the pair, and where the order doesn't matter.
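
For readers comparing the two variants discussed in this thread, they can be put side by side on the example data. What follows is only a minimal sketch, not code posted by any of the participants: the variable name -strid- is introduced here purely for illustration, and the -label- option of -egen, group()- is the one Nick refers to. The -assert- lines check that the numeric ID and the string ID define the same groups.

```stata
clear
input str20 c1id str20 c2id
"US" "Canada"
"US" "Mexico"
"Canada" "US"
"US" "France"
"France" "England"
"France" "US"
end

* Put each pair in alphabetical order, so that (US, Canada) and
* (Canada, US) end up with identical -first- and -second-.
gen first  = cond(c1id < c2id, c1id, c2id)
gen second = cond(c2id < c1id, c1id, c2id)

* Variant 1: integer ID running from 1 to N, with value labels
* such as "Canada US" attached through the -label- option.
egen newid = group(first second), label

* Variant 2 (hypothetical name -strid-): string ID built by concatenation.
gen strid = first + " " + second

* Both variants should classify the observations identically.
bysort newid (strid): assert strid[1] == strid[_N]
bysort strid (newid): assert newid[1] == newid[_N]

* The example data contain four distinct unordered pairs.
summarize newid, meanonly
assert r(max) == 4

drop first second
sort newid
list, sepby(newid) noobs
```

With -label-, -describe newid- should show a value label attached to the variable in addition to the variable label, which is the difference the exchange above turns on.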