Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: Tidying up a New and Old ID mapping dataset
Date: Wed, 9 Mar 2011 20:37:55 +0000

Sorry, no extra insights.

Note to international audience: in British usage, GP is "general practitioner", a non-specialist doctor (*), the first port of call for medical consultations other than accidents and emergencies.

(*) Meaning, a medic with first degrees in medicine and surgery, sometimes others. Nothing to do with Ph.D.s. Only very, very rarely an M.D.

Nick
n.j.cox@durham.ac.uk

Ada Ma

You are right about the trumping rule. I have over 100 lines of these mapping rules, and I need to sort out this list, because I need to create a mapping list that I can merge to the data sets I'll be using for analyses.

The data is GP practices. There are around 1000 of them in Scotland. They merge, demerge, a new GP joins, an old GP leaves, etc. Every time such an action takes place, a new practice ID is given to the practice. To follow a practice through the years and throughout its transformations, I have to bundle several practices together and treat them as one overriding practice.

Here are two examples of those statements (not real practice numbers):

100033 SPLIT AND BECAME 10066 AND 10077 ON 10/2003
10066 MERGED WITH 10022 AND BECAME 10088 04/2008
10066 MERGED WITH 10022 AND BECAME 10088

I have stripped out all the practice IDs, but I am not sure how to clean them up so that I get the mapping right.

On Wed, Mar 9, 2011 at 5:01 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I don't know whether I understand this. The issue appears to be that according to one rule C should be mapped to D and according to another rule D should be mapped to E and that trumps the first rule. And presumably there are other examples of this kind. And the example is not to be taken literally, but is schematic.
>
> If that is so, all I can suggest is that the trumping rule is applied last, so that this sounds like -replace- followed by another. I don't know why a loop is thought necessary if there are at most two steps.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Ada Ma
>
> I have this dataset which has two series of number IDs. Say it looks like this:
>
> OriginalID  NewID
> A           E
> B           E
> D           E
> C           D
>
> I need to map this information to existing data sets, so that all the observations A, B, C, D are mapped to become E.
>
> As you can see, it's rather straightforward for the first three observations, but for the fourth observation, C is mapped to D. I need to correct this information so that when the NewID is found amongst the OriginalID, it is updated to contain the correct NewID.
>
> I need to write a few lines of commands that would pick up the fourth observation because its NewID appears as the OriginalID in the third observation, and replace the fourth obs's NewID with the third obs's NewID, so that the corrected dataset looks like this:
>
> OriginalID  NewID
> A           E
> B           E
> D           E
> C           E
>
> I can write a loop to compare the NewID against every OriginalID in the data, but then it will take a few rounds of looping to get the whole thing tidied up. Is there a better method?
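
The tidy-up described in the original question amounts to following each NewID along its chain until it no longer appears among the OriginalIDs. Below is a minimal sketch of that idea in Python rather than Stata, meant only to illustrate the logic and not as the method anyone in this thread used. The function name `resolve_final_ids` and the dictionary layout are hypothetical. It assumes each OriginalID maps to exactly one NewID, so a practice that splits into two would need a separate decision about which successor to follow.

```python
def resolve_final_ids(mapping):
    """Map every original ID to the last ID in its chain.

    `mapping` is a dict of OriginalID -> NewID (an assumed layout,
    one new ID per original ID).
    """
    resolved = {}
    for original in mapping:
        current = original
        seen = {current}
        # Follow the chain until the ID is no longer remapped.
        while current in mapping:
            current = mapping[current]
            if current in seen:
                # Guard against rules that loop back on themselves.
                raise ValueError(f"circular mapping involving {current!r}")
            seen.add(current)
        resolved[original] = current
    return resolved


# The schematic example from the question: C -> D, but D -> E.
rules = {"A": "E", "B": "E", "D": "E", "C": "D"}
final_ids = resolve_final_ids(rules)
for original, final in final_ids.items():
    print(original, final)
```

Because each ID is chased to the end of its own chain, chains longer than two steps are handled in a single pass over the rules, with no need to repeat the comparison until nothing changes.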