Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Joe Canner <jcanner1@jhmi.edu> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | st: RE: match variable across two tables |
Date | Wed, 18 Dec 2013 21:32:24 +0000 |
Rochelle, There are two user-written commands available from SSC that might be of interest: -reclink- and -vmatch-. Both do some sort of "fuzzy" matching. I haven't used them, though, so I can't help you much beyond suggesting that you look into them. Regards, Joe Canner Johns Hopkins University School of Medicine -----Original Message----- From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Rongrong Zhang Sent: Wednesday, December 18, 2013 2:15 PM To: statalist@hsphsun2.harvard.edu Subject: st: match variable across two tables Dear STATALISTER, I have two tables: Table 1 has 3 variables I-O number, I-O Name , Related 1997 NAICS codes. Table 2 has 1 variable 1997 NAICS codes. I want to link these two tables based on NAICS code. However, the level of details on NAICS code does not match one-to-one because the tables come from different data source. My goal is to know which NAICS code correspond to which I-O number. I can’t use Table 1 only, because TABLE 2 is produced from Wharton Research Database which has company level financial data – I will use later on. By different details I mean : e.g. table 1: I-O number I-O Name 1997 NAICS codes 1110 Crop production 1111A0 Oilseed farming 11111 11112 1111B0 Grain farming 11113 11114 11115 11116 11119 111200 Vegetable and melon farming 1112 111400 Greenhouse and nursery production 1114 111910 Tobacco farming 11191 111920 Cotton farming 11192 1119A0 Sugarcane and sugar beet 11193 111991 1119B0 All other crop farming 11194 111992 111998 in the above example, I present industry 1110 and its subindustries 1111A0, 1111B0, 111200, each of the subindustries correspond to a few (or a single) NAICS code (north america industry classification system). table 2: NAICS CODE 111 1111 111150 111199 111219 111310 111320 111332 111334 111335 111339 1114 111411 111419 111421 111422 111920 111930 111940 111998 if I enforce exact match, then table 2 111150 matches with table 1 1111B0, table 2 1112l9 may be matched with 111200 table 1 I-O. My question : 1.could you give a sample code/function to do exact match? note, if first 5digit match, and drop last 0 (naics), we consider that a match 2. if I want to increase match, how could I change the program to do 4 digit match thanks a bunch, -- Best, Rochelle * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/