Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: RE: match variable across two tables
From
Rongrong Zhang <[email protected]>
To
[email protected]
Subject
Re: st: RE: match variable across two tables
Date
Wed, 18 Dec 2013 21:42:46 -0500
Thanks Joe !
On Wed, Dec 18, 2013 at 4:32 PM, Joe Canner <[email protected]> wrote:
> Rochelle,
>
> There are two user-written commands available from SSC that might be of interest: -reclink- and -vmatch-. Both do some sort of "fuzzy" matching. I haven't used them, though, so I can't help you much beyond suggesting that you look into them.
>
> Regards,
> Joe Canner
> Johns Hopkins University School of Medicine
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Rongrong Zhang
> Sent: Wednesday, December 18, 2013 2:15 PM
> To: [email protected]
> Subject: st: match variable across two tables
>
> Dear STATALISTER,
>
> I have two tables:
>
> Table 1 has 3 variables I-O number, I-O Name , Related 1997 NAICS codes.
>
> Table 2 has 1 variable 1997 NAICS codes.
>
> I want to link these two tables based on NAICS code. However, the level of details on NAICS code does not match one-to-one because the tables come from different data source. My goal is to know which NAICS code correspond to which I-O number. I can’t use Table 1 only, because TABLE 2 is produced from Wharton Research Database which has company level financial data – I will use later on.
>
> By different details I mean : e.g.
>
> table 1:
>
> I-O number I-O Name 1997 NAICS codes
>
> 1110 Crop production
>
> 1111A0 Oilseed farming
> 11111 11112
>
> 1111B0 Grain farming 11113 11114 11115
> 11116 11119
>
> 111200 Vegetable and melon farming
> 1112
> 111400 Greenhouse and nursery production
> 1114
> 111910 Tobacco farming
> 11191
> 111920 Cotton farming
> 11192
> 1119A0 Sugarcane and sugar beet
> 11193 111991
> 1119B0 All other crop farming
> 11194 111992 111998
>
> in the above example, I present industry 1110 and its subindustries 1111A0, 1111B0, 111200, each of the subindustries correspond to a few (or a single) NAICS code (north america industry classification system).
>
> table 2:
> NAICS CODE
> 111
> 1111
> 111150
> 111199
> 111219
> 111310
> 111320
> 111332
> 111334
> 111335
> 111339
> 1114
> 111411
> 111419
> 111421
> 111422
> 111920
> 111930
> 111940
> 111998
>
> if I enforce exact match, then table 2 111150 matches with table 1
> 1111B0, table 2 1112l9 may be matched with 111200 table 1 I-O.
>
> My question :
> 1.could you give a sample code/function to do exact match? note, if first 5digit match, and drop last 0 (naics), we consider that a match 2. if I want to increase match, how could I change the program to do 4 digit match
>
> thanks a bunch,
>
> --
> Best,
> Rochelle
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
--
-Best,
R
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/