Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Identify a Panel Structure with Incoherent ID and NAME Variable


From   田曦 <[email protected]>
To   STATA Help <[email protected]>
Subject   st: Identify a Panel Structure with Incoherent ID and NAME Variable
Date   Sat, 27 Aug 2011 03:19:11 +0000


I have a sample from a very large panel dataset that looks like the one below. There are 4 unique firms and 9 observations per firm (one per year, 1999 to 2007). A firm's name and ID may change occasionally because the data were entered by different people in different years.
ID is a string variable that may combine digits and letters.

Neither ID nor Firm alone identifies a true unique firm across all 9 years. For example, "Cushing Farming" and "Oklahoma Cushing Farming" are the same company but appear under two IDs: 001834488 and 702639116; "Perkins Electricity" and "Oklahoma Perkins Electricity" are the same company and also have two IDs: 001840116 and 70270423X; "General Motors Stillwater", "(Oklahoma) General Motors Stillwater" and "GM Stillwater" refer to the same company, but with only two IDs: 177725736 and 714509511.

obs Year    ID           Firm 
 1  1999  000846217  Stillwater Power 
 2  2000  000846217  Stillwater Power 
 3  2001  000846217  Oklahoma Stillwater Power Co. 
 4  2002  000846217  Oklahoma Stillwater Power Co. 
 5  2003  000846217  Oklahoma Stillwater Power Co. 
 6  2004  000846217  Stillwater Power Co. 
 7  2005  000846217  Stillwater Power Co. 
 8  2006  000846217  Stillwater Power Co. 
 9  2007  000846217  Stillwater Power Co. 
10  1999  001834488  Cushing Farming 
11  2000  001834488  Oklahoma Cushing Farming 
12  2001  001834488  Oklahoma Cushing Farming 
13  2002  001834488  Cushing Farming 
14  2003  001834488  Cushing Farming 
15  2004  001834488  Oklahoma Cushing Farming 
16  2005  001834488  Oklahoma Cushing Farming 
17  2006  001834488  Cushing Farming 
18  1999  001840116  Perkins Electricity 
19  2000  001840116  Perkins Electricity 
20  2001  001840116  Perkins Electricity 
21  2002  001840116  Perkins Electricity 
22  2003  001840116  Oklahoma Perkins Electricity 
23  2005  001840116  Perkins Electricity 
24  1999  177725736  General Motors Stillwater 
25  2007  702639116  Cushing Farming 
26  2004  70270423X  Perkins Electricity 
27  2006  70270423X  Perkins Electricity 
28  2007  70270423X  Oklahoma Perkins Electricity 
29  2000  714509511  (Oklahoma)General Motors Stillwater 
30  2001  714509511  (Oklahoma)General Motors Stillwater 
31  2002  714509511  (Oklahoma)General Motors Stillwater 
32  2003  714509511  (Oklahoma)General Motors Stillwater 
33  2004  714509511  General Motors Stillwater Oklahoma 
34  2005  714509511  General Motors Stillwater Oklahoma 
35  2006  714509511  General Motors Stillwater Oklahoma 
36  2007  714509511  GM Stillwater

I can tell they are the same firms because, when I create a group identifier for ID and another group identifier for Firm, the two groupings always overlap at some point. I am wondering how to use those overlapping points as links that join the records into a single firm identifier.
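
One way to read the overlap idea is as a connected-components problem: treat every ID and every Firm name as a node, link the two whenever they appear on the same row, and give each connected cluster one firm number. The sketch below is in Python rather than Stata and only illustrates the logic; the function names (`group_firms`, `normalize`) and the rule of dropping the word "Oklahoma" are assumptions made for this example, not part of the dataset or of any Stata command. It uses only the distinct (ID, Firm) pairs from the listing above, since Year plays no role in the linking.

```python
import re

# Distinct (ID, Firm) pairs taken from the listing above.
pairs = [
    ("000846217", "Stillwater Power"),
    ("000846217", "Oklahoma Stillwater Power Co."),
    ("000846217", "Stillwater Power Co."),
    ("001834488", "Cushing Farming"),
    ("001834488", "Oklahoma Cushing Farming"),
    ("001840116", "Perkins Electricity"),
    ("001840116", "Oklahoma Perkins Electricity"),
    ("177725736", "General Motors Stillwater"),
    ("702639116", "Cushing Farming"),
    ("70270423X", "Perkins Electricity"),
    ("70270423X", "Oklahoma Perkins Electricity"),
    ("714509511", "(Oklahoma)General Motors Stillwater"),
    ("714509511", "General Motors Stillwater Oklahoma"),
    ("714509511", "GM Stillwater"),
]


def normalize(name):
    """Hypothetical cleanup rule: lowercase, strip punctuation, drop 'oklahoma'."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(w for w in cleaned.split() if w != "oklahoma")


def group_firms(pairs, key=lambda name: name):
    """Return {ID: group number}, linking IDs that share a (keyed) name."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Each row joins its ID node to its name node.
    for firm_id, name in pairs:
        root_id, root_name = find(("id", firm_id)), find(("name", key(name)))
        if root_id != root_name:
            parent[root_name] = root_id

    # Number the clusters in order of first appearance.
    numbers, groups = {}, {}
    for firm_id, _ in pairs:
        root = find(("id", firm_id))
        groups[firm_id] = numbers.setdefault(root, len(numbers) + 1)
    return groups


exact = group_firms(pairs)                 # link on the raw name string
loose = group_firms(pairs, key=normalize)  # link on the cleaned name

print(len(set(exact.values())))  # 5
print(len(set(loose.values())))  # 4
```

On the sample as printed, linking on the exact name string leaves ID 177725736 in a group of its own, because "General Motors Stillwater" never appears verbatim under 714509511. Some name cleanup is therefore needed before the overlap recovers all 4 firms, and any such rule should be checked by hand, since an aggressive rule can also merge firms that are genuinely different.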

Many Thanks. 

Xi Tian 
Department of Economics 
Oklahoma State University 