Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Identify a Panel Structure with Incoherent ID and NAME Variable
From: 田曦 <[email protected]>
To: STATA Help <[email protected]>
Subject: st: Identify a Panel Structure with Incoherent ID and NAME Variable
Date: Sat, 27 Aug 2011 03:19:11 +0000
I have a sample from a very large panel dataset, shown below. It contains 4 unique firms with 9 observations per firm. A firm's name and ID may change occasionally because the data were entered by different people in different years.
ID is a string variable that may combine digits and letters.
Neither ID nor Firm alone identifies a true unique firm across all 9 years. For example, "Cushing Farming" and "Oklahoma Cushing Farming" are the same company, but with two IDs: 001834488 and 702639116; "Perkins Electricity" and "Oklahoma Perkins Electricity" are the same company, also with two IDs: 001840116 and 70270423X; "General Motors Stillwater", "(Oklahoma) General Motors Stillwater" and "GM Stillwater" refer to the same company, but with only two IDs: 177725736 and 714509511.
obs  Year  ID         Firm
  1  1999  000846217  Stillwater Power
  2  2000  000846217  Stillwater Power
  3  2001  000846217  Oklahoma Stillwater Power Co.
  4  2002  000846217  Oklahoma Stillwater Power Co.
  5  2003  000846217  Oklahoma Stillwater Power Co.
  6  2004  000846217  Stillwater Power Co.
  7  2005  000846217  Stillwater Power Co.
  8  2006  000846217  Stillwater Power Co.
  9  2007  000846217  Stillwater Power Co.
 10  1999  001834488  Cushing Farming
 11  2000  001834488  Oklahoma Cushing Farming
 12  2001  001834488  Oklahoma Cushing Farming
 13  2002  001834488  Cushing Farming
 14  2003  001834488  Cushing Farming
 15  2004  001834488  Oklahoma Cushing Farming
 16  2005  001834488  Oklahoma Cushing Farming
 17  2006  001834488  Cushing Farming
 18  1999  001840116  Perkins Electricity
 19  2000  001840116  Perkins Electricity
 20  2001  001840116  Perkins Electricity
 21  2002  001840116  Perkins Electricity
 22  2003  001840116  Oklahoma Perkins Electricity
 23  2005  001840116  Perkins Electricity
 24  1999  177725736  General Motors Stillwater
 25  2007  702639116  Cushing Farming
 26  2004  70270423X  Perkins Electricity
 27  2006  70270423X  Perkins Electricity
 28  2007  70270423X  Oklahoma Perkins Electricity
 29  2000  714509511  (Oklahoma)General Motors Stillwater
 30  2001  714509511  (Oklahoma)General Motors Stillwater
 31  2002  714509511  (Oklahoma)General Motors Stillwater
 32  2003  714509511  (Oklahoma)General Motors Stillwater
 33  2004  714509511  General Motors Stillwater Oklahoma
 34  2005  714509511  General Motors Stillwater Oklahoma
 35  2006  714509511  General Motors Stillwater Oklahoma
 36  2007  714509511  GM Stillwater
I can tell they are the same firms because, when I create a group identifier for ID and a group identifier for Firm, the two groupings always overlap at some point. I am wondering how to use those overlapping points as links to build a single firm identifier.
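To make the linking idea concrete, here is a minimal sketch in Python (not Stata, purely for illustration). It treats every ID and every firm name as a node, joins the two nodes on each row, and assigns one group number per connected set using a small union-find. The names `ROWS`, `normalize` and `link_firms` are hypothetical, and the cleaning rule in `normalize` (dropping punctuation, "Oklahoma" and "Co") is only an assumption that happens to fit this sample. `ROWS` lists each distinct ID/name pair from the table once.

```python
import re

# Distinct (ID, Firm) pairs taken from the sample table above.
ROWS = [
    ("000846217", "Stillwater Power"),
    ("000846217", "Oklahoma Stillwater Power Co."),
    ("000846217", "Stillwater Power Co."),
    ("001834488", "Cushing Farming"),
    ("001834488", "Oklahoma Cushing Farming"),
    ("001840116", "Perkins Electricity"),
    ("001840116", "Oklahoma Perkins Electricity"),
    ("177725736", "General Motors Stillwater"),
    ("702639116", "Cushing Farming"),
    ("70270423X", "Perkins Electricity"),
    ("70270423X", "Oklahoma Perkins Electricity"),
    ("714509511", "(Oklahoma)General Motors Stillwater"),
    ("714509511", "General Motors Stillwater Oklahoma"),
    ("714509511", "GM Stillwater"),
]


def normalize(name):
    """Assumed cleaning rule: lowercase, drop punctuation, 'oklahoma' and 'co'."""
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    return " ".join(w for w in words if w not in {"oklahoma", "co"})


def link_firms(pairs, key=lambda name: name):
    """Return {ID: group number}; IDs that share a (keyed) name get one group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Each row links its ID node to its name node.
    for firm_id, name in pairs:
        union(("id", firm_id), ("name", key(name)))

    # Number the connected sets in order of first appearance.
    roots, groups = {}, {}
    for firm_id, _ in pairs:
        root = find(("id", firm_id))
        groups[firm_id] = roots.setdefault(root, len(roots) + 1)
    return groups


exact = link_firms(ROWS)                    # names compared as typed
cleaned = link_firms(ROWS, key=normalize)   # names compared after cleaning
print(len(set(exact.values())), len(set(cleaned.values())))
```

On this sample, exact name matching gives 5 groups rather than 4: ID 177725736 ("General Motors Stillwater") shares no identically spelled name with ID 714509511, so the overlap is only found once the names are cleaned. In Stata the same idea could be approximated by repeatedly replacing a group number with its minimum within ID and then within Firm until nothing changes.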
Many Thanks.
Xi Tian
Department of Economics
Oklahoma State University