Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Merge Panel Datasets
From
Phil Schumm <[email protected]>
To
[email protected]
Subject
Re: st: Merge Panel Datasets
Date
Sat, 18 Jun 2011 19:47:26 -0500
On Jun 18, 2011, at 5:13 PM, Diana Beketova wrote:
I have a problem merging two panel datasets. In one file I have data
about companies' balance sheet and P&L sorted by company's ID-
number. It looks like this:
Obs ID-number year Total Assets Operat. Revenue
1 123 2002 500 100
2 123 2005 505 110
3 123 2006 600 120
4 789 2001 550 340
5 789 2005 670 560
So there are missing values within the years, and ID-number repeats
itself for a group of observations. So, I think in this case it
can't work as an unique identifier.
The next file contains ownership information, also a panel dataset.
Obs ID-number year Ownership % Country of origin
1 123 2002 20% DE
2 123 2002 50% FR
3 123 2002 30% UK
4 123 2005 30% DE
5 123 2005 40% FR
6 123 2005 30% UK
7 789 2001 50% CN
8 789 2001 50% US
9 789 2003 70% CN
10 789 2003 30% US
Here, ownership information changes over years, but it can also
happen that panel data contains missing values in years.
Is there any possibility to merge these two files together?
That it looks like this:
Obs ID-number year Total Assets Operat. Revenue Ownership% Country
of origin
1 123 2002 500 100 20% DE
2 123 2002 500 100 50% FR
3 123 2002 500 100 30% UK
4 123 2005 505 110 30% DE
5 123 2005 505 110 40% FR
6 123 2005 505 110 30% UK
7 123 2006 600 120
8 789 2001 550 340 50% CN
9 789 2001 550 340 50% US
10 789 2003 70% CN
11 789 2003 30% US
5 789 2005 670 560
Is it enough to sort the datasets by ID-number and year and then
merge them using these two as unique identifier? In my case I get so
little observations that match after the merge that I think that I
am doing something wrong.
There isn't any problem with -merge- here; you did
merge 1:m id_number year using file2
to generate the result you show above. The question is: What do you
want? In the first dataset, you have what appear to be yearly assets
and operating revenues for each firm (i.e., a maximum of one
observation per firm per year; note that the fact that some firms
might not have data available for a given year is irrelevant for this
discussion). In the second dataset, however, you have multiple
observations per year for each firm. I assume that this means, for
example, that in 2002, 20% of firm 123 was owned by German investors,
50% by French investors, and 30% by British investors? Regardless,
you have to figure out what you want for a result before you can
determine the appropriate way to use -merge-, and that will probably
depend on what analysis you are going to perform. If your only goal
is to combine the data in the two files into one file without any loss
of information, then what you've done above may be adequate. It's
clearly not an efficient way to store the data (because of the
duplication), but if the dataset isn't too large, that might not matter.
-- Phil
P.S. It's no longer necessary to sort your data before merging -- the -
merge- command will now take care of that for you.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/