Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
AW: st: Merge Panel Datasets
From
"Diana Beketova" <[email protected]>
To
<[email protected]>
Subject
AW: st: Merge Panel Datasets
Date
Sun, 19 Jun 2011 08:28:32 +0200
Thank you Phil!
You describe everything right. Indeed, I have yearly total assets and
revenues in the master data and multiple observations per year about firm
ownership structure in the using-data. I have to analyze the impact of
foreign ownership on firm value/profitability afterwards. The dataset is
very big and also causes some problems because I am limited to 1GB memory on
my computer. The goal is also to find out which companies survived best
during the crisis, what type of foreign owner had the most positive effect
on firm performance etc. I made the merge again using 1:m-option and you can
see my results below. I just can't believe that out of almost 1.9 million
observations only 451 match.
merge 1:m ID_NUMBER YEAR using file2
ID_NUMBER was str15 now str16
Result # of obs.
-----------------------------------------
not matched 1,894,248
from master 387,108 (_merge==1)
from using 1,507,140 (_merge==2)
matched 451 (_merge==3)
-----------------------------------------
At the end I want to have a panel that contains all observations over years,
so I can run all the needed regressions etc. Do you think it is somehow
possible?
Diana
-----Ursprüngliche Nachricht-----
Von: [email protected]
[mailto:[email protected]] Im Auftrag von Phil Schumm
Gesendet: Sunday, June 19, 2011 2:47 AM
An: [email protected]
Betreff: Re: st: Merge Panel Datasets
On Jun 18, 2011, at 5:13 PM, Diana Beketova wrote:
> I have a problem merging two panel datasets. In one file I have data
> about companies' balance sheet and P&L sorted by company's ID-
> number. It looks like this:
>
> Obs ID-number year Total Assets Operat. Revenue
> 1 123 2002 500 100
> 2 123 2005 505 110
> 3 123 2006 600 120
> 4 789 2001 550 340
> 5 789 2005 670 560
>
> So there are missing values within the years, and ID-number repeats
> itself for a group of observations. So, I think in this case it
> can't work as an unique identifier.
>
> The next file contains ownership information, also a panel dataset.
>
> Obs ID-number year Ownership % Country of origin
> 1 123 2002 20% DE
> 2 123 2002 50% FR
> 3 123 2002 30% UK
> 4 123 2005 30% DE
> 5 123 2005 40% FR
> 6 123 2005 30% UK
> 7 789 2001 50% CN
> 8 789 2001 50% US
> 9 789 2003 70% CN
> 10 789 2003 30% US
>
> Here, ownership information changes over years, but it can also
> happen that panel data contains missing values in years.
>
> Is there any possibility to merge these two files together?
>
> That it looks like this:
>
> Obs ID-number year Total Assets Operat. Revenue Ownership% Country
> of origin
> 1 123 2002 500 100 20% DE
> 2 123 2002 500 100 50% FR
> 3 123 2002 500 100 30% UK
> 4 123 2005 505 110 30% DE
> 5 123 2005 505 110 40% FR
> 6 123 2005 505 110 30% UK
> 7 123 2006 600 120
> 8 789 2001 550 340 50% CN
> 9 789 2001 550 340 50% US
> 10 789 2003 70% CN
> 11 789 2003 30% US
> 5 789 2005 670 560
>
> Is it enough to sort the datasets by ID-number and year and then
> merge them using these two as unique identifier? In my case I get so
> little observations that match after the merge that I think that I
> am doing something wrong.
There isn't any problem with -merge- here; you did
merge 1:m id_number year using file2
to generate the result you show above. The question is: What do you
want? In the first dataset, you have what appear to be yearly assets
and operating revenues for each firm (i.e., a maximum of one
observation per firm per year; note that the fact that some firms
might not have data available for a given year is irrelevant for this
discussion). In the second dataset, however, you have multiple
observations per year for each firm. I assume that this means, for
example, that in 2002, 20% of firm 123 was owned by German investors,
50% by French investors, and 30% by British investors? Regardless,
you have to figure out what you want for a result before you can
determine the appropriate way to use -merge-, and that will probably
depend on what analysis you are going to perform. If your only goal
is to combine the data in the two files into one file without any loss
of information, then what you've done above may be adequate. It's
clearly not an efficient way to store the data (because of the
duplication), but if the dataset isn't too large, that might not matter.
-- Phil
P.S. It's no longer necessary to sort your data before merging -- the -
merge- command will now take care of that for you.
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
-----
eMail ist virenfrei.
Von AVG überprüft - www.avg.de
Version: 10.0.1382 / Virendatenbank: 1513/3711 - Ausgabedatum: 18.06.2011
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/