[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: Merge problem

From	"Julian Fennema" <[email protected]>
To	<[email protected]>
Subject	st: Merge problem
Date	Tue, 29 Apr 2003 14:32:00 +0100

Dear all,

Apologies for the long-winded explanation that follows, but I am trying
to be as precise as possible about the problem that I am having.

I am constructing a panel from 44 files; 22 of production statistics and
22 of investment. The time dimension runs from 1978 to 1999. As of 1993
there was a change in the unique identifier for the firm, from a
two-part system (r1 and r2) to a single identifier variable (b1). I have
a 45th file which contains information on the change in identifier for
all firms operating in 1993 - "link.dta" conatins three variables; r1 r2
b1.

For all time periods, in both types of dataset, the relevant identifier
is removed from a long string and destrung (or converted to a number if
you want to avoid excessive butchery of the language). The code for
doing this is exactly the same for both the investment and the
production statistics, there are no duplicates, the format is the same
for both types; r1 is a 9-digit code, r2 has 3 digits and b1 8. For
periods prior to 1993 I issue a merge command

-merge r1 r2 using "link.dta"

The expected result is that for firms still operating in 1993 an extra
variable is created in the dataset, containing the value of b1.

As you would expect attrition means that not all firm operating in 1978
or so are still operating in 1993, so don't get linked to a value for b1
(_merge==1). Equally the link datafile contains a large amount of
unnecessary information so a large proportion come up as _merge==2. What
I don't understand is some of the _merge==3 observations. 

For the investment data, everything is fine. The majority of r1/r2
combinations are linked to a value of b1, which is added to the dataset.


However, for the production statistics something weird is going on.
Again the majority of combinations of r1/r2 are linked to b1
(_merge==3), and a new variable b1 is added. Yet this variable is always
coded as missing (.). 

So what seems to be occurring is that the r1/r2 combination is correctly
linked to a value of b1, but this value is not added correctly. 

I have checked the issue of format; the _merge==2 observations are added
neatly at the end of the file, with r1 in the correct column, r2 and b1
also. So there seems to be no issue of formatting (just checking).

The values of r1 and r2 are present in link.dta, and have an associated
value of b1 (as suggested by _merge==3). It is possible to make the
r1/r2 connection for link-investment and for production-investment, but
not link-production.

Could anyone give me a pointer as to what I am doing wrong? Thanks for
reading this far, and also in advance for any help.

Julian

P.S. An illustration of what I mean (with fewer digits). Please note
that the number of observations is not the same in the investment and
production datasets.

Link.dta
Obs. 		R1		r2		b1
1		100		001		123
2		101		001		124
3		102		001		125
4		102		002		126
5		103		001		127
6		104		001		128
7		105		001		129
8		106		001		130

Investment.dta
Obs. 		R1		r2		other vars
1		100		001		
2		101		001		
3		102		001		
4		102		002		
5		104		001		
6		107		001		

Production.dta
Obs. 		R1		r2		other vars
1		100		001		
2		101		001		
3		102		001		
4		102		002		
5		103		001		
6		104		001	
7		107		001
8		108		001	

-------*After a merge with link*------------

Investment.dta
Obs. 		R1		r2		b1		_merge
other vars
1		100		001		123		3

2		101		001		124		3

3		102		001		125		3

4		102		002		126		3

5		104		001		128		3

6		105		001		129		2
7		106		001		130		2
8		107		001		.		1


Production.dta
Obs. 		R1		r2		b1		_merge
other vars
1		100		001		.		3

2		101		001		.		3

3		102		001		.		3

4		102		002		.		3

5		104		001		.		3	
6		105		001		129		2
7		106		001		130		2

8		107		001		.		1
9		108		001		.		1

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: st: Re: Speed comparisons (Was: Suggestions?)
Next by Date: Re: st: Merge problem
Previous by thread: st: Hausman test for Independence from irrelevant alternatives
Next by thread: Re: st: Merge problem
Index(es):
- Date
- Thread