I want to create a dataset, call it 'match-control' that groups together
all the observations in the same industry-year-decile combination, but
excludes the observation's own values for a particular variable. Thus if
there is only one firm in a decile, that firm will not be in the
combined sample. The resulting sample has obs= N * (N-1). If there are 3
firms, there will 3 *2 = 6 in the resulting match_control group for the
industry_year_decile.
Assume I have a dataset (match) that includes a unique identifier
(gvkey), industry (sic2), year, and decile (rroa), and the variable of
interest (dacc); I duplicate the dataset (control) renaming the vars as
c_gvkey, c_sic2, c_year, c_rroa, and c_dacc. For example, assume that
for the first decile I have the following for match and the same for
control, except that the var names have c_:
data = match
N gvkey year sic2 rroa dacc
1 1860 1991 13 0 0.11
2 2304 1991 13 0 -0.43
3 3949 1991 13 0 -0.31
4 5140 1991 13 0 -0.38
5 5188 1991 13 0 -0.28
6 7159 1991 13 0 -0.16
7 8104 1991 13 0 0.04
8 8838 1991 13 0 -0.07
9 8974 1991 13 0 0.08
10 12042 1991 13 0 -0.97
11 13183 1991 13 0 -0.28
12 15306 1991 13 0 -0.30
13 22398 1991 13 0 -0.21
14 23129 1991 13 0 -0.27
15 27199 1991 13 0 -0.17
Consider the SIC2=13 group. The first decile (coded 0) in the sample)
has 15 firms in 1991. The resulting group is 15 * 14 = 210 firms in the
match-control sample. The merging for the first gvkey is the following:
data = match_control
gvkey year sic2 dacc rroa c_dacc
1860 1991 13 0.11 0 -0.43
1860 1991 13 0.11 0 -0.31
1860 1991 13 0.11 0 -0.38
1860 1991 13 0.11 0 -0.28
1860 1991 13 0.11 0 -0.16
1860 1991 13 0.11 0 0.04
1860 1991 13 0.11 0 -0.07
1860 1991 13 0.11 0 0.08
1860 1991 13 0.11 0 -0.97
1860 1991 13 0.11 0 -0.28
1860 1991 13 0.11 0 -0.30
1860 1991 13 0.11 0 -0.21
1860 1991 13 0.11 0 -0.27
1860 1991 13 0.11 0 -0.17
In SAS, the following code does the trick.
proc sql;
create table match_control as select
match.*, control.c_dacc
from match, control
where sic2 = c_sic2
and year = c_year
and rroa = c_rroa
and gvkey ne c_gvkey; /* the magic words */
I have tried merge and joinby, but I have had no success. Can someone
tell me how to accomplish this one to many merge in Stata?
Thank you.
Amy Dunbar
University of Connecticut
School of Business
Department of Accounting
2100 Hillside Road Unit 1041
Storrs, CT 06269-1041
[email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/