Just to expand briefly on Martin's main comment:
As they say in some old tales, your punishment is that you got what you
asked for.
If we forget missings for a moment, the main idea in the problematic
line is
replace matchernum = lister[`i'] if lister[`i'] == id[`j']
Now suppose that -lister- and -id- are identical for a particular pair
of observations, say at 42. Then the line becomes
replace matchernum = lister[`i'] if 42 == 42
But -if 42 == 42- is true for observations 1, 2, 3, etc., so -matchernum-
is replaced in all observations. You can fix this by adding an -in-
condition, but it would be better to do this, to make the logic clearer:
if lister[`i'] == id[`j'] & lister[`i'] != . & id[`j'] != . {
    replace matchernum = lister[`i'] in `j'
}
This could be slimmed to
if lister[`i'] == id[`j'] & lister[`i'] < . {
    replace matchernum = lister[`i'] in `j'
}
If they're equal, then only one need be tested as non-missing, and we
can save a keystroke too.
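
To see the difference between the -if- qualifier and the -if- command on
something small, here is a minimal sketch on made-up data. The five
observations and the variable names -viaqualifier- and -viacommand- are
invented for illustration and are not from the original dataset. The last
two lines show why the non-missing check is needed at all: two missings
compare as equal, and -x < .- is true only for non-missing x. Both
-display- lines show 1.

```stata
* Toy data: five observations, nothing here comes from the original dataset
clear
set obs 5
generate id = _n
generate byte viaqualifier = 0
generate byte viacommand = 0

* -if- qualifier: id[2] == id[2] has the same value (true) for every
* observation, so all five rows are changed
replace viaqualifier = 1 if id[2] == id[2]

* -if- command plus -in-: the test is evaluated once, and only
* observation 2 is changed
if id[2] == id[2] {
    replace viacommand = 1 in 2
}

list

* Two missings compare as equal, so an equality test alone would also
* "match" missing with missing; 42 < . is true because 42 is not missing
display (. == .)
display (42 < .)
```

After running this, -viaqualifier- is 1 in all five observations, while
-viacommand- is 1 only in observation 2.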
By the way, it's not at all obvious to me that this is a -merge-
problem.
Nick
Martin Weiss
The problem with your code is the last line inside the loop: you
specifically instruct Stata to replace "matchernum" with "lister" and
subsequently complain that it contains this very number. Why not
instruct Stata to replace with "name"?
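
As a rough sketch of that idea, the loop below stores the name rather
than the number, using the eight example observations from the original
post. It borrows the -in `j'- restriction from Nick's reply. The string
variable -matchername- is a hypothetical name, and putting the listing
firm's name on the row of the firm whose id it lists is an assumption
about what is wanted (row 5 should show Ahtna). The inner loop starts at
`i', as in the original code, so a match in an earlier row would not be
found.

```stata
* The eight example observations from the original post
clear
input long cusip str14 name long lister long id
1        "Ahtna"          9118 4837
2        "Aleut"          4654 4885
3        "Arctic Slope"    278 9133
4        "Bering Strait"  3506 4833
40707105 "HAMILTON BROS"     . 9118
81611405 "SEITEL INC"        . 4742
82524102 "SHOREWOOD CORP"    . 7864
82770101 "SILVER DINER"      . 4235
end

* matchername is a hypothetical variable name, not in the original post
generate str14 matchername = ""

local N = _N
forvalues i = 1/`N' {
    forvalues j = `i'/`N' {
        * test once per pair; skip pairs where lister is missing
        if lister[`i'] == id[`j'] & lister[`i'] < . {
            * write the listing firm's name into the matched row only
            replace matchername = name[`i'] in `j'
        }
    }
}

list cusip name lister id matchername, noobs
```

With these data only row 5 (HAMILTON BROS) is filled in, with "Ahtna".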
Also, you do not specify an "in" qualifier for the last line, so Stata
by default -replace-s the whole vector of "matchernum"s with the value
9118, which is the only one that matches in your example data. (I assume
that "matcheri" and "matcherj" are filled with ones because you copied
from the full dataset of 459 observations. When I rerun your code, only
observations 1 and 5 get a one.)
I would advise you to look at -merge- as a much better solution. Split
the dataset between the original and the to-be-matched firms and then
-merge- them back.
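
One way the split-and-merge idea might look is sketched below, on four of
the posted example rows. It assumes Stata 11 or later for the
-merge m:1- syntax, and it assumes each non-missing "lister" value occurs
only once. The names -matchername- and -listers- are hypothetical. Treat
it as an illustration of the approach rather than a tested solution for
the full 459 observations.

```stata
* Subset of the posted example data (rows 1, 2, 5 and 6)
clear
input long cusip str14 name long lister long id
1        "Ahtna"         9118 4837
2        "Aleut"         4654 4885
40707105 "HAMILTON BROS"    . 9118
81611405 "SEITEL INC"       . 4742
end

* Build a lookup file: one row per listing firm, keyed on the id it lists
preserve
keep if lister < .
keep name lister
rename name matchername      // hypothetical variable name
rename lister id
tempfile listers             // hypothetical tempfile name
save `listers'
restore

* Attach the listing firm's name to the firm whose id it lists;
* keep(master match) drops lookup rows that match no firm
merge m:1 id using `listers', keep(master match)

list cusip name lister id matchername _merge, noobs
```

Here HAMILTON BROS (id 9118) picks up "Ahtna", and the other three firms
are left with an empty -matchername-.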
Li, Ihsuan
I am sorry about the variable names. Here are the actual code and
results.
There are 459 observations; results are listed for eight of them.
Instead of a column with the names of the matched companies under
matchernum, I get the same number in every row.
For example, in row five, cusip #40707105, the company is Hamilton Bros,
its unique id is 9118, and it was matched to Ahtna (row 1).
Two questions really:
1. Why is it returning an identical number under matchernum?
2. How can I get it to return the "name" instead of the id?
I hope I am making sense now.
gen matcheri=.
gen matcherj=.
gen matchernum=.
forvalues i=1(1) 459 {
    forvalues j=`i'(1) 459 {
        replace matcheri=1 in `i' if lister[`i']== id[`j'] & lister[`i']!=. & ///
            id[`j'] !=.
        replace matcherj=1 in `j' if lister[`i']== id[`j'] & lister[`i']!=. & ///
            id[`j'] !=.
        replace matchernum=lister[`i'] if lister[`i']== id[`j'] & ///
            lister[`i']!=. & id[`j'] !=.
    }
}
Result:
   cusip  name            lister    id  matcheri  matcherj  matchernum
       1  Ahtna             9118  4837         1         .        9118
       2  Aleut             4654  4885         1         .        9118
       3  Arctic Slope       278  9133         1         .        9118
       4  Bering Strait     3506  4833         1         .        9118
40707105  HAMILTON BROS        .  9118         .         1        9118
81611405  SEITEL INC           .  4742         .         1        9118
82524102  SHOREWOOD CORP       .  7864         .         1        9118
82770101  SILVER DINER         .  4235         .         1        9118