Thanks, Martin.
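It does turn out to involve the stable option. Right after the first
append in the .ado file, the combined data can contain (and in this
case does contain) ties on the sort variables wherever there is an
exact match, so the order produced by sort is ambiguous. Without
stable, Stata orders tied observations randomly, which is why repeated
runs gave different results. Changing the lines: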
append using `work'
sort `fullvars'
to
append using `work'
sort `fullvars', stable
fixes the problem. I'd encourage anyone who uses nearmrg to make the
same change!
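To see why the tie matters, here is a minimal Stata sketch. It is hypothetical and does not reproduce nearmrg's actual code: it assumes the lower match works by appending the two files, sorting on the match variables, and carrying the most recent using-file value forward into master rows. The variable frommaster and the toy rows (taken from the gender 1 rows of ageinfo1 and ageinfo2 below) are illustrative. With stable, the using row with age 25 stays ahead of the master row with age 25, so the master row picks up 25. Without stable, the two tied rows can swap, and the master row would pick up 12 instead, which is what the second run below shows for id 314.

```stata
* Hypothetical sketch, NOT nearmrg's actual code: mimics a "lower"
* nearest match done by append + sort + carry-forward.
clear
* Master row (as in ageinfo2): gender 1, age 25
input gender age frommaster
1 25 1
end
tempfile master
save `master'

* Using rows (as in ageinfo1, gender 1): ages 12 and 25
clear
input gender age frommaster
1 12 0
1 25 0
end
append using `master'

* The using row and the master row tie on (gender, age) = (1, 25).
* -stable- keeps the using row ahead of the master row; without it,
* Stata may put the tied rows in either order.
sort gender age, stable

* Take age from using rows, then carry the previous value forward
* into master rows within each gender.
by gender: generate newage = age if !frommaster
by gender: replace newage = newage[_n-1] if frommaster
list
assert newage == 25 if frommaster
```

Dropping ", stable" from the sort line makes the final assert fail on some runs, which is the same nondeterminism described in the original question.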
Thanks again,
John
=================================
John Hund
Visiting Assistant Professor
Jones Graduate School of Business
Rice University
Houston, TX 77005
On Aug 10, 2009, at 4:02 PM, John Hund wrote:
I am having a very perplexing problem with the nearmrg command: it
seems to give different results on repeated runs with the same data.
In addition, my co-author and I get different results on the same
datasets, sorted the same way. An example of the problem is below,
using two very small (5-observation) datasets, ageinfo1 and ageinfo2:
ageinfo1
+----------------------------+
| id gender age income |
|----------------------------|
1. | 4 1 12 56 |
2. | 3 1 25 21 |
3. | 1 1 34 23 |
4. | 5 2 18 75 |
5. | 2 2 40 43 |
+----------------------------+
Note that ageinfo1 is sorted by gender and age, and doesn't contain
any duplicate values.
ageinfo2
+-----------------------------+
| id gender income age |
|-----------------------------|
1. | 415 1 12 12 |
2. | 314 1 32 25 |
3. | 516 2 65 18 |
4. | 213 2 32 40 |
5. | 12 2 12 34 |
+-----------------------------+
This file does not need to be sorted, but I sort it below to make the
runs easier to replicate. Issuing the following commands in order
gives:
. use ageinfo2
. sort gender age
. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)
. list
+-----------------------------------------------+
| id gender income age _merge newage |
|-----------------------------------------------|
1. | 415 1 12 12 3 12 |
2. | 314 1 32 25 3 25 |
3. | 516 2 65 18 3 18 |
4. | 12 2 12 34 3 18 |
5. | 213 2 32 40 3 40 |
+-----------------------------------------------+
. clear
. use ageinfo2
. sort gender age
. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)
. list
+-----------------------------------------------+
| id gender income age _merge newage |
|-----------------------------------------------|
1. | 415 1 12 12 3 12 |
2. | 314 1 32 25 3 12 |
3. | 516 2 65 18 3 18 |
4. | 12 2 12 34 3 18 |
5. | 213 2 32 40 3 40 |
+-----------------------------------------------+
. clear
. use ageinfo2
. sort gender age
. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)
. list
+-----------------------------------------------+
| id gender income age _merge newage |
|-----------------------------------------------|
1. | 314 1 32 25 3 25 |
2. | 415 1 12 12 1 . |
3. | 516 2 65 18 3 18 |
4. | 12 2 12 34 3 18 |
5. | 213 2 32 40 3 40 |
+-----------------------------------------------+
The first outcome is correct, but subsequent runs give different
(and incorrect) answers. My only guess at this point is that there
is something going on with a temporary file which is not being
cleared, but I don't know how that could happen. Has anyone else
noticed a problem with this?
Thanks in advance,
John