
st: Re: unstable results with repeating the nearmrg command

From	John Hund <[email protected]>
To	[email protected]
Subject	st: Re: unstable results with repeating the nearmrg command
Date	Tue, 11 Aug 2009 07:11:55 -0500

Thanks Martin...

It actually does have to do with the stable option. Right after the first append in the .ado file, the appended data can contain (and in this case do contain) duplicates for any exact matches, so the sort command is ambiguous: without stable, Stata does not guarantee the order of observations that tie on the sort variables. Changing the lines:


append using `work'
sort `fullvars'

to

append using `work'
sort `fullvars', stable

fixes the problem. I'd encourage anyone who uses this to make the change!
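
For anyone who wants to see the effect outside of nearmrg, here is a minimal sketch. The four observations and the variable name arrival are made up for illustration; they only mimic appended data in which (gender, age) pairs occur twice, and are not taken from the .ado file itself.

```stata
* Toy stand-in for appended data: each (gender, age) pair appears twice
clear
input id gender age
4 1 12
3 1 25
415 1 12
314 1 25
end

* Record the order in which the observations arrived
generate long arrival = _n

* Check whether the sort key identifies observations uniquely
duplicates report gender age

* With -stable-, tied observations keep their arrival order,
* so the sorted data are the same on every run
sort gender age, stable
list, sepby(gender age)

* Within each tie group, arrival order must be preserved
by gender age: assert arrival >= arrival[_n-1] if _n > 1
```

Dropping the stable option from the sort line leaves the order within each tie group unspecified, so the final assert may pass on one run and fail on another.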


Thanks again,
John
=================================
John Hund
Visiting Assistant Professor
Jones Graduate School of Business
Rice University
Houston, TX 77005



On Aug 10, 2009, at 4:02 PM, John Hund wrote:

I am having a very perplexing problem with the nearmrg command... it seems to give different results on subsequent runs with the same data. In addition, my co-author and I get different results on the same datasets, similarly sorted. An example of the problem is below, using a very small (5-observation) dataset. The two datasets are ageinfo1 and ageinfo2:


ageinfo1
     +----------------------------+
     | id   gender   age   income |
     |----------------------------|
  1. |  4        1    12       56 |
  2. |  3        1    25       21 |
  3. |  1        1    34       23 |
  4. |  5        2    18       75 |
  5. |  2        2    40       43 |
     +----------------------------+

Note that ageinfo1 is sorted by gender and age, and doesn't contain any duplicate values.


ageinfo2
     +-----------------------------+
     |  id   gender   income   age |
     |-----------------------------|
  1. | 415        1       12    12 |
  2. | 314        1       32    25 |
  3. | 516        2       65    18 |
  4. | 213        2       32    40 |
  5. |  12        2       12    34 |
     +-----------------------------+

This file does not need to be sorted, but I subsequently sort it to facilitate replication. Issuing the following commands in order gives:


. use ageinfo2

. sort gender age

. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)

. list

     +-----------------------------------------------+
     |  id   gender   income   age   _merge   newage |
     |-----------------------------------------------|
  1. | 415        1       12    12        3       12 |
  2. | 314        1       32    25        3       25 |
  3. | 516        2       65    18        3       18 |
  4. |  12        2       12    34        3       18 |
  5. | 213        2       32    40        3       40 |
     +-----------------------------------------------+

. clear

. use ageinfo2

. sort gender age

. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)

. list

     +-----------------------------------------------+
     |  id   gender   income   age   _merge   newage |
     |-----------------------------------------------|
  1. | 415        1       12    12        3       12 |
  2. | 314        1       32    25        3       12 |
  3. | 516        2       65    18        3       18 |
  4. |  12        2       12    34        3       18 |
  5. | 213        2       32    40        3       40 |
     +-----------------------------------------------+

. clear

. use ageinfo2

. sort gender age

. nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)

. list

     +-----------------------------------------------+
     |  id   gender   income   age   _merge   newage |
     |-----------------------------------------------|
  1. | 314        1       32    25        3       25 |
  2. | 415        1       12    12        1        . |
  3. | 516        2       65    18        3       18 |
  4. |  12        2       12    34        3       18 |
  5. | 213        2       32    40        3       40 |
     +-----------------------------------------------+

The first outcome is correct, but subsequent runs give different (and incorrect) answers. My only guess at this point is that there is something going on with a temporary file which is not being cleared, but I don't know how that could happen. Has anyone else noticed a problem with this?
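
To make the example easier to rerun, the two datasets listed above can be rebuilt with input. This is only a sketch: it assumes the user-written nearmrg command is already installed and that the files can be saved in the current working directory.

```stata
* ageinfo1: sorted by gender and age, no duplicate (gender, age) pairs
clear
input id gender age income
4 1 12 56
3 1 25 21
1 1 34 23
5 2 18 75
2 2 40 43
end
sort gender age
save ageinfo1, replace

* ageinfo2: entered in the order shown in the listing
clear
input id gender income age
415 1 12 12
314 1 32 25
516 2 65 18
213 2 32 40
12 2 12 34
end
save ageinfo2, replace

* The sequence of commands from the post
use ageinfo2, clear
sort gender age
nearmrg gender using ageinfo1, nearvar(age) lower genmatch(newage)
list
```

Repeating the last four lines several times is how the differing listings above were obtained.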


Thanks in advance,
John
=================================
John Hund
Visiting Assistant Professor
Jones Graduate School of Business
Rice University
Houston, TX 77005


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
