Dear listers,
I have two datasets that I want to match on a key variable. The
problem is that the key variable differs slightly between the two
datasets. Here is what I mean.
In dataset 1 the key looks like this:
1
2
3
4A
4B
5A
5B
5C
6
...
In dataset 2 the key looks like this:
1A
1B
2
3
4A
4B
5A
5B
5C
6A
6B
...
The reason for these discrepancies is that the unit of observation
is a plot (of land), and some plots split between the two periods
(for example, 1 split into 1A and 1B, 6 split into 6A and 6B, etc.).
I want to merge the two datasets taking these potential splits into
account, so that 1A and 1B are both matched to 1.
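A minimal sketch of how split plots map back to their original plot,
assuming the key is stored as a string and that a split only ever
appends letters to the end of the original plot number. It uses
Stata's regexr() to strip trailing letters; the variable name basekey
is hypothetical.

```stata
* Hypothetical example: the dataset-2 keys listed above, as strings
clear
input str3 key
"1A"
"1B"
"2"
"3"
"4A"
"4B"
"5A"
"5B"
"5C"
"6A"
"6B"
end

* Strip trailing letters to recover the pre-split plot id (1A -> 1)
generate basekey = regexr(key, "[A-Za-z]+$", "")
list key basekey, noobs
```

Note that this also turns 4A into 4, which does not exist in dataset 1
(plot 4 had already split there), so an exact match on the original key
has to be tried first and the de-lettered key used only as a fallback.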
I figured out a long way to do this: generate a "de-lettered"
identifier in dataset 2, then do two successive merges. Something like:
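* dataset 2 in memory; dataset1 must be saved sorted by key
generate deletteredkey = regexr(key, "[A-Za-z]+$", "")
sort key
* first merge: exact matches on the original key (2, 3, 4A, ...)
merge key using dataset1
drop if _m == 2
drop _m
rename key letteredkey
rename deletteredkey key
sort key
* second merge: remaining plots matched on the de-lettered key (1A -> 1);
* -update- only fills values still missing after the first merge
merge key using dataset1, update
drop if _m == 2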
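For Stata 11 or later, the same two-step merge can be written without
sorting or dropping _merge by hand. This is a self-contained sketch on
made-up data: the variables area and owner are hypothetical, dataset 1
is kept in a tempfile, and the merge options keep() and nogenerate do
the bookkeeping.

```stata
* Made-up dataset 1 (area is a hypothetical plot variable)
clear
input str3 key area
"1"  10
"2"  20
"3"  30
"4A" 40
"4B" 41
"5A" 50
"5B" 51
"5C" 52
"6"  60
end
tempfile dataset1
save `dataset1'

* Made-up dataset 2 (owner is a hypothetical plot variable)
clear
input str3 key owner
"1A" 1
"1B" 2
"2"  3
"3"  4
"4A" 5
"4B" 6
"5A" 7
"5B" 8
"5C" 9
"6A" 10
"6B" 11
end

* Pre-split plot id: strip trailing letters (1A -> 1, 2 -> 2)
generate basekey = regexr(key, "[A-Za-z]+$", "")

* Step 1: exact match on the original key; drop using-only plots
merge 1:1 key using `dataset1', keep(master match) nogenerate

* Step 2: match on the de-lettered key; -update- only fills values
* still missing, so step-1 matches are left alone
rename key letteredkey
rename basekey key
merge m:1 key using `dataset1', update keep(1 3 4 5) nogenerate

* Restore the original key names
rename key basekey
rename letteredkey key
```

Afterwards 1A and 1B carry plot 1's area (filled in the second merge),
while 4A and 5B keep their exact first-merge matches. The -update-
fallback could only overwrite missing values of an exactly matched plot
if dataset 1 contained both, say, 4 and 4A.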
Is there a shorter, perhaps more clever, way to do this? I found a
user-written ado, -nearmrg-, which does exactly what I want, but only
for numeric keys.
Thanks a lot,
Radu Ban