Statalist The Stata Listserver


Re: st: RE: merge datasets using "closest" match


From   "Radu Ban" <[email protected]>
To   [email protected]
Subject   Re: st: RE: merge datasets using "closest" match
Date   Fri, 30 Jun 2006 10:01:50 -0400

Dear Scott,

Thanks for the help. Your idea is great. I just need to fix it up, as
my data has identifiers with 2 or more digits as well (e.g. 11, 11A,
11B), in which case keeping only the first digit is not enough. But I
know how to do this.
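
One possible way to build such a key is to keep all leading digits of the
identifier instead of only the first character. The lines below are a
minimal sketch, not a tested solution for the real data: the toy values and
the str3 width are assumptions, and regexm()/regexs() are used in place of
substr().

```stata
* Sketch with made-up ids: derive the merge key from ALL leading digits,
* so that 11, 11A and 11B share the key "11" while 1A keeps the key "1".
clear
input str3 id
1A
11
11A
11B
2
end

* regexm() tests the pattern; regexs(1) returns the first parenthesised
* group of the most recent match, here the run of digits at the start.
gen str3 id_num = regexs(1) if regexm(id, "^([0-9]+)")

list
```

Identifiers that do not start with a digit get an empty id_num and would
need separate handling before merging.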

Thanks again,
Radu

2006/6/29, Scott Merryman <[email protected]>:
I believe the example below, which merges on only the first character of the id
variable, works without two successive merges.

Scott


clear
tempfile  tmp1

input str2 id
1A
1B
2
3
4A
4B
5A
5C
5B
6A
6B
7
end
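* key on the first character; sort by key then id so ties have a fixed order
* (a plain -sort id_num- orders tied observations arbitrarily)
gen id_num = substr(id, 1,1)
sort id_num id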
save `tmp1'

clear
input str2 id2
1
2
3
4B
4A
5C
5A
5B
6
7A
7B
end
* same key in the second dataset, again with a fixed order within ties
gen id_num = substr(id2, 1,1)
sort id_num id2

merge  id_num using `tmp1'
drop _m id_num
sort id2 id

l
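
When the same key value occurs more than once in both files, this form of
merge pairs observations one-to-one in sort order within each key group. If
every pairing within a parent plot is wanted instead, -joinby- is one
alternative. The sketch below is an illustration only, not part of the
example above: the toy ids, the tempfile name period2 and the variable name
id1 are all hypothetical.

```stata
* Hypothetical later-period plots (the using file)
clear
tempfile period2
input str3 id
1A
1B
2
5A
5B
end
gen str3 id_num = regexs(1) if regexm(id, "^([0-9]+)")
sort id_num id
save `period2'

* Hypothetical earlier-period plots (the master file)
clear
input str3 id1
1
2
5
6
end
gen str3 id_num = regexs(1) if regexm(id1, "^([0-9]+)")
sort id_num id1

* Form all pairwise combinations within id_num; unmatched(both) keeps
* plots found in only one file and creates _merge to flag them.
joinby id_num using `period2', unmatched(both)
sort id1 id
list id1 id _merge
```

Note that if a lettered plot such as 4A appears in both files alongside 4B,
-joinby- also produces the cross pairings (4A with 4B), which would then
have to be dropped.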

> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Radu Ban
> Sent: Thursday, June 29, 2006 4:22 PM
> To: [email protected]
> Subject: st: merge datasets using "closest" match
>
> dear listers,
>
> i have two datasets and i want to match them on a key variable. the
> problem is that the key variable differs slightly between the two
> datasets. i'll explain what this means.
>
> in dataset 1 the key may look like this
> 1
> 2
> 3
> 4A
> 4B
> 5A
> 5B
> 5C
> 6
> ...
>
> in dataset 2 the key may look like this
> 1A
> 1B
> 2
> 3
> 4A
> 4B
> 5A
> 5B
> 5C
> 6A
> 6B
> ...
>
> the reason for these discrepancies is that the unit of observation
> is a plot (of land) and some plots have split (for example 1 has split
> into 1A and 1B, 5 has split into 5A and 5B, etc) between the two
> periods of time. i want to merge the two datasets keeping in mind
> these potential splits, so that 1A and 1B are both matched to 1.
>
> i figured a long way to do this: generating a "de-lettered" identifier
> in dataset two. then doing two successive merges. something like:
>
> merge key using dataset1
> drop if _m == 2
> drop _m
>
> rename key letteredkey
> rename deletteredkey key
> sort key
> merge key using dataset1, update
> drop if _m == 2
>
> is there a shorter, perhaps more clever way to do this? i found a
> user-written ado -nearmrg-, which does exactly what i want but only
> for numeric keys.
>
> thanks a lot for this,
> radu ban



*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


