Re: st: matching observations for merging
From: Maarten buis <[email protected]>
To: [email protected]
Subject: Re: st: matching observations for merging
Date: Thu, 17 Jun 2010 15:56:29 +0000 (GMT)
--- On Thu, 17/6/10, Abhimanyu Arora wrote:
> I have two files to be merged. Is it possible to merge using
> an approximation of the merging variable? In other words, if
> my merging variable is say, country, there could be a slight change in
> spelling of some countries (Afghanistan/ Afganistan) in the two
> files...Is there a more efficient way than just going through all 200+
> countries and checking spelling consistency?
For countries the quickest way is to:
1) keep one observation per country in each dataset
2) merge the two datasets on country name
3) keep if _merge != 3 (that is, keep only the names that did not match)
4) sort on country name
5) list
This will display a list of troublesome country names, which is
usually so short that it doesn't pay to do anything more fancy.
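As a rough sketch, steps 1-5 could look like the code below. The file names file1.dta and file2.dta and the variable name country are assumptions for illustration only, and the merge 1:1 syntax assumes Stata 11 or later.

```stata
* Sketch only: file1.dta, file2.dta and the variable country are assumed names.

* Step 1: one observation per country in the second file
use file2, clear
keep country
duplicates drop
tempfile names2
save `names2'

* Step 1 again, now for the first file
use file1, clear
keep country
duplicates drop

* Step 2: merge the two lists of country names
merge 1:1 country using `names2'

* Step 3: keep only the names that did not match
keep if _merge != 3

* Steps 4 and 5: sort and display the troublesome names
sort country
list country _merge
```

In the listing, _merge == 1 marks a spelling that occurs only in the first file and _merge == 2 a spelling that occurs only in the second, so variants such as Afghanistan and Afganistan should end up close to each other after sorting.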
With this list you can write a recode do-file that harmonizes the
country names before the final merge.
Moreover, this harmonization do-file can be a good starting point
for any subsequent project that involves merging on country names, as
the kinds of inconsistencies in country names are pretty similar across
files. So at the beginning of each project you start by running the
harmonization do-file of the last project, then go through steps 1-5
to find any mismatches that weren't handled in the last do-file, and
add those to your new harmonization file. After 4 or 5 projects you
will hardly find any mismatches anymore.
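A harmonization do-file of this kind can be very simple. The sketch below assumes a string variable called country and a do-file saved as harmonize_countries.do; both names are made up for illustration, and the only mismatch shown is the one from the original question.

```stata
* harmonize_countries.do (hypothetical file name)
* Assumes a string variable named country is in memory.

* Remove leading and trailing blanks, a common source of mismatches
replace country = trim(country)

* One line per mismatch found in steps 1-5
replace country = "Afghanistan" if country == "Afganistan"
```

You would then run it on each dataset before merging, for example by typing do harmonize_countries right after the use command, and add a new replace line whenever steps 1-5 turn up a spelling that is not yet covered.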
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------