Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: How can I combine datasets
From: Teresio Poggio <[email protected]>
To: [email protected]
Subject: Re: st: How can I combine datasets
Date: Fri, 16 Sep 2011 00:20:20 +0200
Hello Kevin,
it seems to me that this is more an issue of the rationale behind your
data management problem, or of data cleaning, than an issue of Stata
commands.
If I haven't misunderstood, you have two datasets containing different
variables for five years (I expect the same five years in both
datasets, coded in the same way). Counties vary across years (but I
expect they are coded in the same way in both datasets for a given
year; you should check this carefully).
I also expect that the year-county combinations don't correspond
perfectly between the two datasets; otherwise you wouldn't be
experiencing problems. So this seems to me not just a problem of
time-variant county definitions, but also a problem of either missing
information for some year-counties or different codings between the
two datasets. I'd suggest checking this carefully.
Two possible strategies:
a) keeping only time-invariant counties (your idea; is this OK for
your purposes?):
just merge 1:1 the two datasets using both year and county as key
variables. Then inspect the automatically produced variable _merge
for matched (the OK ones) and unmatched cases
(see help merge for details).
Before dropping unmatched cases, I would check for possible coding
errors and assess the extent of missing data.
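The merge-and-inspect step of strategy a) can be sketched as follows. This is a minimal Python illustration of the logic, not Stata code; the toy records and the helper names merge_1to1 and index_by_key are hypothetical. It mimics two features of Stata's merge 1:1: the key variables must uniquely identify observations in each dataset, and every output row gets a _merge code (1 = master only, 2 = using only, 3 = matched). In Stata itself this would be something like `merge 1:1 year county using otherfile`, then `tab _merge`, and only after ruling out coding errors `keep if _merge == 3` (otherfile is a placeholder name).

```python
# Minimal Python sketch of the logic of Stata's `merge 1:1 year county`.
# Toy data and helper names are hypothetical, for illustration only.

def index_by_key(records, keys, label):
    """Index records by their key tuple; duplicate keys are an error, as in merge 1:1."""
    index = {}
    for rec in records:
        k = tuple(rec[v] for v in keys)
        if k in index:
            raise ValueError(f"{keys} do not uniquely identify observations in {label}")
        index[k] = rec
    return index


def merge_1to1(master, using, keys=("year", "county")):
    """Return merged rows, each with a _merge code:
    1 = master only, 2 = using only, 3 = matched."""
    m = index_by_key(master, keys, "master")
    u = index_by_key(using, keys, "using")
    rows = []
    for k in sorted(m.keys() | u.keys()):
        row = {}
        row.update(u.get(k, {}))
        row.update(m.get(k, {}))  # master values win for shared variables
        row["_merge"] = 3 if (k in m and k in u) else (1 if k in m else 2)
        rows.append(row)
    return rows


# Hypothetical data: county B appears only in the master, county C only in the using file
master_data = [
    {"year": 1991, "county": "A", "pop": 100},
    {"year": 1991, "county": "B", "pop": 50},
    {"year": 2001, "county": "A", "pop": 110},
]
using_data = [
    {"year": 1991, "county": "A", "income": 20},
    {"year": 2001, "county": "A", "income": 25},
    {"year": 2001, "county": "C", "income": 30},
]

rows = merge_1to1(master_data, using_data)
for r in rows:
    print(r["year"], r["county"], r["_merge"])

# Inspect unmatched cases before deciding to drop them
unmatched = [r for r in rows if r["_merge"] != 3]
matched = [r for r in rows if r["_merge"] == 3]  # analogue of: keep if _merge == 3
```

Here (1991, B) is flagged 1 and (2001, C) is flagged 2: exactly the cases worth checking for miscoded keys or missing data before discarding them.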
b) dropping cases wastes available information. If it makes sense for
your purposes, and given the cost/benefit balance of your research,
you may wish to find a way to reconcile the different county
definitions. (I'm not considering possible missing data or coding
differences here.)
Switching from counties to generic geographical areas (the same logic
applies), suppose you have, either in the same dataset or in the two
datasets, a record like this (entirely fictional):
Area      Year  Population
UK        2001       1,000
and a few records like this:
Area      Year  Population
England   1991         987
Scotland  1991         456
Wales     1991         345
To avoid dropping cases, you may wish to transform the latter records
into
Area      Year  Population
UK        1991       1,788
and then merge (or compare) them with the data for UK 2001.
In this case you'd have to do some extra work (find a consistent way
to reconcile the county definitions) and use the collapse command in
Stata (see help collapse for details). This also requires that the
collapsing function you use (sum in my example; it may be another
one) is meaningful for your data and your purposes.
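The reconciliation in strategy b) amounts to mapping each finer unit onto a common, coarser unit and then aggregating. Below is a minimal Python sketch of that idea using the fictional figures above; the CROSSWALK dict and the collapse_sum helper are hypothetical, and building a real crosswalk is the extra work mentioned above. In Stata the aggregation step would be along the lines of `collapse (sum) population, by(area year)`, once an area variable holding the harmonised unit has been created. Summing suits counts such as population; rates or averages would need a different statistic (for instance a population-weighted mean).

```python
from collections import defaultdict

# Hypothetical crosswalk from the 1991 units to the unit used in 2001
CROSSWALK = {"UK": "UK", "England": "UK", "Scotland": "UK", "Wales": "UK"}


def collapse_sum(records, crosswalk, value="population"):
    """Sum `value` by (harmonised area, year), like `collapse (sum) ..., by(area year)`."""
    totals = defaultdict(int)
    for rec in records:
        area = crosswalk[rec["area"]]  # KeyError flags units missing from the crosswalk
        totals[(area, rec["year"])] += rec[value]
    return [{"area": a, "year": y, value: t} for (a, y), t in sorted(totals.items())]


records = [  # the fictional figures from the example above
    {"area": "UK", "year": 2001, "population": 1000},
    {"area": "England", "year": 1991, "population": 987},
    {"area": "Scotland", "year": 1991, "population": 456},
    {"area": "Wales", "year": 1991, "population": 345},
]

harmonised = collapse_sum(records, CROSSWALK)
print(harmonised)
# [{'area': 'UK', 'year': 1991, 'population': 1788}, {'area': 'UK', 'year': 2001, 'population': 1000}]
```

Failing loudly on an unmapped unit (rather than silently skipping it) is a deliberate choice: a gap in the crosswalk is itself one of the coding problems worth catching before merging.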
HTH
Teresio
--
____________________________________________________
dr. Teresio Poggio
LaboR - Dipartimento di Sociologia e ricerca sociale
Università degli studi di Trento
Via Verdi, 26
38100 Trento, Italy
Tel +39 0461/881406
fax: +39 0461/881348