Re: st: RE: merging aggregate and survey data with different state codes
From: Rebecca Pope <[email protected]>
To: [email protected]
Subject: Re: st: RE: merging aggregate and survey data with different state codes
Date: Mon, 26 Nov 2012 14:36:36 -0600
I received a private e-mail with datasets attached and saying the following:
"Thanks much for your interest. The situation is more complicated
[...] At a more fundamental level I don’t understand what Austin
suggested—it is nice that people give answers but neither he nor you
gave me any idea as to how to use this solution with my data rather
than data from NBER). This solution may just as well have been
written in Japanese. But as you can see the problem is more
complicated especially since encode seems to have failed to give the
correct numerical values of a string variable."
I thought sufficient detail on how to use the solution with his data
was supplied. In my previous post, I noted that 2 merges would be
required: crosswalk-to-data and data-to-data. Austin may have had
another strategy in mind, but to my knowledge, a crosswalk implies a
merge. I am not sure what was unclear about Austin's crosswalk or my
follow-up verifying it. If you do not understand or are still having
trouble, the appropriate response is to post _exactly_ where you run
into a problem. This sort of public dialog helps those who might have
a problem similar to yours. However, here it is again, as clear as I
can make it. I am going to confine my remarks here to the original
question: How do you merge datasets when the IDs are different?
Separate questions, i.e. issues with -encode-, should be handled in a
separate post.
Step 1: Get a crosswalk. Austin previously posted code to do this
(http://www.stata.com/statalist/archive/2012-11/msg00819.html). Its
first five rows:
st            code   n2
Alabama         63    1
Alaska          94    2
Arizona         86    3
Arkansas        71    4
California      93    5
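If you cannot run Austin's code, a small crosswalk can also be typed
in by hand. A minimal sketch using only the five rows listed above
(the file name "crosswalk" is my own placeholder, and a real crosswalk
would need one row for every state):
***
clear
input str20 st code n2
"Alabama"    63 1
"Alaska"     94 2
"Arizona"    86 3
"Arkansas"   71 4
"California" 93 5
end
save crosswalk, replace  /* hypothetical file name */
***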
Step 2: Rename variables in crosswalk to match study datasets.
***
rename code C3_PPSTATEN /* State code for survey data */
rename n2 statenum /* State code for aggregate data */
***
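Before merging, it may be worth confirming that each ID identifies
exactly one row of the crosswalk, since a 1:1 or 1:m merge stops with
an error otherwise. A quick check, assuming the renamed crosswalk is
the dataset in memory:
***
isid statenum     /* error if statenum has duplicates or missing values */
isid C3_PPSTATEN  /* same check for the survey state code */
***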
Step 3: Merge crosswalk to aggregate data, adding "C3_PPSTATEN" to the
aggregate data. You can visually compare the state names (st) from
Austin's crosswalk to the state names from the aggregate data (State)
and see that this crosswalk is accurate. For that matter, you could
have merged on the text field and skipped the whole business of
generating n2 in this case.
***
merge 1:1 statenum using "Statalist\teapartyfactions2010.dta"
list st State C3_PPSTATEN statenum in 1/5, noobs clean
***
st           State        C3_PPS~N   statenum
Alabama      Alabama            63          1
Alaska       Alaska             94          2
Arizona      Arizona            86          3
Arkansas     Arkansas           71          4
California   California         93          5
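The alternative mentioned in Step 3, merging on the state name itself
in place of the merge on statenum, might look like the sketch below.
It assumes the crosswalk is in memory, that State is a string variable
in the aggregate data, and that the names are spelled and capitalized
identically in both files (they appear to be in the listing above).
***
rename st State              /* match the variable name in the aggregate data */
replace State = trim(State)  /* guard against stray leading/trailing blanks */
merge 1:1 State using "Statalist\teapartyfactions2010.dta"
***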
Step 4: Merge survey data to aggregate data using C3_PPSTATEN.
***
merge 1:m C3_PPSTATEN using "Statalist\anes2010egss3small.dta", gen(_merge2)
/* Gets rid of WY (no survey) & DC (no aggregate data); modify at will */
drop if _merge2!=3
preserve
bys st: keep if _n==1
list st C3_PPSTATEN statenum statenew in 1/5, noobs clean
restore
***
st           C3_PPS~N   statenum   statenew
Alabama            63          1         al
Alaska             94          2         ak
Arizona            86          3         az
Arkansas           71          4         ar
California         93          5         ca
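To see which states fail to match, you could tabulate the merge
indicator right after the merge in Step 4 and before the -drop-. A
sketch using the variable names from above (it assumes State came in
from the aggregate data in Step 3):
***
tab _merge2
tab State if _merge2==1        /* aggregate data only, no survey respondents */
tab C3_PPSTATEN if _merge2==2  /* survey only, no aggregate data */
***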
If a step in this particular process fails, please let us know what
the error is. If it produces results different from what you want,
post a _short_ example of the ideal end result and your input data.
Specifics will get you better help.
Regards,
Rebecca