Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: RE: merging aggregate and survey data with different state codes
From
Rebecca Pope <[email protected]>
To
[email protected]
Subject
Re: st: RE: merging aggregate and survey data with different state codes
Date
Fri, 23 Nov 2012 08:33:49 -0600
Ric wrote in response to a digest mailing
(http://www.stata.com/statalist/archive/2012-11/msg00874.html):
"Thanks for your efforts Austin and Rebecca. Problem still not
solved, but I appreciate your efforts." I'm responding to the original
thread in case anyone is following this in future.
Ric,
Can you tell us where in the process Austin's code/proposed approach
is failing? Please be as detailed as possible. I was able to match his
generated crosswalk to the data you pasted into your original post.You
should be able to use the new sequential numbers to match to your
aggregate data without a problem, so if that's not working, there is
clearly some little detail that is off.
Regards,
Rebecca
On Tue, Nov 20, 2012 at 10:13 PM, Rebecca Pope <[email protected]>
wrote:
>
> Austin, If by "problem with the crosswalk" you mean an inaccuracy, no,
> the crosswalk appears fine. All state names line up w/ their
> abbreviations & states are in alphabetical order 1 to 50 (this is w/
> the modification to drop DC).
>
> I also think the general approach is good. It is very easy to merge
> the generated crosswalk (on "code") to the survey data. This can be
> verified using the codes and state abbreviations that Ric posted. From
> that it should also be easy to merge the "aggregate" data referred to
> in the original post (on n2, which Ric will presumably give the same
> name as the ID variable in his data). I don't see how to accomplish
> Ric's request without this additional -merge- (crosswalk with survey)
> but that is surely better than typing 50 -replace- statements!
>
> -Rebecca
> Rebecca
>
>
>
> __o __o
> _`\ <,_ _`\ <,_
> (_)/ (_) (_)/ (_)
> =========================
>
>
> On Tue, Nov 20, 2012 at 8:10 PM, Austin Nichols <[email protected]>
> wrote:
> > OP contacted me off list saying "thanks but..." and said that only
> > codes 1 to 50 were shown, whereas he needed codes from 11 to 95.
> >
> > The example below produces this output, which is clearly a crosswalk
> > from one ordering 1-51 to Census state codes:
> >
> > st code n2
> > Alabama 63 1
> > Alaska 94 2
> > Arizona 86 3
> > Arkansas 71 4
> > California 93 5
> > Colorado 84 6
> > Connecticut 16 7
> > Delaware 51 8
> > District of Columbia 53 9
> > Florida 59 10
> > Georgia 58 11
> > Hawaii 95 12
> > Idaho 82 13
> > Illinois 33 14
> > Indiana 32 15
> > Iowa 42 16
> > Kansas 47 17
> > Kentucky 61 18
> > Louisiana 72 19
> > Maine 11 20
> > Maryland 52 21
> > Massachusetts 14 22
> > Michigan 34 23
> > Minnesota 41 24
> > Mississippi 64 25
> > Missouri 43 26
> > Montana 81 27
> > Nebraska 46 28
> > Nevada 88 29
> > New Hampshire 12 30
> > New Jersey 22 31
> > New Mexico 85 32
> > New York 21 33
> > North Carolina 56 34
> > North Dakota 44 35
> > Ohio 31 36
> > Oklahoma 73 37
> > Oregon 92 38
> > Pennsylvania 23 39
> > Rhode Island 15 40
> > South Carolina 57 41
> > South Dakota 45 42
> > Tennessee 62 43
> > Texas 74 44
> > Utah 87 45
> > Vermont 13 46
> > Virginia 54 47
> > Washington 91 48
> > West Virginia 55 49
> > Wisconsin 35 50
> > Wyoming 83 51
> >
> > Normally I ignore any off-list traffic, but can anyone confirm or deny
> > any problem with this generated crosswalk?
> > Or any problem with the general approach?
> > OP says states are 1 to 50, so perhaps DC should be deleted from the
> > list, but that is easy to do:
> >
> > ssc inst ddf2dct, replace
> > clear
> > tempfile do dct
> > loc d http://www.nber.org/cps/cpsmar92.ddf
> > ddf2dct using `d', dct(`dct') do(`do')
> > run `do'
> > drop _all
> > set obs 95
> > g cps=_n
> > la val cps hg_st60
> > decode cps, gen(st)
> > keep if !mi(st)
> > replace st=trim(st)
> > g code=cps
> > drop if code==53
> > sort st
> > g n2=_n
> > keep st code n2
> > list, noo clean
> >
> >
> > On Tue, Nov 20, 2012 at 3:51 PM, Austin Nichols
> > <[email protected]> wrote:
> >> Eric M. Uslaner <[email protected]>:
> >> You should be able to find a crosswalk, or you can make one yourself
> >> from data that has states defined using those Census codes. Try e.g.
> >>
> >> ssc inst ddf2dct, replace
> >> clear
> >> tempfile do dct
> >> loc d http://www.nber.org/cps/cpsmar92.ddf
> >> ddf2dct using `d', dct(`dct') do(`do')
> >> run `do'
> >> drop _all
> >> set obs 95
> >> g cps=_n
> >> la val cps hg_st60
> >> decode cps, gen(st)
> >> keep if !mi(st)
> >> replace st=trim(st)
> >> g code=cps
> >> sort st
> >> g n2=_n
> >> keep st code n2
> >> list, noo clean
> >>
> >>
> >> On Tue, Nov 20, 2012 at 3:05 PM, Eric M. Uslaner <[email protected]>
> >> wrote:
> >>> Greetings,
> >>>
> >>> I have two data sets and want to merge the aggregate data set into the
> >>> survey data set. Ultimately I want to merge the two data sets by
> >>> Congressional district (which is how the aggregate data set is organized).
> >>> But the major problem is that the two data sets use different codes for
> >>> states. For the aggregate data, the states are numbered from 1-50
> >>> alphabetically. But for the survey data the state codes are:
> >>>
> >>> tabl C3_PPSTATEN
> >>>
> >>> C3_PPSTATEN (State) code | freq
> >>> ------------------------------+--------
> >>> me 11 | 4
> >>> nh 12 | 3
> >>> vt 13 | 4
> >>> ma 14 | 34
> >>> ri 15 | 2
> >>> ct 16 | 20
> >>> ny 21 | 76
> >>> nj 22 | 42
> >>> pa 23 | 60
> >>> oh 31 | 56
> >>> in 32 | 24
> >>> il 33 | 49
> >>> mi 34 | 36
> >>> wi 35 | 23
> >>> mn 41 | 35
> >>> ia 42 | 18
> >>> mo 43 | 31
> >>> nd 44 | 1
> >>> sd 45 | 8
> >>> ne 46 | 12
> >>> ks 47 | 14
> >>> de 51 | 5
> >>> md 52 | 33
> >>> dc 53 | 4
> >>> va 54 | 29
> >>> wv 55 | 13
> >>> nc 56 | 41
> >>> sc 57 | 18
> >>> ga 58 | 41
> >>> fl 59 | 89
> >>> ky 61 | 16
> >>> tn 62 | 29
> >>> al 63 | 19
> >>> ms 64 | 8
> >>> ar 71 | 12
> >>> la 72 | 15
> >>> ok 73 | 15
> >>> tx 74 | 92
> >>> mt 81 | 6
> >>> id 82 | 3
> >>> co 84 | 17
> >>> nm 85 | 7
> >>> az 86 | 33
> >>> ut 87 | 10
> >>> nv 88 | 20
> >>> wa 91 | 34
> >>> or 92 | 20
> >>> ca 93 | 127
> >>> ak 94 | 1
> >>> hi 95 | 6
> >>> ------------------------------+--------
> >>> Total | 1315
> >>>
> >>> Clearly there is no one-to-one function I can think of that would
> >>> transform the state numbers in the aggregate data set. I could, of course,
> >>> create a new variable step by step:
> >>>
> >>> gen int statenew=.
> >>> replace statenew= 1 if statenum == 1 (where statenum is the numeric
> >>> equivalent of state; or alternatively "if state=="AR")
> >>> Etc. through replace statenew=55 if statenum == 49 (statenum = 50 is
> >>> WY and there are no respondents in the survey from WY)
> >>>
> >>> This seems rather cumbersome. If necessary I'll do it. Does anyone
> >>> have any easier way of doing this?
> >>>
> >>> Thanks much,
> >>>
> >>> Ric Uslaner
> >>> [email protected]
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/faqs/resources/statalist-faq/
> > * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/