Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: RE: merging aggregate and survey data with different state codes
From
Austin Nichols <[email protected]>
To
[email protected]
Subject
Re: st: RE: merging aggregate and survey data with different state codes
Date
Tue, 20 Nov 2012 21:10:04 -0500
OP contacted me off list saying "thanks but..." and said that only
codes 1 to 50 were shown, whereas he needed codes from 11 to 95.
The example below produces this output, which is clearly a crosswalk
from one ordering 1-51 to Census state codes:
st code n2
Alabama 63 1
Alaska 94 2
Arizona 86 3
Arkansas 71 4
California 93 5
Colorado 84 6
Connecticut 16 7
Delaware 51 8
District of Columbia 53 9
Florida 59 10
Georgia 58 11
Hawaii 95 12
Idaho 82 13
Illinois 33 14
Indiana 32 15
Iowa 42 16
Kansas 47 17
Kentucky 61 18
Louisiana 72 19
Maine 11 20
Maryland 52 21
Massachusetts 14 22
Michigan 34 23
Minnesota 41 24
Mississippi 64 25
Missouri 43 26
Montana 81 27
Nebraska 46 28
Nevada 88 29
New Hampshire 12 30
New Jersey 22 31
New Mexico 85 32
New York 21 33
North Carolina 56 34
North Dakota 44 35
Ohio 31 36
Oklahoma 73 37
Oregon 92 38
Pennsylvania 23 39
Rhode Island 15 40
South Carolina 57 41
South Dakota 45 42
Tennessee 62 43
Texas 74 44
Utah 87 45
Vermont 13 46
Virginia 54 47
Washington 91 48
West Virginia 55 49
Wisconsin 35 50
Wyoming 83 51
Normally I ignore any off-list traffic, but can anyone confirm or deny
any problem with this generated crosswalk?
Or any problem with the general approach?
OP says states are 1 to 50, so perhaps DC should be deleted from the
list, but that is easy to do:
ssc inst ddf2dct, replace
clear
tempfile do dct
loc d http://www.nber.org/cps/cpsmar92.ddf
ddf2dct using `d', dct(`dct') do(`do')
run `do'
drop _all
set obs 95
g cps=_n
la val cps hg_st60
decode cps, gen(st)
keep if !mi(st)
replace st=trim(st)
g code=cps
drop if code==53
sort st
g n2=_n
keep st code n2
list, noo clean
On Tue, Nov 20, 2012 at 3:51 PM, Austin Nichols <[email protected]> wrote:
> Eric M. Uslaner <[email protected]>:
> You should be able to find a crosswalk, or you can make one yourself
> from data that has states defined using those Census codes. Try e.g.
>
> ssc inst ddf2dct, replace
> clear
> tempfile do dct
> loc d http://www.nber.org/cps/cpsmar92.ddf
> ddf2dct using `d', dct(`dct') do(`do')
> run `do'
> drop _all
> set obs 95
> g cps=_n
> la val cps hg_st60
> decode cps, gen(st)
> keep if !mi(st)
> replace st=trim(st)
> g code=cps
> sort st
> g n2=_n
> keep st code n2
> list, noo clean
>
>
> On Tue, Nov 20, 2012 at 3:05 PM, Eric M. Uslaner <[email protected]> wrote:
>> Greetings,
>>
>> I have two data sets and want to merge the aggregate data set into the survey data set. Ultimately I want to merge the two data sets by Congressional district (which is how the aggregate data set is organized). But the major problem is that the two data sets use different codes for states. For the aggregate data, the states are numbered from 1-50 alphabetically. But for the survey data the state codes are:
>>
>> tabl C3_PPSTATEN
>>
>> C3_PPSTATEN (State) code | freq
>> ------------------------------+--------
>> me 11 | 4
>> nh 12 | 3
>> vt 13 | 4
>> ma 14 | 34
>> ri 15 | 2
>> ct 16 | 20
>> ny 21 | 76
>> nj 22 | 42
>> pa 23 | 60
>> oh 31 | 56
>> in 32 | 24
>> il 33 | 49
>> mi 34 | 36
>> wi 35 | 23
>> mn 41 | 35
>> ia 42 | 18
>> mo 43 | 31
>> nd 44 | 1
>> sd 45 | 8
>> ne 46 | 12
>> ks 47 | 14
>> de 51 | 5
>> md 52 | 33
>> dc 53 | 4
>> va 54 | 29
>> wv 55 | 13
>> nc 56 | 41
>> sc 57 | 18
>> ga 58 | 41
>> fl 59 | 89
>> ky 61 | 16
>> tn 62 | 29
>> al 63 | 19
>> ms 64 | 8
>> ar 71 | 12
>> la 72 | 15
>> ok 73 | 15
>> tx 74 | 92
>> mt 81 | 6
>> id 82 | 3
>> co 84 | 17
>> nm 85 | 7
>> az 86 | 33
>> ut 87 | 10
>> nv 88 | 20
>> wa 91 | 34
>> or 92 | 20
>> ca 93 | 127
>> ak 94 | 1
>> hi 95 | 6
>> ------------------------------+--------
>> Total | 1315
>>
>> Clearly there is no one-to-one function I can think of that would transform the state numbers in the aggregate data set. I could, of course, create a new variable step by step:
>>
>> gen int statenew=.
>> replace statenew= 1 if statenum == 1 (where statenum is the numeric equivalent of state; or alternatively "if state=="AR")
>> Etc. through replace statenew=55 if statenum == 49 (statenum = 50 is WY and there are no respondents in the survey from WY)
>>
>> This seems rather cumbersome. If necessary I'll do it. Does anyone have any easier way of doing this?
>>
>> Thanks much,
>>
>> Ric Uslaner
>> [email protected]
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/