Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Creating dummy variables
From: Michael Betz <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: Creating dummy variables
Date: Wed, 16 Nov 2011 15:52:43 +0000
Thanks Matt,
This is getting close, but there is still a hang-up. The program you wrote differences every "fips1" dummy with every "fips2" dummy. I need difference dummies only for the pairs that actually occur in the data (i.e., 1001-1073, 1001-1021, and 1001-1101, but not 1001-12031). Because each "fips" variable has 3,000 levels, this program would create 3,000 x 3,000 = 9,000,000 variables, which exceeds the number of variables Stata can hold.
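A minimal sketch of one way to restrict the differences to observed pairs follows. It is an illustration under stated assumptions, not a method proposed in this thread: it uses the five-row example data, the helper variable name "pair" is hypothetical, and the diff_ naming simply mirrors the program quoted below. It numbers each distinct (fips1, fips2) combination with egen's group() function and generates one difference variable per combination, without storing the intermediate dummies.

```stata
clear
input fips1 fips2
1001 1073
1001 1021
1001 1101
1003 12031
1003 1099
end

* Number each distinct (fips1, fips2) combination 1, 2, ..., npairs.
* "pair" is a hypothetical helper name.
egen long pair = group(fips1 fips2)
summarize pair, meanonly
local npairs = r(max)

forvalues p = 1/`npairs' {
    * Look up the two county codes that define pair `p'.
    * Both variables are constant within a pair, so r(min) is the code.
    summarize fips1 if pair == `p', meanonly
    local c1 = r(min)
    summarize fips2 if pair == `p', meanonly
    local c2 = r(min)

    * Difference of the two indicator expressions for this pair only.
    generate byte diff_`c1'_`c2' = (fips1 == `c1') - (fips2 == `c2')
    label variable diff_`c1'_`c2' "fips1==`c1' - fips2==`c2'"
}
drop pair
```

This creates one variable per observed pair rather than one per element of the cross product. If the number of distinct pairs in the full data still exceeds the maxvar limit, a different strategy would be needed.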
Mike
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Matthew White
Sent: Wednesday, November 16, 2011 9:14 AM
To: [email protected]
Subject: Re: st: Creating dummy variables
Hi Mike,
There's probably a more efficient way to do this, but here's one way:
***BEGIN***
clear
input fips1 fips2
1001 1073
1001 1021
1001 1101
1003 12031
1003 1099
end

* Create one dummy per county for fips1 and for fips2, collecting the
* new variable names in the locals dummies1 and dummies2.
forvalues i = 1/2 {
    levelsof fips`i'
    foreach county in `r(levels)' {
        generate fips`i'_`county' = fips`i' == `county'
        label variable fips`i'_`county' "fips`i'==`county'"
        local dummies`i' `dummies`i'' fips`i'_`county'
    }
}

* Difference every fips1 dummy with every fips2 dummy (all pairs).
foreach dummy1 of local dummies1 {
    * The county code is the part of the name after the underscore.
    local num1 = substr("`dummy1'", strpos("`dummy1'", "_") + 1, .)
    foreach dummy2 of local dummies2 {
        local num2 = substr("`dummy2'", strpos("`dummy2'", "_") + 1, .)
        generate diff_`num1'_`num2' = `dummy1' - `dummy2'
        label variable diff_`num1'_`num2' "`dummy1' - `dummy2'"
    }
}
***END***
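To see how the all-pairs approach scales, the sketch below compares the number of difference variables it would create with the number of distinct (fips1, fips2) pairs in the data, and shows the current maxvar setting. It is an illustrative check, not part of the program above; the names firstobs, n1, n2, nall, and npairs are hypothetical.

```stata
clear
input fips1 fips2
1001 1073
1001 1021
1001 1101
1003 12031
1003 1099
end

* Count the distinct levels of each variable.
quietly levelsof fips1, local(lev1)
local n1 : word count `lev1'
quietly levelsof fips2, local(lev2)
local n2 : word count `lev2'
local nall = `n1' * `n2'

* Tag one observation per distinct (fips1, fips2) pair and count the tags.
egen byte firstobs = tag(fips1 fips2)
quietly count if firstobs
local npairs = r(N)
drop firstobs

display "all-pairs approach:      `nall' difference variables"
display "observed-pairs approach: `npairs' difference variables"
display "current maxvar setting:  " c(maxvar)
```

On the five-row example, the all-pairs count is 2 x 5 = 10, while only 5 pairs occur in the data.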
Best,
Matt
On Tue, Nov 15, 2011 at 9:44 PM, Michael Betz
<[email protected]> wrote:
> Hi all,
>
> I have two categorical variables, "fips1" and "fips2", that record the US county of the observation. Each "fips1" county is paired with many "fips2" counties, as below:
>
> fips1 fips2
> 1001 1073
> 1001 1021
> 1001 1101
> 1003 12031
> 1003 1099
>
> I need to create dummy variables for each county in "fips1" and "fips2" and then create variables representing the difference between the two dummy variables as below:
>
> fips1  fips2  dum1_1  dum1_2  dum2_1  dum2_2  dum2_3  dum2_4  1_1-2_1  1_1-2_2  1_1-2_3
> 1001   1003   1       0       1       0       0       0        0        1        1
> 1001   1021   1       0       0       1       0       0        1        0        1
> 1001   1101   1       0       0       0       1       0        1        1        0
> 1003   1021   0       1       0       1       0       0        0       -1        0
> 1003   1001   0       1       0       0       0       1        0        0        0
>
> One added constraint is that "fips1" and "fips2" each create 3,000 dummies, so Stata cannot hold variables representing the difference between all pairs of dummy variables. I need to calculate the difference in dummies only for the pairs that are in the data (i.e., according to the example above, I would not need the difference between the dummies for "fips1"=1001 and "fips2"=1001, because that pair doesn't exist in my data).
>
> I've been thinking all day trying to come up with a solution, but to no avail. I appreciate any help or suggestions.
>
> Thanks,
> Mike
>
>
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Matthew White
Data Coordinator
Innovations for Poverty Action
101 Whitney Avenue, New Haven, CT 06510 USA
+1 434-305-9861
www.poverty-action.org