Sometimes it is best to work out how one would
solve a problem oneself before looking at
why someone's solution doesn't work.
I can foresee lots of difficulties which such a program
should tackle: the value label might not exist,
it might not be suitable as a variable name, the
putative name might already be in use, etc.
The second difficulty is to be seen in your example:
"Native Indian" certainly won't qualify. A work-around
is to try replacing spaces by underscores.
A trial program with some error traps might then be:
------------------------------------ mydummies
*! 1.0.0 NJC 9 Sept 2004
program mydummies, rclass
    version 8.2
    syntax varname(numeric) [if] [in]
    marksample touse
    qui count if `touse'
    if r(N) == 0 error 2000
    local label : value label `varlist'
    if "`label'" == "" {
        di as err "`varlist' not labelled"
        exit 182
    }
    qui levels `varlist' if `touse', local(levels)
    // test the names: exit if problem
    foreach l of local levels {
        local name : label `label' `l'
        local name : subinstr local name " " "_", all
        confirm new var `name'
        local names "`names' `name'"
    }
    // generate the variables
    local i = 1
    qui foreach l of local levels {
        local name : word `i++' of `names'
        gen `name' = `varlist' == `l' if `touse'
    }
    return local varlist "`names'"
end
-----------------------
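Replacing spaces by underscores handles only one kind of unsuitable name. As a hedged aside: in Stata 11 or later (so not under the version 8.2 used above) the function strtoname() covers more cases, such as other illegal characters or a leading digit. A minimal sketch, in which "1st choice" is a made-up label that is not in your data:

```stata
* Sketch only: strtoname() requires Stata 11 or later
local raw1 "Native Indian"
local raw2 "1st choice"          // hypothetical label, for illustration
local name1 = strtoname("`raw1'")
local name2 = strtoname("`raw2'")
display "`name1'"
display "`name2'"
```

Whether the resulting name is already in use still needs a separate check, as -confirm new var- does in the program above.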
This leaves the question of what's wrong with
your program, apart from the lack of any error
trapping.
I note your assumption that values in your
labelled variable run over the integers 1 up.
This no doubt is fine for your applications, but
lacks generality.
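To make the point concrete, here is a small sketch with a made-up variable grp whose values are 0, 3 and 7. It uses -levelsof-, the name under which -levels- is available from Stata 9 on. A loop over 1 up to the maximum would miss the value 0 and would create empty dummies for 1, 2, 4, 5 and 6.

```stata
* Toy data: grp and its values are invented for illustration
clear
input byte grp
0
3
3
7
end

* The distinct values actually present
levelsof grp, local(levels)
display "`levels'"

* The maximum alone says nothing about gaps or values below 1
summarize grp, meanonly
local maxval = r(max)
display `maxval'
```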
Here is your program again. I have removed the
comments to save space:
program define my_dummy
    version 8
    tempvar max1
    egen `max1'=rmax(`1')
    tempvar max2
    egen `max2'=max(`max1')
    local maxval=`max2'
    forvalues i = 1/`maxval' {
        egen resp`i' = eqany(`1'), v(`i')
    }
    tokenize `1'
    local j = 1
    forvalues i = 1/`maxval' {
        local labval`j' : label `1' `i'
        local j = `j' + 1
    }
    local i 1
    local j 1
    while `i' == `j' & `i' <= `maxval' {
        rename resp`i' `labval`j''
        local i = `i' + 1
        local j = `j' + 1
    }
end
The first thing to note is the lack of any
-syntax- statement. That is not illegal, but
it means that, in homespun terms, your
door is wide open and anything can walk in.
There seem to be ambitions here of being
able to tackle several variables at once;
I'd rather solve the case of one variable
first, knowing that I can always loop over
variables with -foreach-.
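A sketch of that pattern, with invented variables a and b: the body of the outer loop handles exactly one variable, and could equally be a call to a one-variable program. The names a_1, a_2, and so on are just an illustrative naming scheme, and -levelsof- assumes Stata 9 or later.

```stata
* Toy data: a and b are invented for illustration
clear
input byte(a b)
1 2
2 2
1 3
end

* Outer loop: one variable at a time
foreach v of varlist a b {
    quietly levelsof `v', local(levels)
    * Inner loop: one dummy per distinct value of that variable
    foreach l of local levels {
        generate byte `v'_`l' = `v' == `l'
    }
}
describe
```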
Then you use -egen- to generate a variable
to hold the maximum of the variable supplied.
You can do that directly with -summarize-
and avoid the extra variable. Similarly,
-egen, eqany()- is an awkward beast which you
don't need for getting a dummy when -generate-
will do it directly, and much faster.
Also, you are assuming
that there are no variables in the dataset
called resp1, resp2, etc. That strictly
calls for temporary variables.
Putting those together, your program becomes:
program define my_dummy
    version 8
    // cleaned up a bit from here on
    syntax varname(numeric)
    su `varlist', meanonly
    local maxval = r(max)
    forvalues i = 1/`maxval' {
        tempvar dummy
        gen `dummy' = `varlist' == `i'
        local dummies "`dummies' `dummy'"
    }
    // not yet touched
    tokenize `1'
    local j = 1
    forvalues i = 1/`maxval' {
        local labval`j' : label `1' `i'
        local j = `j' + 1
    }
    local i 1
    local j 1
    while `i' == `j' & `i' <= `maxval' {
        rename resp`i' `labval`j''
        local i = `i' + 1
        local j = `j' + 1
    }
end
Turning now to the remainder, "not yet touched",
I see more loops than seem necessary. The code seems to boil down
to
forvalues i = 1/`maxval' {
    local labval : label `varlist' `i'
    local dummy : word `i' of `dummies'
    rename `dummy' `labval'
}
I can't however see why you get the bizarre one-letter
names. Perhaps someone else can illuminate.
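One check that might help narrow it down, offered as a diagnostic sketch and not as an explanation: the extended function -: label- expects the name of a value label, which need not be the same as the name of the variable. The data and the label name racelbl below are invented; on your data, the first -display- would show which label is really attached to race_n.

```stata
* Hypothetical reconstruction: racelbl is an invented label name
clear
input byte race_n
1
2
end
label define racelbl 1 "Asian" 2 "White"
label values race_n racelbl

* Which value label is attached to the variable?
local attached : value label race_n
display "attached value label: `attached'"

* Lookup by variable name (searches for a label NAMED race_n)
local byname : label race_n 1
* Lookup by the label that is actually attached
local bylabel : label `attached' 1
display "by variable name: `byname'"
display "by attached label: `bylabel'"
```

If the two lookups disagree on your data, the names are coming from a different value label than the one -tabulate- is showing.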
Nick
Lim, Nelson
> I am trying to create dummy variables from a categorical variable and
> want to have value labels of the categorical variable to be the names of
> the dummy variables.
>
> For example, I have a variable called race_n:
>
>       Numeric |
>    version of |
>          race |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>         Asian |      1,692        3.19        3.19
>         White |     41,311       77.90       81.09
>      Hispanic |      2,237        4.22       85.30
>         Black |      6,770       12.77       98.07
> Native Indian |        272        0.51       98.58
>         Other |        752        1.42      100.00
> --------------+-----------------------------------
>         Total |     53,034      100.00
>
> I want to create 6 dummies whose names are the value labels of race_n.
> For example, I would like to have the first dummy variable to be called
> Asian.
>
> I wrote a program called my_dummy. It seems to work, but when I describe
> the data, I get the following. The dummies only take the first letter of
> the variable.
>
> . describe
>
> ------------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------------------------
> A               byte   %8.0g                  race_n == 1
> C               byte   %8.0g                  race_n == 2
> H               byte   %8.0g                  race_n == 3
> N               byte   %8.0g                  race_n == 4
> T               byte   %8.0g                  race_n == 5
> X               byte   %8.0g                  race_n == 6
> ------------------------------------------------------------------------------
>
>
> /* beginning of the program */
> program define my_dummy
>
> version 8
>
> /* computing the maximum value of the variable */
> tempvar max1
> egen `max1'=rmax(`1')
> tempvar max2
> egen `max2'=max(`max1')
> local maxval=`max2'
>
> /* generating the set of dummy variables */
> forvalues i = 1/`maxval' {
> egen resp`i' = eqany(`1'), v(`i')
> }
>
>
> /* naming the value labels of the original variable */
> */ to the dummy variables
>
> tokenize `1'
> local j = 1
> forvalues i = 1/`maxval' {
> local labval`j' : label `1' `i'
> local j = `j' + 1
> }
>
> local i 1
> local j 1
> while `i' == `j' & `i' <= `maxval' {
> rename resp`i' `labval`j''
> local i = `i' + 1
> local j = `j' + 1
> }
>
>
> end
>
> my_dummy race_n