Thank you, Phil and Daniel, for responding to my question on a Friday
evening! Phil - you were absolutely right. I tried to drop the label
associated with the variable I was encoding - and that didn't work, so I
thought perhaps that wasn't my problem. But, when I changed the name of
the variable I was encoding, I got the correct results. Thank you,
thank you! I appreciate your help!
Sarah
Phil Schumm wrote:
At 5:47 PM -0500 1/21/05, Sarah Mustillo wrote:
I'm recoding a substantial number of text responses into categorical
variables. I'm finding it easier to -encode- the variables with the
text responses first, before replacing the categorical variables with
the correct value - this way I can avoid typing out all the text
responses in the -replace- command and just type their encoded
numbers. I have done this for 6 variables, and it worked fine for 5
of them. I cannot figure out what went wrong with the 6th.
The variable I am trying to encode has about 90 categories. When I
encode though, the resulting variable I generate begins at number 8
and ends at 238. The first category (text response) gets an 8, the
second gets a 12, and so forth. The manual states that -encode-
alphabetizes before it encodes, but that doesn't explain my problem. I
would still expect the numbers to go sequentially, which they have
with the other 5 variables.
Sarah,
One possibility is that there is a pre-existing value label with the
same name as the target variable you are encoding to and which already
contains some of the same values that are in the variable you are trying
to encode (your description of what you are doing suggests that this may
have been the case). For example:
. input str1 y
y
1. a
2. b
3. c
4. end
. encode y, gen(target)
. lab li target
target:
1 a
2 b
3 c
. drop target
(Note that although I have dropped the variable target, the
corresponding value label still exists.)
. input str1 x
x
1. c
2. d
3. e
. encode x, gen(target)
. tab target, nol
     target |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          1       33.33       33.33
          4 |          1       33.33       66.67
          5 |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00
As you can see, the codes for x start at 3 rather than 1, which is the
same kind of result you observed. The reason is that -encode- reused the
pre-existing value label target and added new values only for the
strings in x that were not already in it:
. lab li target
target:
1 a
2 b
3 c
4 d
5 e
It is easy to determine if this is what happened -- just look at the
value label attached to your target variable. And if this is what
happened, then the fix is simple. Just encode to a different target, or
use -label drop- first.
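A minimal sketch of the check and both fixes, continuing the example
above (target2 is a hypothetical name chosen only for illustration):
. * check: list the value label that -encode- would reuse
. lab li target
. * fix 1: drop the variable and the stale label, then encode again
. drop target
. lab drop target
. encode x, gen(target)
. * fix 2: or encode to a name with no existing value label
. encode x, gen(target2)
In either case the new codes should run 1, 2, 3 for c, d, e.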
-- Phil
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
Sarah A. Mustillo, Ph.D
Department of Psychiatry and Behavioral Sciences
Duke University School of Medicine
Box 3454
Durham NC 27710
919 687-4686 x231