Stata The Stata listserver

Re: st: encode


From   Phil Schumm <[email protected]>
To   [email protected]
Subject   Re: st: encode
Date   Fri, 21 Jan 2005 17:18:06 -0600

At 5:47 PM -0500 1/21/05, Sarah Mustillo wrote:
I'm recoding a substantial number of text responses into categorical variables. I'm finding it easier to -encode- the variables with the text responses first, before replacing the categorical variables with the correct value - this way I can avoid typing out all the text responses in the -replace- command and just type their encoded numbers. I have done this for 6 variables, and it worked fine for 5 of them. I cannot figure out what went wrong with the 6th.

The variable I am trying to encode has about 90 categories. When I encode though, the resulting variable I generate begins at number 8 and ends at 238. The first category (text response) gets an 8, the second gets a 12, and so forth. The manual states that -encode- alphabetizes before it encodes, but that doesn't explain my problem. I would still expect the numbers to go sequentially, which they have with the other 5 variables.

Sarah,

One possibility is that a value label with the same name as your target variable already exists, and that it already contains some of the values found in the variable you are encoding (your description of what you are doing suggests that this may be the case). For example:


. input str1 y

y
1. a
2. b
3. c
4. end

. encode y, gen(target)

. lab li target
target:
1 a
2 b
3 c

. drop target


(Note that although I have dropped the variable target, the corresponding value label still exists.)


. input str1 x

x
1. c
2. d
3. e

. encode x, gen(target)

. tab target, nol

     target |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          1       33.33       33.33
          4 |          1       33.33       66.67
          5 |          1       33.33      100.00
------------+-----------------------------------
      Total |          3      100.00


As you can see, this produces the same kind of result you observed: the codes do not start at 1. The reason is that -encode- reused the pre-existing value label target, adding new values only for the strings in x that were not already in it:


. lab li target
target:
1 a
2 b
3 c
4 d
5 e


It is easy to determine if this is what happened -- just list the value label attached to your target variable and see whether it contains entries that do not occur in the variable you encoded. If it does, the fix is simple: either encode to a different target, or remove the old value label with -label drop- before encoding.
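The check and both fixes can be sketched as a short do-file. This is only an illustration built on the toy data from the example above, not your actual data: the -label define- line simulates the leftover label, and target2 and target2_lbl are made-up names.

```stata
clear
input str1 x
c
d
e
end

* Simulate a value label left behind by an earlier -encode- (assumption:
* this stands in for whatever created the stale label in the real data)
label define target 1 "a" 2 "b" 3 "c"

* Check: is there already a value label with the target's name?
label dir
label list target

* Fix 1: drop the stale label, then encode
label drop target
encode x, generate(target)
tabulate target, nolabel      // codes are now 1, 2, 3

* Fix 2 (alternative): encode to a different target; the label() option
* names the new value label explicitly (target2_lbl is hypothetical)
encode x, generate(target2) label(target2_lbl)
label list target2_lbl
```

If label() is omitted, -encode- names the value label after the new variable, which is why a leftover label with that name gets picked up in the first place.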


-- Phil