Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Unanticipated behavior of -encode-
From: "Eric A. Booth" <[email protected]>
To: [email protected]
Subject: Re: st: Unanticipated behavior of -encode-
Date: Mon, 19 Aug 2013 23:15:18 -0500
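This happens because -encode- names the value label it creates after
the new variable, so the label is "temp" on both passes. Dropping or
renaming a variable does not drop its value label, so the second time
through the loop -encode- adds more categories to the already defined
label "temp" and numbering continues where it left off. Add the command
-label list- inside your loop to see this in action.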
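
A minimal sketch of that check. It assumes a small fixed data set entered with -input- in place of the random draws in the quoted example, so the result does not depend on the random-number generator; the loop itself is the one from the original post.

```stata
clear
* hypothetical fixed data standing in for the random strings in the post
input str4 x str5 y
"this" "blue"
"that" "green"
"this" "blue"
"that" "green"
end

foreach v of varlist x y {
    encode `v', gen(temp)
    * list value label "temp" after each pass; on the second pass
    * it should hold the categories of both x and y
    label list temp
    drop `v'
    rename temp `v'
}
tab1 x y, nolabel
```

On the second pass the listing should show the categories of y appended after those of x, which is why y is not coded 1 and 2.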
To prevent this, either add the command -label drop temp- to the end of
your loop (note that this also discards the text labels, leaving only
the numeric codes) or take advantage of the -label()- option of
-encode- to create a custom label name for each variable (e.g., add
"label(label`v')" to your existing -encode- command in the loop).
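
A minimal sketch of the -label()- approach, again assuming a small fixed data set entered with -input- rather than the random draws in the quoted example. The label names labelx and labely simply follow from the "label(label`v')" suggestion above.

```stata
clear
* hypothetical fixed data standing in for the random strings in the post
input str4 x str5 y
"this" "blue"
"that" "green"
"this" "blue"
"that" "green"
end

foreach v of varlist x y {
    * each variable gets its own value label (labelx, labely),
    * so -encode- starts numbering at 1 for each one
    encode `v', gen(temp) label(label`v')
    drop `v'
    rename temp `v'
}
tab1 x y, nolabel
label list
```

Because no label is shared between passes, both variables should come out coded 1 and 2, and each keeps its own text labels.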
- Eric
On Mon, Aug 19, 2013 at 10:18 PM, Lacy,Michael
<[email protected]> wrote:
> Under certain circumstances, -encode- will number the numeric version of a string variable starting where it left off at the last encode, rather
> than starting at 1. I encountered this while encoding a varlist of string variables in a large file, which gave me oddities such as
> a string variable with the values "male" and "female" being encoded with large consecutive numbers rather than with 1 and 2.
> This is hardly tragic, but it is inconvenient, and not behavior I could anticipate from the documentation of -encode-.
>
> Here's an example of code showing a mild version of this:
>
> clear
> version 13
> set seed 23456
> set obs 4
> gen str x = cond(runiform() > 0.5, "this", "that")
> gen str y = cond(runiform() > 0.5, "blue", "green ")
> //
> foreach v of varlist x y {
>     encode `v', gen(temp)
>     drop `v'
>     rename temp `v'
> }
> tab1 x y, nolab
> //
> -> tabulation of x
>
>           x |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           1 |          2       50.00       50.00
>           2 |          2       50.00      100.00
> ------------+-----------------------------------
>       Total |          4      100.00
>
> -> tabulation of y
>
>           y |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>           3 |          3       75.00       75.00
>           4 |          1       25.00      100.00
> ------------+-----------------------------------
>       Total |          4      100.00
>
>
> I would expect both x and y to be encoded with 1 and 2. This oddity can be avoided by not using "temp" repeatedly, but I'm curious whether others can explain why this
> occurs.
>
> Regards,
>
>
> Mike Lacy
> Dept. of Sociology
> Colorado State University
> Fort Collins CO 80523-1784
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/