Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Unanticipated behavior of -encode-
From: "Lacy,Michael" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: Unanticipated behavior of -encode-
Date: Tue, 20 Aug 2013 03:18:34 +0000
Under certain circumstances, -encode- numbers the numeric version of a string variable starting where it left off at the previous -encode-, rather
than starting at 1. I encountered this while encoding a varlist of string variables in a large file, which gave me oddities such
as a string variable with the values "male" and "female" being encoded with large consecutive numbers rather than with 1 and 2.
This is hardly tragic, but it is inconvenient, and it is not behavior I could anticipate from the documentation of -encode-.
Here's example code showing a mild version of this:
clear
version 13
set seed 23456
set obs 4
gen str x = cond(runiform() > 0.5, "this", "that")
gen str y = cond(runiform() > 0.5, "blue", "green ")
//
foreach v of varlist x y {
    encode `v', gen(temp)
    drop `v'
    rename temp `v'
}
tab1 x y, nolab
//
-> tabulation of x

          x |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2       50.00       50.00
          2 |          2       50.00      100.00
------------+-----------------------------------
      Total |          4      100.00

-> tabulation of y

          y |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          3       75.00       75.00
          4 |          1       25.00      100.00
------------+-----------------------------------
      Total |          4      100.00

I would expect both x and y to be encoded with 1 and 2. This oddity can be avoided by not using "temp" repeatedly, but I'm curious whether others can explain why this
occurs.
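For what it's worth, here is a minimal sketch of one way to keep the loop and still avoid reusing the name. It rests on an assumption about the cause: unless label() is specified, -encode- names the value label after the new variable, and a value label called "temp" is still defined after the variable is dropped and renamed, so the next -encode- adds to it. Passing label(`v') is my choice for illustration; any name not yet in use should do.

```stata
clear
version 13
set seed 23456
set obs 4
gen str x = cond(runiform() > 0.5, "this", "that")
gen str y = cond(runiform() > 0.5, "blue", "green ")

foreach v of varlist x y {
    // Assumption: the carry-over comes from the shared value label "temp".
    // label(`v') asks -encode- for a value label named after each variable.
    encode `v', gen(temp) label(`v')
    drop `v'
    rename temp `v'
}

tab1 x y, nolab
// Show the value labels now defined; there should be one per variable.
label list
```

If the assumption holds, both x and y should come out coded from 1, and -label list- should show separate labels x and y rather than a single accumulated "temp".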
Regards,
Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784