Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: converting multiple choice (string) response options to numeric values
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: converting multiple choice (string) response options to numeric values
Date: Fri, 7 Feb 2014 09:22:40 +0000
Applying -encode- to several variables is a little dangerous. If the
values "A" to "D" occur in every variable, and "E" occurs in every
variable for which it is a possible answer, you should be fine. But
suppose the only answers that occur for one variable are "A", "C",
"D". Then those will be, by default, mapped to 1, 2, 3. By default
-encode- has no intelligence to spot that "B" is missing and decide
that the appropriate coding is 1, 3, 4. You would need to define
value labels in advance and specify those as the labels to be used.
Note also -multencode- (SSC).
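
A minimal sketch of defining the value label in advance, as described
above. The variable names qn1 and qn3, the label name abcde, and the
toy data are hypothetical and chosen only for illustration; here qn3
contains only "A", "C", "D", which is the problem case.

```stata
* Toy data (hypothetical): qn3 never takes the value "B"
clear
input id str1 qn1 str1 qn3
1 A D
2 A A
3 E C
4 C C
end

* Define one value label covering every possible answer, so that
* A=1, B=2, C=3, D=4, E=5 regardless of which answers actually occur
label define abcde 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"

* Encode each variable against that predefined label.
* noextend makes -encode- exit with an error, rather than add new
* codes, if a string value is not already present in the label.
foreach v of varlist qn1 qn3 {
    encode `v', generate(n`v') label(abcde) noextend
}

* Show the underlying numeric codes instead of the labels
list, nolabel
```

With the label fixed beforehand, "C" and "D" in qn3 should be stored
as 3 and 4 rather than 2 and 3. The noextend option is a design choice
here: it turns stray values (for example, a lower-case or mistyped
answer) into an error instead of silently assigning them new codes.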
Nick
[email protected]
On 7 February 2014 08:04, Ronnie Babigumira <[email protected]> wrote:
> encode worked just fine. What you see as the "exact same variable" is
> just the label
>
> *****
> clear *
> input id str1 qn1 str1 strqn3
> 1 A D
> 2 A A
> 3 E B
> 4 B C
> end
>
> encode qn1, g(nqn1)
> list
> list, nolabel
> *****
>
> P.S. Note the label() option of -encode-, which lets you supply your own
> value label.
>
> On Fri, Feb 7, 2014 at 1:59 AM, Katherine Picho <[email protected]> wrote:
>> I have a huge dataset which has test data with multiple choice
>> questions. 2 questions have choices A-E, and the rest have 4 options,
>> A-D.
>>
>> I was looking to convert these options to numeric values with A
>> corresponding to 1, B=2, etc.
>>
>> I'm using Stata 12.
>>
>> I tried using the egen newvar = group(oldvar) command; it seems to
>> work for some questions but not others. For instance, the first 5
>> students' answers for question 18 are AAAAA, which should translate
>> to 5 consecutive 1s, but I get consecutive 2s instead.
>>
>> For another test question 10, a value of 6 is reported for one
>> observation which actually has a letter value of C which should
>> correspond to a value of 3.
>>
>> I also tried encode oldvar, gen(newvar),
>> but I get the exact same variable data as in the original (i.e.
>> letters, not numbers), even though the data storage type now shows
>> 'long'.
>>
>> I've checked to make sure there is consistency in data entry and there
>> appears to be; i.e. all responses are entered in capital letters, and
>> there is no mix of numeric and letters in the same variable/ column.
>>
>> What am I doing wrong? Any thoughts on this problem would be highly
>> welcome as I dread the idea of having to manually convert these
>> letters to numbers!