Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: converting multiple choice (string) response options to numeric values
Date: Fri, 7 Feb 2014 11:30:41 +0000

This is quite a common problem, and it's easy to get bitten.

label def mylabels 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"
foreach v of var <varlist> {
    encode `v', gen(n_`v') label(mylabels)
}

is a sketch of how to do it. You must replace <varlist> by an actual varlist.

Alternatively, as said, look at -multencode- (SSC).

Nick
njcoxstata@gmail.com

On 7 February 2014 09:22, Nick Cox <njcoxstata@gmail.com> wrote:
> Applying -encode- to several variables is a little dangerous. If the
> values "A" to "D" occur for every variable and "E" occurs only for
> those variables for which it is possible, and for all of them, you
> should be fine. But suppose the only answers that occur for one
> variable are "A", "C", "D". Then those will be, by default, mapped to
> 1,2,3. -encode- has by default no intelligence that spots that "B" is
> missing and decides that the appropriate coding is 1, 3, 4. You would
> need to define value labels in advance and specify those as the labels
> to be used.
>
> Note also -multencode- (SSC).
>
> Nick
> njcoxstata@gmail.com
>
>
> On 7 February 2014 08:04, Ronnie Babigumira <rb.glists@gmail.com> wrote:
>> encode worked just fine. What you see as the "exact same variable" is
>> just the label
>>
>> *****
>> clear *
>> input id str1 qn1 str1 strqn3
>> 1 A D
>> 2 A A
>> 3 E B
>> 4 B C
>> end
>>
>> encode qn1, g(nqn1)
>> list
>> list, nolabel
>> *****
>>
>> Ps: note the label option of encode which allows you to provide your own label
>>
>> On Fri, Feb 7, 2014 at 1:59 AM, Katherine Picho <thestatsbabe@gmail.com> wrote:
>>> I have a huge dataset which has test data with multiple choice
>>> questions. 2 questions have choices A -E, and the rest have 4 options
>>> A-D
>>>
>>> I was looking to convert these options to numeric values with A
>>> corresponding to 1, B=2, etc.
>>>
>>> I'm using stata 12.
>>>
>>> I tried using the egen newvar= group (oldvar) command, it seems to
>>> work for some questions but not others. For instance the sequence of
>>> the 1st 5 students' answers for question 18 are AAAAA, which should
>>> translate to 5 consecutive 1s..but I get consecutive 2s instead.
>>>
>>> For another test question 10, a value of 6 is reported for one
>>> observation which actually has a letter value of C which should
>>> correspond to a value of 3.
>>>
>>> I also tried encode oldvar, gen (newvar)
>>> but I get the exact same variable data as in the original (i.e.
>>> letters, not numbers) even though the data storage type now shows
>>> 'long'
>>>
>>> I've checked to make sure there is consistency in data entry and there
>>> appears to be; i.e. all responses are entered in capital letters, and
>>> there is no mix of numeric and letters in the same variable/ column.
>>>
>>> What am I doing wrong? Any thoughts on this problem would be highly
>>> welcome as I dread the idea of having to manually convert these
>>> letters to numbers!
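
The pitfall described in the replies above (a variable whose only answers are "A", "C", "D" being coded 1, 2, 3 by default) can be checked on a toy dataset. What follows is only a minimal sketch, not code from the thread: the variable names q1 and q2, the generated names d_q2 and n_*, and the four made-up observations are assumptions for illustration. The -noextend- option is also an addition that the thread does not mention; it asks -encode- to refuse to encode a variable containing a string that is not already in the named value label, rather than quietly adding a new code.

```stata
* Toy data (made up for illustration): nobody answered "B" on q2
clear
input str1 q1 str1 q2
A A
B C
C D
D A
end

* Default -encode-: codes follow the sorted values that actually occur,
* so for q2 "C" becomes 2 and "D" becomes 3
encode q2, gen(d_q2)
list q2 d_q2, nolabel

* Define the value labels in advance, then use them for every variable
label define mylabels 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"
foreach v of varlist q1 q2 {
    * noextend (an assumption added here): stop rather than add new codes
    encode `v', gen(n_`v') label(mylabels) noextend
}

* With the predefined labels, "C" is 3 and "D" is 4 in both variables
list q1 n_q1 q2 n_q2, nolabel
label list mylabels
```

Comparing the two -list, nolabel- outputs shows the difference: d_q2 runs 1, 2, 3, 1 while n_q2 runs 1, 3, 4, 1, which is the coding the original question asked for.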