replace myvar = subinstr(myvar, "998", "9 98", .)
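
More generally, the same fault (a code run together with the closing 98,
as in "898" or "1098") might be caught in one pass with the regular
expression functions rather than one -subinstr()- per pattern. This is
only a sketch: it assumes that 98 only ever appears as the final code,
and that a trailing 98 stuck to a preceding digit is always a miscoding.

```stata
* Sketch only: assumes 98 is always the closing code of a response.
* Strip leading/trailing blanks and collapse repeated internal blanks.
replace myvar = itrim(trim(myvar))

* Insert a space before a trailing 98 that is stuck to a preceding digit,
* e.g. "1 2 898" becomes "1 2 8 98"; "9 98" and "98" are left alone.
replace myvar = regexs(1) + " 98" if regexm(myvar, "^(.*[0-9])98$")
```

Values such as "51098" would come out as "510 98" and would still need
fixing by hand or at source.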
Nick
[email protected]
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Nick Cox
> Sent: 19 October 2006 18:37
> To: [email protected]
> Subject: st: RE: FW: Cleanup of messy variable
>
>
> I think you need to clean up at source.
>
> Some of the problems look fairly clear
> and can be fixed with a -subinstr()-
> function in a -replace-. Some look more
> difficult to diagnose.
>
> For example, "998" as an element looks like
> a miscoding of "9 98" and the action would
> then be
>
> replace myvar = subinstr(myvar, "998", "998", .)
>
> Once you have cleaned up, some of your
> questions can be answered using -tabsplit-
> from -tab_chi- on SSC.
>
> Others will require a different data structure
> based on a -split- and then a -reshape-.
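
As a rough sketch of the -split- and -reshape- route: this assumes the
string has already been cleaned so that codes are separated by spaces,
and the names id, code and seq are made up for illustration.

```stata
* id, code and seq are illustrative names, not from the original data.
gen long id = _n

* One new variable per code: code1, code2, ... (numeric if all parts are numeric).
split myvar, generate(code) destring

* One observation per respondent per reported code.
reshape long code, i(id) j(seq)
drop if missing(code)

* Forms of gambling (codes 1 to 11), most frequent first.
tabulate code if inrange(code, 1, 11), sort
```

If any non-numeric debris survives the cleanup, -destring- will leave
those pieces as strings and they would need attention first.
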
>
> Nick
> [email protected]
>
> Honey, Wayne, DOH
>
> > We have a data set with a poorly designed string variable of
> > the form str%22s. This variable allowed for multiple
> > responses to be coded in the following manner:
> >
> > 01. Cards (21, Black Jack, Poker, etc.)
> > 02. Animals (Roosters, dogs, horses, frogs, ducks)
> > 03. Sports (football, baseball, pool, golf)(incl. pools,
> > w/friends or bookie)
> > 04. Dice games of any type (Craps, etc.)
> > 05. Lottery or numbers (Quick Pick, Road Runner, scratch
> > cards, etc.)
> > 06. Bingo
> > 07. Raffles or sweepstakes
> > 08. Slot machines, video machines or other gambling machines
> > 09. Pull Tabs, punch cards
> > 10. Internet Gambling
> > 11. Other, please specify: ______________________________
> > SAM (575-594)
> >
> > 88. Never Gamble  GO TO NEXT MODULE
> > 98. No other
> > 77. Don't Know/Not Sure
> > 99. Refused  GO TO NEXT MODULE
> >
> > The respondent was free to respond in any way they chose and
> > the interviewers were trained to select from among 15
> > possible response codes. Codes 01 through 10 were assigned
> > to particular forms of gambling. Code 11 was used to
> > identify types of gambling that couldn't be coded according
> > to the 10 identified responses.
> > Codes 77, 88, and 99 are self-explanatory. If the respondent
> > reported one or more types of gambling, the interviewer coded
> > as many forms as were relevant, then entered 98 to indicate
> > that no additional types of gambling were reported.
> >
> > Consequently, we have a variable with a wide variety of
> > responses (see frequency table, below, showing the first and
> > last few rows).
> >
> > 1 2 3 4 5 7 8 998 | 1 0.03 7.19
> > 1 2 3 4 5 898 | 1 0.03 7.22
> > 1 2 3 51098 | 1 0.03 7.25
> > 1 2 4 5 7 898 | 1 0.03 7.28
> > 1 2 498 | 1 0.03 7.31
> > 1 2 81098 | 1 0.03 7.34
> > 1 2 898 | 1 0.03 7.37
> > 1 298 | 7 0.21 7.58
> > 1 3 898 | 1 0.03 7.61
> > 1 398 | 3 0.09 7.70
> > 1 4 5 898 | 1 0.03 7.73
> > 1 4 598 | 2 0.06 7.79
> > 1 4 8 9 5 798 | 1 0.03 7.82
> > 1 4 898 | 1 0.03 7.85
> > 1 498 | 3 0.09 7.94
> > 1 5 2 798 | 1 0.03 7.97
> > 50 85998 | 1 0.03 40.16
> > 5898 | 1 0.03 40.19
> > 77 | 1 0.03 40.22
> > 88 | 1 0.03 40.25
> > 88 | 1,974 59.39 99.64
> > 89 898 | 1 0.03 99.67
> > 99 | 11 0.33 100.00
> >
> >
> > Ultimately, we would like to summarize the results in a few
> > simple ways:
> > 1. Proportion of adults participating in gambling of any form
> > 2. Proportion of adults participating in Internet gambling
> > (as a new form that should be monitored)
> > 3. Most common form of gambling
> > 4. 3 most common forms of gambling
> >
> > Clearly, the structure of the variable does not lend itself
> > to efficient use. Note that, in addition to the problem of
> > multiple responses stored in a single variable, spacing does
> > not appear to be consistent and some records even have a
> > right justification while most appear to be left justified
> > within the 22 columns. I don't know if this justification is
> > real or only apparent.
> >
> > Any advice on how to work with this variable using Stata 9.2
> > (generate other variables summarizing responses, etc.) would
> > be greatly appreciated.
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>