Statalist The Stata Listserver

st: FW: Cleanup of messy variable

From   "Honey, Wayne, DOH" <[email protected]>
To   <[email protected]>
Subject   st: FW: Cleanup of messy variable
Date   Thu, 19 Oct 2006 11:03:56 -0600


We have a data set with a poorly designed string variable, stored as str22 with display format %22s. This variable allowed multiple responses to be coded in the following manner:

01.  Cards (21, Black Jack, Poker, etc.)
02.  Animals (Roosters, dogs, horses, frogs, ducks)
03.  Sports (football, baseball, pool, golf)(incl. pools, w/friends or bookie)
04.  Dice games of any type (Craps, etc.)
05.  Lottery or numbers (Quick Pick, Road Runner, scratch cards, etc.)
06.  Bingo
07.  Raffles or sweepstakes
08.  Slot machines, video machines or other gambling machines
09.  Pull Tabs, punch cards
10.  Internet Gambling
11.  Other, please specify: ______________________________  SAM (575-594)

88.  Never Gamble  GO TO NEXT MODULE
98.  No other
77.  Don't Know/Not Sure
99.  Refused  GO TO NEXT MODULE

The respondent was free to respond in any way they chose, and the interviewers were trained to select from among 15 possible response codes. Codes 01 through 10 were assigned to particular forms of gambling. Code 11 was used to identify types of gambling that couldn't be coded according to the 10 identified responses.
Codes 77, 88, and 99 are self-explanatory. If the respondent reported one or more types of gambling, the interviewer coded as many forms as were relevant, then entered 98 to indicate that no additional types of gambling were reported.

Consequently, we have a variable with a wide variety of responses (see frequency table, below, showing the first and last few rows).

	Response              |      Freq.     Percent        Cum.
	----------------------+-----------------------------------
	1 2 3 4 5 7 8 998     |          1        0.03        7.19
	1 2 3 4 5 898         |          1        0.03        7.22
	1 2 3 51098           |          1        0.03        7.25
	1 2 4 5 7 898         |          1        0.03        7.28
	1 2 498               |          1        0.03        7.31
	1 2 81098             |          1        0.03        7.34
	1 2 898               |          1        0.03        7.37
	1 298                 |          7        0.21        7.58
	1 3 898               |          1        0.03        7.61
	1 398                 |          3        0.09        7.70
	1 4 5 898             |          1        0.03        7.73
	1 4 598               |          2        0.06        7.79
	1 4 8 9 5 798         |          1        0.03        7.82
	1 4 898               |          1        0.03        7.85
	1 498                 |          3        0.09        7.94
	1 5 2 798             |          1        0.03        7.97
	(intermediate rows omitted)
	50 85998              |          1        0.03       40.16
	5898                  |          1        0.03       40.19
	77                    |          1        0.03       40.22
	                   88 |          1        0.03       40.25
	88                    |      1,974       59.39       99.64
	89 898                |          1        0.03       99.67
	99                    |         11        0.33      100.00

Ultimately, we would like to summarize the results in a few simple ways:
1. Proportion of adults participating in gambling of any form
2. Proportion of adults participating in Internet gambling (as a new form that should be monitored)
3. Most common form of gambling
4. The three most common forms of gambling

Clearly, the structure of the variable does not lend itself to efficient use. Note that, in addition to the problem of multiple responses stored in a single variable, spacing does not appear to be consistent, and some records even appear right-justified within the 22 columns while most appear left-justified. I don't know whether this difference in justification is real or only an artifact of how the values are displayed.
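
To make the apparent layout concrete, here is a minimal sketch in Python (not Stata, and not a verified solution for this data set). It rests on an assumption that the table above only suggests: each response occupies a two-character slot, right-justified, and the display has dropped one leading blank from values that start with a single-digit code. The names split_codes, summarize, and VALID are hypothetical and used only for illustration.

```python
# Assumption: codes sit in fixed two-character slots (" 1", "10", "98", ...).
# Assumption: a trimmed value of odd length has lost one leading blank.

VALID = set(range(1, 12)) | {77, 88, 98, 99}  # the 15 documented codes


def split_codes(raw, width=2):
    """Split one multi-response string into a list of integer codes."""
    s = raw.strip()  # ignore left/right justification within the field
    if len(s) % width:
        # Restore the blank(s) assumed to be missing from the first slot.
        s = " " * (width - len(s) % width) + s
    codes = []
    for i in range(0, len(s), width):
        chunk = s[i:i + width].strip()
        if chunk.isdigit():
            codes.append(int(chunk))
    return codes


def summarize(raw):
    """Derive simple indicators from one record's response string."""
    codes = split_codes(raw)
    forms = [c for c in codes if 1 <= c <= 11]  # actual gambling forms
    return {
        "forms": forms,
        "any_gambling": bool(forms),
        "internet": 10 in forms,
        # Codes outside the documented list need manual review.
        "suspect": [c for c in codes if c not in VALID],
    }
```

Under these assumptions, split_codes("1 2 3 51098") yields the codes 1, 2, 3, 5, 10, and 98, while a value such as "5898" splits into 58 and 98, so 58 is flagged as suspect rather than silently accepted. Records like that would still need to be checked by hand.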

Any advice on how to work with this variable using Stata 9.2 (generate other variables summarizing responses, etc.) would be greatly appreciated.


Wayne A. Honey, MPH
Survey Epidemiologist
[email protected]
(505) 476-3595 Voice
(505) 827-0013 FAX
New Mexico Department of Health
Epidemiology & Response Division
Injury & Behavioral Epidemiology Bureau
Survey Unit
1190 St. Francis Dr., Suite N-1350
P.O. Box 26110
Santa Fe, NM  87502-6110

