Lee Sieswerda wrote:
> I have a data management problem (WinNT4, Stata v7).
>
> I have data from a questionnaire where some of the questions allow the
> respondent to choose multiple responses. Let's say there are 7 possible
> responses and they could choose any number of them. I would code this as
> a set of 7 binary variables. Unfortunately, the data were not coded so
> straightforwardly. They were coded across 7 variables, but the responses
> were simply entered in the order in which the respondent gave them, so
> the data look like this:
>
> f4m1 f4m2 f4m3 f4m4 f4m5 f4m6 f4m7
> 1 7 4 . . . .
> 1 . . . . . .
> 1 . . . . . .
> 1 . . . . . .
> 7 3 . . . . .
> 1 . . . . . .
> 1 2 3 4 . . .
> 1 2 3 4 6 . .
> 1 2 7 . . . .
> 1 . . . . . .
>
> As you can see, you cannot simply tabulate the number of people who
> responded 1, 2, 3, etc., because the responses are scattered over the 7
> variables in a different order for every person. The folks who provided me
> with this data use SPSS, and they get around this problem by using
> "multiple response sets". In SPSS, you can define a set of variables (in
> this case, seven) as a multiple response set and then ask for frequency
> tables and crosstabs computed across all 7 variables. It works, but I'd
> much rather use Stata than SPSS. Also, the SPSS solution is limited to
> simple tables and doesn't give you chi-square or other statistics.
>
> Now, in Stata I know I can generate dummy variables from this mess like
> this:
> gen dum1 = 0
> replace dum1 = 1 if f4m1==1 | f4m2==1 | f4m3==1 etc.
> replace dum1 = . if f4m1==. & f4m2==. & f4m3==. etc.
> gen dum2 = 0
> etc.
>
> However, this is tedious in the extreme, and there are many of these
> multiple response questions in the dataset. I could automate the
> procedure somewhat using -foreach-, but it's still more brute force than
> elegance. Someone told me about a SAS solution to this problem using an
> array procedure. Does anyone have a nice, elegant Stata solution to this
> problem?
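The -foreach- route mentioned in the question might look like the minimal sketch below. It assumes the codes run from 1 to 7 and the variables are named f4m1-f4m7, as in the listing. The indicator names dum1-dum7 and the helper variable anyans are illustrative choices. The example data are typed in with -input- so the block runs on its own. A respondent who left every slot empty gets missing indicators, which matches the third line of the question's code.

```stata
* Example data from the question
clear
input f4m1 f4m2 f4m3 f4m4 f4m5 f4m6 f4m7
1 7 4 . . . .
1 . . . . . .
1 . . . . . .
1 . . . . . .
7 3 . . . . .
1 . . . . . .
1 2 3 4 . . .
1 2 3 4 6 . .
1 2 7 . . . .
1 . . . . . .
end

* anyans = 1 if the respondent gave at least one answer, else 0
gen byte anyans = 0
foreach v of varlist f4m? {
    quietly replace anyans = 1 if `v' < .
}

* dum`k' = 1 if code `k' appears in any slot, 0 if it does not,
* and missing if the respondent answered nothing at all
forvalues k = 1/7 {
    gen byte dum`k' = 0 if anyans
    foreach v of varlist f4m? {
        quietly replace dum`k' = 1 if `v' == `k'
    }
}

* e.g. tabulate dum1 othervar, chi2   (othervar is hypothetical)
```

Once the indicators exist, ordinary commands such as -tabulate- with the chi2 option work on them. That covers the chi-square the SPSS multiple response sets could not provide.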
Is this what you want?
1. Use -tabm- from -tab_chi- on SSC.
Advantage: You keep the same data structure.
Disadvantage: Ugly table.
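-tabm- is user-written rather than part of official Stata, so it must be installed first. Here is a minimal sketch, assuming a Stata version that provides the -ssc- command and an internet connection:

```stata
* Download and install the tab_chi package, which includes tabm
ssc install tab_chi
* Confirm that tabm.ado is now on the ado-path
which tabm
```

After installation, -help tabm- describes the options, including -trans-.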
. l
          f4m1   f4m2   f4m3   f4m4   f4m5   f4m6   f4m7
  1.         1      7      4      .      .      .      .
  2.         1      .      .      .      .      .      .
  3.         1      .      .      .      .      .      .
  4.         1      .      .      .      .      .      .
  5.         7      3      .      .      .      .      .
  6.         1      .      .      .      .      .      .
  7.         1      2      3      4      .      .      .
  8.         1      2      3      4      6      .      .
  9.         1      2      7      .      .      .      .
 10.         1      .      .      .      .      .      .
. tabm f4m?
           |                 Values
  Variable |       1       2       3       4       6 |   Total
-----------+-----------------------------------------+--------
      f4m1 |       9       0       0       0       0 |      10
      f4m2 |       0       3       1       0       0 |       5
      f4m3 |       0       0       2       1       0 |       4
      f4m4 |       0       0       0       2       0 |       2
      f4m5 |       0       0       0       0       1 |       1
-----------+-----------------------------------------+--------
     Total |       9       3       3       3       1 |      22

           |  Values
  Variable |       7 |   Total
-----------+---------+--------
      f4m1 |       1 |      10
      f4m2 |       1 |       5
      f4m3 |       1 |       4
      f4m4 |       0 |       2
      f4m5 |       0 |       1
-----------+---------+--------
     Total |       3 |      22
. tabm f4m? , trans
           |                Variable
    Values |    f4m1    f4m2    f4m3    f4m4    f4m5 |   Total
-----------+-----------------------------------------+--------
         1 |       9       0       0       0       0 |       9
         2 |       0       3       0       0       0 |       3
         3 |       0       1       2       0       0 |       3
         4 |       0       0       1       2       0 |       3
         6 |       0       0       0       0       1 |       1
         7 |       1       1       1       0       0 |       3
-----------+-----------------------------------------+--------
     Total |      10       5       4       2       1 |      22
Or 2. You -reshape- to long.
Advantage: Much nicer tables, more control.
Disadvantage: Different data structure.
. gen id = _n
. reshape long f4m , i(id)
. table f4m _j
----------------------------------------
          |              _j
      f4m |    1     2     3     4     5
----------+-----------------------------
        1 |    9
        2 |          3
        3 |          1     2
        4 |                1     2
        6 |                            1
        7 |    1     1     1
----------------------------------------
. table _j f4m
----------------------------------------------
          |                 f4m
       _j |    1     2     3     4     6     7
----------+-----------------------------------
        1 |    9                             1
        2 |          3     1                 1
        3 |                2     1           1
        4 |                      2
        5 |                            1
----------------------------------------------
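Starting from the long layout, one further step is sketched below. It assumes no respondent repeats a code, and the names slot and r1-r7 are illustrative. The sketch does three things:

- drops the empty slots;
- tabulates f4m to count respondents per code;
- uses -collapse- to return to one row per respondent with 0/1 indicators, which ordinary tests such as -tabulate, chi2- can use.

Here slot plays the role of _j. The example data are retyped so the block runs on its own.

```stata
* Example data from the question
clear
input f4m1 f4m2 f4m3 f4m4 f4m5 f4m6 f4m7
1 7 4 . . . .
1 . . . . . .
1 . . . . . .
1 . . . . . .
7 3 . . . . .
1 . . . . . .
1 2 3 4 . . .
1 2 3 4 6 . .
1 2 7 . . . .
1 . . . . . .
end

gen id = _n
reshape long f4m, i(id) j(slot)

* Empty slots become observations with missing f4m; drop them
* (a respondent with no answers at all would also vanish here)
drop if f4m == .

* Each respondent names a code at most once, so these are respondent counts
tab f4m

* r`k' = 1 on the row holding code `k'; the max over a respondent's
* rows is 1 if that code was chosen at all, else 0
forvalues k = 1/7 {
    gen byte r`k' = f4m == `k'
}
collapse (max) r1-r7, by(id)
```

With these data, the -tab f4m- counts for codes 1, 2, 3, 4, 6 and 7 should be 9, 3, 3, 3, 1 and 3, the same as the Total row of the -tabm- output.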
Nick