Jens M. Lauritsen
>
> Does anyone wish to cooperate on writing a "multiple
> response" table command?
>
> Example:
> Diagnoses are recorded in up to three variables d1 d2 d3
>
> To know how many people have a certain diagnosis, one has to
> combine the three diagnoses into a single classification.
>
> If diagnoses are recorded as strings, another problem arises:
> with -encode- the same diagnosis gets a different code in the
> different variables, e.g.:
>
> clist d* if NR1== "s 610" | NR2 == "s 610",label
>
>      d_miss      d1      d2   d3   diag_miss   diag_r1   diag_r2   diag_r3
> 26.       0   s 610       .    .           0         0         0         0
> 27.       0   s 092   s 610    .           0         0         0         0
>
> * but the codes are not equal, notice 96 vs 41:
> clist d* if NR1== "s 610" | NR2 == "s 610",
>
> d_miss d1 d2 d3 diag_miss diag_r1 diag_r2 diag_r3
> 26. 0 96 . . 0 0 0 0
> 27. 0 41 41 . 0 0 0 0
>
> We wish to make a table like:
>
> multresponse d1-d3
>
>                      Number                      Percentage
>               responses   observations     responses   observations
> diagnose 1        x            x1              x2           x3
> diagnose 2       etc.
> etc.
> diagnose n
>

Ulrich Kohler and I have drafted a FAQ on handling multiple
responses in Stata. (Lee Sieswerda made many useful comments
on a draft.) It's quite long. I imagine that we will submit it
to Stata Corp this week for their consideration. Of course,
I don't know what they will think of it.

My main line here is that the problem of tabulation of multiple
responses is rather too large and vague to be tackled by a
single program. Although they may seem to purport otherwise,
previous programs in this area have each tackled only part of
the problem.

Because of wrap-around problems with Jens' posting, I don't get
a clear picture of what he wants. But one strategy I know is to
generate what you want to show -- typically with -egen- or
something equivalent -- and then use an existing tabulation
command to show that.
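
A minimal sketch of that generate-then-tabulate strategy, assuming
string diagnoses held in d1-d3 as in Jens' example. The toy data and
the variable name has_s610 are made up for illustration, and plain
-generate- with inlist() stands in here for -egen-:

```stata
* Toy data: hypothetical diagnoses, laid out like d1-d3 in the question
clear
input str5 d1 str5 d2 str5 d3
"s 610" ""      ""
"s 092" "s 610" ""
"s 092" ""      ""
end

* 1 if "s 610" occurs in any of the three variables, 0 otherwise
generate byte has_s610 = inlist("s 610", d1, d2, d3)

* Show the generated variable with an ordinary tabulation command
tabulate has_s610
```

With these toy data, two of the three observations are flagged. No
-encode- is needed, because the comparison is made on the strings
themselves.
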

In our draft FAQ, Ulrich and I list the Stata programs we know
about, in addition to those in official Stata:

-tabcond- (SSC; Stata 7) Tabulates frequencies satisfying
up to 5 specified conditions. Zero frequencies are shown
explicitly.

-tabm- (SSC as part of tab_chi; Stata 7) Tabulates two or more
comparable variables, in a combined two-way table of variables
by values. Either all variables should be numeric, or all
variables should be string.

-tabsplit- (SSC as part of tab_chi; Stata 6) Tabulates
frequencies of occurrence of the parts of a string variable.
By default, the parts of a string are separated by spaces.
Optionally, alternative punctuation characters may be specified.

-tabw- (STB-25; Stata 3.1) For each variable in a list,
tabulates the number of times it takes on the values 0, 1,
..., 9; the number of times it is missing; and the number
of times it is equal to some other value. String variables
are not tabulated but are identified at the end of the
displayed table.

In Stata 8, -list- is much enhanced and is competitive with
what are more overtly tabulation commands for problems in
this area.

Nick
P.S. Jens' specific problem that -encode- is not
working consistently may best be dealt with upstream.
One way is to use the -label()- option. Another way
is to -reshape- to long and -encode- what is now
a single string variable, then -reshape- back. However,
this all presupposes that you _must_ -encode- before
tabulation, which isn't obvious to me.
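
A sketch of the two routes mentioned in the P.S., on made-up data.
The identifier id, the new variable names (c_d1-c_d3, dcode, slot)
and the value label names (diag, diag_long) are assumptions chosen
for illustration, not anything from Jens' data set:

```stata
* Toy data: hypothetical diagnoses; id is an assumed person identifier
clear
input str5 d1 str5 d2 str5 d3
"s 610" ""      ""
"s 092" "s 610" ""
"s 092" "s 200" "s 610"
end
generate long id = _n

* Route 1: share one value label across all -encode- calls.
* Later calls reuse existing codes and append any new strings.
foreach v of varlist d1-d3 {
    encode `v', generate(c_`v') label(diag)
}
list id c_d1 c_d2 c_d3, nolabel

* Route 2: stack to long, -encode- once, then go back to wide
keep id d1 d2 d3
reshape long d, i(id) j(slot)
encode d, generate(dcode) label(diag_long)
drop d
reshape wide dcode, i(id) j(slot)
list id dcode1 dcode2 dcode3, nolabel
```

Either way, "s 610" ends up with the same numeric code whichever
variable it sits in, which is the consistency that separate
-encode- calls with separate value labels do not give.
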