Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: "table" showing summary of ~100 ternary variables?

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: "table" showing summary of ~100 ternary variables?
Date	Sun, 31 Oct 2010 19:11:16 +0000

As Phil surmises, there are various user-written commands (but not functions; outside of Mata, users may not write functions; in any case what Michael is asking for can only be done within Stata by a command) which are alternatives for at least some of this problem. See

SJ-5-1  st0082  . . . . . . . . . . . . . . . Tabulation of multiple responses
        (help _mrsvmat, mrgraph, mrtab if installed)  . . . . . . . .  B. Jann
        Q1/05   SJ 5(1):92--122
        introduces new commands for the computation of one- and
        two-way tables of multiple responses

and -tabm- from the -tab_chi- package on SSC. -tabm- was the subject of a thread earlier this month (October), so see the Statalist archives for some discussion. 

Given Phil's fake dataset 

tabm y*

shows the frequencies of y1-y5 on the categories defined and -tabulate- options for a two-way table may be added. Ben's alternatives are documented in detail in his paper, so I'll not repeat that here. 

Nick 
[email protected] 

Phil Schumm

I expect that there is probably a user-written program to do this, but  
it's not too difficult to do from first principles.  We'll start by  
generating a dataset similar to what you've described:


     set obs 100
     gen id = _n
     lab def mylab 1 "Correct" 2 "Incorrect" 3 "No response" 4 "Missing"
     set seed 123456789
     forv i = 1/5 {
         gen byte y`i' = cond(runiform()<0.95, ceil(runiform()*3), 4)
         lab val y`i' mylab
     }
     replace y5 = 2 if y5==1


This generates 5 variables y1-y5, each taking values 1, 2, 3 or 4.   
You'll notice I've used 4 for the "missing" values here, only because  
that'll give you more flexibility for where the corresponding column  
appears in the final table (i.e., if we left missing values as ".", we  
wouldn't be able to place a summary column after that one).  As you  
can see, each variable takes values 1-3 with probability 1/3 each, and  
is missing in 5% of cases.  Note that I've also modified y5 so that it  
doesn't contain any correct responses, because you want to make sure  
your code can handle such cases.

Now, the first trick is to reshape your data into long form:


     reshape long y, i(id) j(Var)


Note that we could have used -stack- here instead, and in fact, that  
would have been more convenient if our variables weren't named  
systematically as they are here.  Since we want to add a column for  
the proportion of correct responses, we'll add a corresponding  
observation for each variable whose values we'll fill in later (if you  
wanted to add multiple summary columns, you could add additional  
observations here):


     set obs `=c(N) + 1'
     replace y = 5 if _n == _N
     lab def mylab 5 "Prop. correct", add


Next, we'll generate our cell counts by using -collapse-, but first  
we'll use -fillin- to make sure that all of our cells are represented  
(even if their observed counts are zero):


     fillin Var y
     collapse (count) cnt=id, by(Var y)


Note that we are also using -fillin- here to propagate the observation  
we added above to hold the proportion of correct responses across all  
of the variables.

Now, we'll compute the proportion correct (as a proportion of values  
1-3) for each variable:


     egen correct = max((y==1)*cnt), by(Var)
     egen nonmiss = sum(inlist(y,1,2,3)*cnt), by(Var)
	replace cnt = correct / nonmiss if y == 5

And finally, we can use -tabdisp- to create our table:


     . tabdisp Var y if !mi(Var), c(cnt) format(%9.2g)

      
--------------------------------------------------------------------
          |                             y
      Var |  Correct    Incorrect   No response    Missing  Prop.  
correct
     ----- 
+--------------------------------------------------------------
        1 |       34           37            25          4            . 
35
        2 |       26           35            33          6            . 
28
        3 |       25           32            40          3            . 
26
        4 |       32           27            35          6            . 
34
        5 |        0           62            34           
4              0
      
--------------------------------------------------------------------


where I've used my text editor to narrow some of the columns so that  
the table doesn't get wrapped by people's mailers.  Of course, there  
are several ways we might embellish this -- this merely illustrates  
one possible strategy for achieving the desired result.

Michael Costello 

> I have about 100 ternary variables (0=incorrect, 1=correct, 2=No  
> Response, .=Missing) and I would like to get a table of the  
> responses that looks like this:
>
> -------------------------------------------------------
> |         |     Correct   Incorrect     No Response   .
> ----------+--------------------------------------------
>      Var1 |          2          21             4     21
>      Var2 |          8          19            13     18
>      Var3 |         30          19             4     19
>      Var4 |         18          21            47     22
>      Var5 |         11          27             8     30
>
> Maybe I'd even like to add in a proportion of correct or ratio of  
> correct to incorrect into the table.
>
> Is there a function to do this or something reasonable similar?


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: "table" showing summary of ~100 ternary variables?
  - From: Michael Costello <[email protected]>
- Re: st: "table" showing summary of ~100 ternary variables?
  - From: Phil Schumm <[email protected]>

Prev by Date: st: NLSUR starting values invalid or some RHS variables have missing values
Next by Date: st: Marginal Effects using subsetted data
Previous by thread: Re: st: "table" showing summary of ~100 ternary variables?
Next by thread: st: Data management: lowest value for which some condition is met
Index(es):
- Date
- Thread