|
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
st: Re: text manipulation of tabulate output
On Fri, 18 Apr 2008, Jacob Wegelin wrote:
Suppose you have a list of categorical (qualitative) variables that are in
your data; each variable has some arbitrary number of categories; and you
want to produce a report, in text,
- with one row for each variable and
- a list of the percents in each category for each variable.
The code below produces the following display for a set of variables:
Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8%
ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8%
Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8%
CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%,
11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%
BEGIN CODE
local QualitVars ///
Paired_Biopsy_ ///
ALTcode ///
Alcohol ///
CDC_class ///
display "`QualitVars'"
tabulate CDC_class
generate DUMMYjunk=0
foreach THISVAR of varlist `QualitVars' ///
{
display " "
display "`THISVAR'" ": " _continue
drop DUMMY*
quietly: tabulate `THISVAR', generate (DUMMY)
scalar nCategories=r(r)
scalar denominator=r(N)
display "number of categories=" nCategories "; total nonmissing= "
denominator "; " _continue
local index=0
while `index' < nCategories {
local index=`index' + 1
quietly: summarize DUMMY`index'
scalar thispercent= round( 100* r(sum)/denominator, 0.1)
display thispercent "%" _continue
if `index' < nCategories {
display ", " _continue
}
}
}
END CODE
Question Number One: Am I reinventing the wheel? Is there an easier way to do
this?
Question Number Two: Is there a way to get the labels for the categories for
each variable?
For instance, the labels for CDC_class are:
. tabulate CDC_class
CDC_class | Freq. Percent Cum.
------------+-----------------------------------
A1 | 42 26.09 26.09
A2 | 38 23.60 49.69
A3 | 18 11.18 60.87
B2 | 14 8.70 69.57
B3 | 7 4.35 73.91
C1 | 1 0.62 74.53
C2 | 3 1.86 76.40
C3 | 38 23.60 100.00
------------+-----------------------------------
Total | 161 100.00
so that the output should really look like this:
CDC_class: number of categories=8; total nonmissing= 161; A1: 26.1%, A2:
23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%
The format of the output of the tabulate command above, suggests that fancy
text manipulation (using perl, for instance) of that output would be a way to
eliminate the fancy loop above *and* to get the category labels. But is there
a more direct way?
Thank you for any pointers
Here is progress on this question, i.e., a way to get the category labels into the output. This does not work if a categorical variable has a large number of categories; apparently there is a fairly small limit on how large a local macro can be.
/*** BEGIN CODE ***/
program drop _all
program manipulateTabOutput
args THISVAR
display "`THISVAR'"
quietly: levels `THISVAR'
local UNSORTEDLEVELS=r(levels)
local mylevs : list sort UNSORTEDLEVELS
gen DUMMYjunk=0
drop DUMMY*
quietly: tabulate `THISVAR', generate (DUMMY)
scalar nCategories=r(r)
local strPctBoth=""
local index=0
while `index' < nCategories {
local index=`index' + 1
local thisCatName : word `index' of `mylevs'
quietly: summarize DUMMY`index'
scalar thispercent= round( 100* r(sum)/r(N), 0.1)
local strPctBoth= "`strPctBoth'" + "`thisCatName': " + string(thispercent, "%9.1g") + "%"
if `index' < nCategories {
local strPctBoth= "`strPctBoth'" + ", "
}
}
display "`strPctBoth'"
end
webuse choice, clear
manipulateTabOutput sex
manipulateTabOutput size
* The following fails because there are so many dealers:
manipulateTabOutput dealer
/*** END CODE ***/
Jake
Jacob A. Wegelin
[email protected] Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
U.S.A. http://www.people.vcu.edu/~jwegelin