In addition to other suggestions:
-contract- lets you specify percents.
After -summarize- r(mean) is a more direct way to get the mean than r(sum)/r(N). If you specify -summarize, meanonly- that is a good habit and the -quietly- is then unnecessary.
There is no -version- statement in your program. -levels- is not a current command. At some point it may become available only under version control and then your program will be broken.
-forval- makes this easier than -while-.
Nick
[email protected]
Jacob Wegelin
On Fri, 18 Apr 2008, Jacob Wegelin wrote:
>
> Suppose you have a list of categorical (qualitative) variables that are in
> your data; each variable has some arbitrary�number of categories; and you
> want to produce a report, in text,
>
> - with one row for each variable and
>
> - a list of the percents in each category for each variable.
>
> The code below produces the following display for a set of variables:
>
> Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8%
> ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8%
> Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8%
> CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%,
> 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%
>
> BEGIN CODE
>
> local QualitVars ///
> �� � � �Paired_Biopsy_ ///
> �� � � �ALTcode ///
> �� � � �Alcohol ///
> �� � � �CDC_class ///
>
> display "`QualitVars'"
>
> tabulate CDC_class
>
> generate DUMMYjunk=0
>
> foreach THISVAR of varlist `QualitVars' ///
> {
> �� � � �display " "
> �� � � �display "`THISVAR'" ": " _continue
> �� � � �drop DUMMY*
> �� � � �quietly: tabulate `THISVAR', generate (DUMMY)
> �� � � �scalar nCategories=r(r)
> �� � � �scalar denominator=r(N)
> �� � � �display "number of categories=" nCategories "; total nonmissing= "
> denominator "; " _continue
>
> �� � � �local index=0
> �� � � �while `index' < nCategories {
> �� � � � � � � �local index=`index' + 1
> �� � � � � � � �quietly: summarize DUMMY`index'
> �� � � � � � � �scalar thispercent= round( 100* r(sum)/denominator, 0.1)
> �� � � � � � � �display thispercent "%" _continue
> �� � � � � � � �if `index' < nCategories {
> �� � � � � � � � � � � �display ", " _continue
> �� � � � � � � �}
> �� � � �}
>
> }
>
> END CODE
>
> Question Number One: Am I reinventing the wheel? Is there an easier way to do
> this?
>
> Question Number Two: Is there a way to get the labels for the categories for
> each variable?
>
> For instance, the labels for CDC_class are:
>
> . tabulate CDC_class
>
> ��CDC_class | � � �Freq. � � Percent � � � �Cum.
> ------------+-----------------------------------
> �� � � � A1 | � � � � 42 � � � 26.09 � � � 26.09
> �� � � � A2 | � � � � 38 � � � 23.60 � � � 49.69
> �� � � � A3 | � � � � 18 � � � 11.18 � � � 60.87
> �� � � � B2 | � � � � 14 � � � �8.70 � � � 69.57
> �� � � � B3 | � � � � �7 � � � �4.35 � � � 73.91
> �� � � � C1 | � � � � �1 � � � �0.62 � � � 74.53
> �� � � � C2 | � � � � �3 � � � �1.86 � � � 76.40
> �� � � � C3 | � � � � 38 � � � 23.60 � � �100.00
> ------------+-----------------------------------
> �� � �Total | � � � �161 � � �100.00
>
> so that the output should really look like this:
>
> CDC_class: number of categories=8; total nonmissing= 161; A1: 26.1%, A2:
> 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%
>
> The format of the output of the tabulate command above, suggests that fancy
> text manipulation (using perl, for instance) of that output would be a way to
> eliminate the fancy loop above *and* to get the category labels. But is there
> a more direct way?
>
> Thank you for any pointers
Here is progress on this question, i.e., a way to get the category labels into the output. This does not work if a categorical variable has a large number of categories; apparently there is a fairly small limit on how large a local macro can be.
/*** BEGIN CODE ***/
program drop _all
program manipulateTabOutput
args THISVAR
display "`THISVAR'"
quietly: levels `THISVAR'
local UNSORTEDLEVELS=r(levels)
local mylevs : list sort UNSORTEDLEVELS
gen DUMMYjunk=0
drop DUMMY*
quietly: tabulate `THISVAR', generate (DUMMY)
scalar nCategories=r(r)
local strPctBoth=""
local index=0
while `index' < nCategories {
local index=`index' + 1
local thisCatName : word `index' of `mylevs'
quietly: summarize DUMMY`index'
scalar thispercent= round( 100* r(sum)/r(N), 0.1)
local strPctBoth= "`strPctBoth'" + "`thisCatName': " + string(thispercent, "%9.1g") + "%"
if `index' < nCategories {
local strPctBoth= "`strPctBoth'" + ", "
}
}
display "`strPctBoth'"
end
webuse choice, clear
manipulateTabOutput sex
manipulateTabOutput size
* The following fails because there are so many dealers:
manipulateTabOutput dealer
/*** END CODE ***/
>
> Jake
>
> Jacob A. Wegelin
> [email protected] Assistant Professor
> Department of Biostatistics
> Virginia Commonwealth University
> 730 East Broad Street Room 3006
> P. O. Box 980032
> Richmond VA 23298-0032
> U.S.A. http://www.people.vcu.edu/~jwegelin
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/