file and writing another, or some regular expression matching -- I
used to do in PHP. But Stata can do all that. It has fopen read and
write, and it has quite extensive regex capabilities.
But the most direct way, I think, would be to suitably (and perhaps
temporarily, using "preserve/restore") "collapse (sum)" or "collapse
(mean)" your data. This would produce a smaller dataset with only the
summary of interest, which you can "outsheet" to whatever format your
Stata-deprived audience finds acceptable.
Gabi
On 4/18/08, Jacob Wegelin <[email protected]> wrote:
>
> Suppose you have a list of categorical (qualitative) variables that are in
> your data; each variable has some arbitrary number of categories; and you
> want to produce a report, in text,
>
> - with one row for each variable and
>
> - a list of the percents in each category for each variable.
>
> The code below produces the following display for a set of variables:
>
> Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8%
> ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8%
> Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8%
> CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%,
> 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%
>
> BEGIN CODE
>
> local QualitVars ///
> Paired_Biopsy_ ///
> ALTcode ///
> Alcohol ///
> CDC_class ///
>
> display "`QualitVars'"
>
> tabulate CDC_class
>
> generate DUMMYjunk=0
>
> foreach THISVAR of varlist `QualitVars' ///
> {
> display " "
> display "`THISVAR'" ": " _continue
> drop DUMMY*
> quietly: tabulate `THISVAR', generate (DUMMY)
> scalar nCategories=r(r)
> scalar denominator=r(N)
> display "number of categories=" nCategories "; total nonmissing= "
> denominator "; " _continue
>
> local index=0
> while `index' < nCategories {
> local index=`index' + 1
> quietly: summarize DUMMY`index'
> scalar thispercent= round( 100* r(sum)/denominator, 0.1)
> display thispercent "%" _continue
> if `index' < nCategories {
> display ", " _continue
> }
> }
>
> }
>
> END CODE
>
> Question Number One: Am I reinventing the wheel? Is there an easier way to
> do this?
>
> Question Number Two: Is there a way to get the labels for the categories
> for each variable?
>
> For instance, the labels for CDC_class are:
>
> . tabulate CDC_class
>
> CDC_class | Freq. Percent Cum.
> ------------+-----------------------------------
> A1 | 42 26.09 26.09
> A2 | 38 23.60 49.69
> A3 | 18 11.18 60.87
> B2 | 14 8.70 69.57
> B3 | 7 4.35 73.91
> C1 | 1 0.62 74.53
> C2 | 3 1.86 76.40
> C3 | 38 23.60 100.00
> ------------+-----------------------------------
> Total | 161 100.00
>
> so that the output should really look like this:
>
> CDC_class: number of categories=8; total nonmissing= 161; A1: 26.1%, A2:
> 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%
>
> The format of the output of the tabulate command above, suggests that fancy
> text manipulation (using perl, for instance) of that output would be a way
> to eliminate the fancy loop above *and* to get the category labels. But is
> there a more direct way?
>
> Thank you for any pointers
>
> Jake
>
> Jacob A. Wegelin
> [email protected] Assistant Professor
> Department of Biostatistics
> Virginia Commonwealth University
> 730 East Broad Street Room 3006
> P. O. Box 980032
> Richmond VA 23298-0032
> U.S.A. http://www.people.vcu.edu/~jwegelin
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/