[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: text manipulation of tabulate output

From	[email protected]
To	[email protected]
Subject	Re: st: text manipulation of tabulate output
Date	Sat, 17 May 2008 16:56:12 +0000

Sent from my BlackBerry® wireless device

-----Original Message-----
From: "Gabi Huiber" <[email protected]>

Date: Sat, 19 Apr 2008 13:53:10 
To:[email protected]
Subject: Re: st: text manipulation of tabulate output


file and writing another, or some regular expression matching -- I
used to do in PHP. But Stata can do all that. It has fopen read and
write, and it has quite extensive regex capabilities.

But the most direct way, I think, would be to suitably (and perhaps
temporarily, using "preserve/restore") "collapse (sum)" or "collapse
(mean)" your data. This would produce a smaller dataset with only the
summary of interest, which you can "outsheet" to whatever format your
Stata-deprived audience finds acceptable.

Gabi

On 4/18/08, Jacob Wegelin <[email protected]> wrote:
>
>  Suppose you have a list of categorical (qualitative) variables that are in
> your data; each variable has some arbitrary number of categories; and you
> want to produce a report, in text,
>
>  - with one row for each variable and
>
>  - a list of the percents in each category for each variable.
>
>  The code below produces the following display for a set of variables:
>
>  Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8%
> ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8%
> Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8%
> CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%,
> 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%
>
>  BEGIN CODE
>
>  local QualitVars ///
>          Paired_Biopsy_ ///
>          ALTcode ///
>          Alcohol ///
>          CDC_class ///
>
>  display "`QualitVars'"
>
>  tabulate CDC_class
>
>  generate DUMMYjunk=0
>
>  foreach THISVAR of varlist `QualitVars' ///
>  {
>          display " "
>          display "`THISVAR'" ": " _continue
>          drop DUMMY*
>          quietly: tabulate `THISVAR', generate (DUMMY)
>          scalar nCategories=r(r)
>          scalar denominator=r(N)
>          display "number of categories=" nCategories "; total nonmissing= "
> denominator "; " _continue
>
>          local index=0
>          while `index' < nCategories {
>                  local index=`index' + 1
>                  quietly: summarize DUMMY`index'
>                  scalar thispercent= round( 100* r(sum)/denominator, 0.1)
>                  display thispercent "%" _continue
>                  if `index' < nCategories {
>                          display ", " _continue
>                  }
>          }
>
>  }
>
>  END CODE
>
>  Question Number One: Am I reinventing the wheel? Is there an easier way to
> do this?
>
>  Question Number Two: Is there a way to get the labels for the categories
> for each variable?
>
>  For instance, the labels for CDC_class are:
>
>  . tabulate CDC_class
>
>    CDC_class |      Freq.     Percent        Cum.
>  ------------+-----------------------------------
>           A1 |         42       26.09       26.09
>           A2 |         38       23.60       49.69
>           A3 |         18       11.18       60.87
>           B2 |         14        8.70       69.57
>           B3 |          7        4.35       73.91
>           C1 |          1        0.62       74.53
>           C2 |          3        1.86       76.40
>           C3 |         38       23.60      100.00
>  ------------+-----------------------------------
>        Total |        161      100.00
>
>  so that the output should really look like this:
>
>  CDC_class:  number of categories=8; total nonmissing= 161; A1: 26.1%, A2:
> 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%
>
>  The format of the output of the tabulate command above, suggests that fancy
> text manipulation (using perl, for instance) of that output would be a way
> to eliminate the fancy loop above *and* to get the category labels. But is
> there a more direct way?
>
>  Thank you for any pointers
>
>  Jake
>
>  Jacob A. Wegelin
>  [email protected] Assistant Professor
>  Department of Biostatistics
>  Virginia Commonwealth University
>  730 East Broad Street Room 3006
>  P. O. Box 980032
>  Richmond VA 23298-0032
>  U.S.A. http://www.people.vcu.edu/~jwegelin
>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: st: McDonald and Moffitt’s (1980) decompo sition method for Tobit coefficients.
Next by Date: st: showing the mean in histogram
Previous by thread: st: McDonald and Moffitt’s (1980) decompo sition method for Tobit coefficients.
Next by thread: st: showing the mean in histogram
Index(es):
- Date
- Thread