Rodrigo Briceño asked
>
> Following with my previous doubts: I have a hospital discharges
> database, and two of the variables from the list are:
>
> -diaest1- and -clave1-.
>
> I have already processed the data to find the 10 most frequent
> diagnoses with the help of -egen, group()-. What do I need to do if
> I want the same thing, but this time separated by the variable
> -diaest1-? Let's say that I need the first 10 diagnoses for the
> discharges that have a duration of 6 or more days, and the first 10
> diagnoses for the discharges that have a duration of fewer than 2
> days. I have already made a variable that establishes those
> durations (called -rank_estancia2-).
>
> rank_estancia2=1 (diaest <2 days)
> rank_estancia2=2 (diaest 2-5 days)
> rank_estancia2=3 (diaest 6 or more days)
>
> I tried to do something with -egen, group()-, but my attempts
> didn't seem to be useful. I already tried typing:
>
> tabsort clave1 if rank_estancia2==1 & group<11
>
> (where group is the variable calculated following the first answer
> of the day on this list, which Nick Cox helped me build).
>
> Sorry for my ignorance.
and I replied
> I don't know how to do this cleanly with official
> Stata's -egen, group()- as mentioned by Rodrigo.
>
> Once more I will show a way to do something like this
> with my own -egroup()- function for -egen-, accessible
> as part of the -egenmore- package on SSC.
>
> Without access to Rodrigo's data this is easier to
> explain with an analogue for the auto data, which
> naturally anybody interested can try for themselves.
>
> Suppose we have manufacturer name and a classification
> of high or low mpg:
>
> . egen manuf = head(make)
> . gen himpg = mpg > 21
>
> Step 1. Calculate the frequencies you want displayed.
> Remember to negate them if you want them shown
> highest first.
>
> . bysort himpg manuf : gen freq = - _N
>
> Step 2. For each category of -himpg-,
> get the groups in the order defined by -freq- and -manuf-,
> and display the first 10 groups in each instance:
>
> . forval i = 0/1 {
> . qui egen group`i' = egroup(freq manuf) if himpg == `i', l(manuf)
> . tab group`i' if group`i' <= 10
> . }
>
> group(manuf |
>           ) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>       Buick |          6       16.67       16.67
>        Olds |          6       16.67       33.33
>       Merc. |          5       13.89       47.22
>       Pont. |          5       13.89       61.11
>        Cad. |          3        8.33       69.44
>       Dodge |          3        8.33       77.78
>       Linc. |          3        8.33       86.11
>       Chev. |          2        5.56       91.67
>      Toyota |          2        5.56       97.22
>         AMC |          1        2.78      100.00
> ------------+-----------------------------------
>       Total |         36      100.00
>
> group(manuf |
>           ) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>       Chev. |          4       17.39       17.39
>       Plym. |          4       17.39       34.78
>          VW |          4       17.39       52.17
>      Datsun |          3       13.04       65.22
>         AMC |          2        8.70       73.91
>       Honda |          2        8.70       82.61
>        Audi |          1        4.35       86.96
>         BMW |          1        4.35       91.30
>       Buick |          1        4.35       95.65
>       Dodge |          1        4.35      100.00
> ------------+-----------------------------------
>       Total |         23      100.00
>
> That could be improved a bit by putting in display
> lines.
>
> Now one question might fairly be, and this was
> what I thought of first, why not something more like
>
> . by himpg : egen group = egroup(freq manuf), l(manuf)
> . by himpg : tab group if group <= 10
>
> One answer is that -egroup()- does not support -by:-.
> An even better answer is that changing the program
> to support -by:- would run into an immediate problem
> that it can't be combined with allocation of value
> labels in the way that we want to allow output like
> that above.
>
> I'm sure that there are other ways to approach the
> problem.
Here's another, assuming
. egen manuf = head(make)
. gen himpg = mpg > 21
It uses no user-written extras. Nothing in this
assumes that the classifying variable has just
2 classes.
1. Create negated frequencies, to get proper sort
order.
. bysort himpg manuf : gen frequency = -_N
2. Calculate order explicitly:
. bysort himpg frequency manuf : gen order = _n == 1
. by himpg : replace order = sum(order)
3. Flip frequencies back again:
. qui replace frequency = - frequency
4. Get your table:
. by himpg : tabdisp order if order <= 10, cell(manuf frequency)
_______________________________________________________________________________
-> himpg = 0
----------------------------------
    order |      manuf   frequency
----------+-----------------------
        1 |      Buick           6
        2 |       Olds           6
        3 |      Merc.           5
        4 |      Pont.           5
        5 |       Cad.           3
        6 |      Dodge           3
        7 |      Linc.           3
        8 |      Chev.           2
        9 |     Toyota           2
       10 |        AMC           1
----------------------------------
_______________________________________________________________________________
-> himpg = 1
----------------------------------
    order |      manuf   frequency
----------+-----------------------
        1 |      Chev.           4
        2 |      Plym.           4
        3 |         VW           4
        4 |     Datsun           3
        5 |        AMC           2
        6 |      Honda           2
        7 |       Audi           1
        8 |        BMW           1
        9 |      Buick           1
       10 |      Dodge           1
----------------------------------
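The four steps can be collected into a do-file that runs as is on the auto data shipped with Stata. This is a sketch under two stated assumptions: it swaps in the official word() function for -egen, head()-, taking the manufacturer to be simply the first word of -make-, and it replaces -by himpg:- in step 4 with an explicit loop, which makes it easy to display only some of the classes.

```stata
* Sketch: the four steps above as one do-file on the auto data.
* Assumption: word(make, 1) stands in for -egen manuf = head(make)-.
sysuse auto, clear
generate manuf = word(make, 1)
generate himpg = mpg > 21

* 1. negated frequencies, so an ascending sort puts the commonest first
bysort himpg manuf : generate frequency = -_N

* 2. rank manufacturers within each class; ties are broken alphabetically
bysort himpg frequency manuf : generate order = _n == 1
by himpg : replace order = sum(order)

* 3. flip the frequencies back to positive counts
quietly replace frequency = -frequency

* 4. one table per class; the loop lists exactly the classes wanted
foreach k in 0 1 {
    display _n as text "himpg = `k'"
    tabdisp order if himpg == `k' & order <= 10, cellvar(manuf frequency)
}
```

For Rodrigo's data the same lines should carry over with -rank_estancia2- in place of -himpg- and -clave1- in place of -manuf-, looping with -foreach k in 1 3- to show only the short and long stays. That assumes one observation per discharge.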
Nick