[Apologies for previous premature reply.]
> "Rodrigo Briceño" <[email protected]>
>
> 2. Months ago I asked to the list how can I generate some age groups.
> I use that help in order to have a frequency of the hospital discharges
> by age groups. The problem here is that I found that my variable age
> was a string8 variable. Then the options to generate a new variable
> was restricted: I tried typing:
>
> gen str8 rank_edad=1 if inrange(edad,0,0)
>
> to construct my age groups (less than 1 year, between 1 and 4,
> between 5 and 9, 10-19, 20-29, 30-39, 40-49, 50-59 and the last
> one 60 or more)
That's not going to work as you typed it, if only because
string values need to be in " " (so "1", not 1) and you
can't apply -inrange()- to a string variable.
> This not work for me so I try:
>
> encode edad, gen (edad2)
>
> like I learned from my net course on Stata.
> The things appear to be ok, because when I type
>
> tab edad2
>
> I see all the ages correctly. The problem here is that
> I'm building three groups (the first three) that don't have data
> on it. But Stata is calculating something on it, I don't know why.
>
> gen byte rank_edad=1 if range(edad2,0,0)
> replace rank_edad=2 if inrange (edad2,1,4)
> replace rank_edad=3 if inrange (edad2,5,9)
> replace rank_edad=4 if inrange (edad2,10,19)
> and so on.....
>
> tab rank_edad
>
>   rank_edad |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>    1-4_anos |         90        1.73        1.73
>    5-9_anos |       1190       22.88       24.61
>  10-19_anos |       2114       40.64       65.24
>  20-29_anos |        892       17.15       82.39
>  30-39_anos |        304        5.84       88.24
>  40-49_anos |        202        3.88       92.12
>  50-59_anos |        162        3.11       95.23
> Mas_60_anos |        248        4.77      100.00
> ------------+-----------------------------------
>       Total |       5202      100.00
>
> Do you know or guess why Stata is putting data on the first three
> groups that is supposed to be empty (I build those groups because
> I'm making a do file, that I can apply to other databases).
I am far from clear about everything you have done:
for example, how -rank_edad- got its value labels.
One puzzle is that you are generating -rank_edad-
from the _encoded_ variable -edad2-, which is _not_
age but age categories.

    tab edad2, nola

will show you, I think, that you just have age categories 1, 2, 3,
etc.
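
A minimal sketch of that check, and of one way to work from
numeric age instead. It assumes that -edad- holds ages typed as
digits (e.g. "0", "17"), which I cannot verify from your post;
-edad_num- is a made-up name.

```stata
* Show the integer codes that lie behind the value labels of -edad2-
tabulate edad2, nolabel

* Assumption: -edad- contains only digits. -destring- will refuse to
* convert (and say so) if it finds nonnumeric characters.
destring edad, generate(edad_num)

* Build the groups from numeric age, not from the encoded categories
capture drop rank_edad
generate byte rank_edad = 1 if edad_num == 0
replace rank_edad = 2 if inrange(edad_num, 1, 4)
replace rank_edad = 3 if inrange(edad_num, 5, 9)
replace rank_edad = 4 if inrange(edad_num, 10, 19)
replace rank_edad = 5 if inrange(edad_num, 20, 29)
replace rank_edad = 6 if inrange(edad_num, 30, 39)
replace rank_edad = 7 if inrange(edad_num, 40, 49)
replace rank_edad = 8 if inrange(edad_num, 50, 59)
* Numeric missing counts as larger than any number, so exclude it
replace rank_edad = 9 if edad_num >= 60 & !missing(edad_num)
```

Groups with no observations simply do not appear in -tab rank_edad-,
so the same do-file can be applied to other datasets.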
Another detail that may help: by default -encode- assigns
its codes in alphanumeric order of the string values. As it
happens, something that I am writing at the moment discusses
sorting of age intervals and, possibly, one of your
problems.
If you give Stata these string values to -sort-

    "1-4_anos"
    "5-9_anos"
    "10-19_anos"
    "20-29_anos"
    "30-39_anos"
    "40-49_anos"
    "50-59_anos"
    "Mas_60_anos"

you will get

    "1-4_anos"
    "10-19_anos"
    "20-29_anos"
    "30-39_anos"
    "40-49_anos"
    "5-9_anos"
    "50-59_anos"
    "Mas_60_anos"

because -sort- of strings is on dictionary principles
and characters are put in ASCII order with no
reference to their meaning. By default -encode-
will use this order to assign categories.
That is, "1-4_anos" is -encode-d as 1, "10-19_anos" as 2,
and so on.
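
The ordering is easy to check on a toy dataset. A small
self-contained sketch; the variable names -agestr- and -agecode-
are made up for illustration and are not in your data.

```stata
* Toy dataset holding the eight interval strings
clear
input str12 agestr
"1-4_anos"
"5-9_anos"
"10-19_anos"
"20-29_anos"
"30-39_anos"
"40-49_anos"
"50-59_anos"
"Mas_60_anos"
end

* Strings are sorted character by character, not by numeric meaning
sort agestr
list, noobs

* With no label() option, -encode- numbers the categories in that order
encode agestr, generate(agecode)
list agestr agecode, nolabel noobs
```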
Something like this may help to explain some
of your results. The remedy is to define your
own value labels and to insist that -encode- use them.
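
A minimal sketch of that remedy, assuming a string variable
holding exactly the interval text shown above. The names -agestr-,
-agelbl- and -agecode- are made up for illustration, and the ///
line continuation works in a do-file, not at the command line.

```stata
* Define the value labels in the order you want, before encoding
label define agelbl 1 "1-4_anos" 2 "5-9_anos" 3 "10-19_anos" ///
    4 "20-29_anos" 5 "30-39_anos" 6 "40-49_anos" ///
    7 "50-59_anos" 8 "Mas_60_anos"

* label() makes -encode- use the existing mapping; noextend makes it
* stop with an error if a string value is missing from that mapping
encode agestr, generate(agecode) label(agelbl) noextend

* Compare the labelled table with the underlying codes
tabulate agecode
tabulate agecode, nolabel
```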
Nick
[email protected]