Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: egen cut - how to force a category even if zero observations
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: egen cut - how to force a category even if zero observations
Date: Thu, 27 Feb 2014 01:10:21 +0000
If allowed to express an opinion, -egen, cut()- would express total
willingness to assign the value 0 should any values satisfy your rule,
but in your dataset none do. There is no sense in which -egen,
cut()- can create a category that persists after the
calculation is done.
In particular, there is no sense in which -tabulate- remembers or
knows that a particular rule was used to create the variable; it just
shows the values as they exist when invoked.
You could say much the same about many other categorisations. So, in
your dataset, the rule
10 * floor(age/10)
would create the same numeric values 40(10)90; in principle it _could_
have created
..., 10, 20, 30 or 100, 110, 120, ... but no such values were created
because no suitable data values were found.
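As a small illustration of that rule (the five ages below are invented
for the sketch and are not taken from your data):

```stata
* Illustrative data only: five invented ages, none below 40.
clear
input age
43
57
61
78
92
end

* Round each age down to the start of its decade.
generate agecat = 10 * floor(age/10)

* -tabulate- lists only the values that occur: 40, 50, 60, 70 and 90.
* There is no row for 80 (or 30, or 100) because no observation produced it.
tabulate agecat
```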
I suspect that you mean this:
1. I am thinking of my variable as categorised into a fixed, finite
set of categories.
2. So I wish to see zero occurrences tabulated if any of those
categories do not exist in the data.
The crux here is not how variables are created; it is what tabulation
commands will or will not do.
So your complaint is really about -tabulate- used for one-way tables,
which just declines to show non-existent categories. Stata has to be
fought all the way to satisfy this desire: -tabcount- (SSC) is one
approach and there was some discussion in
Cox, N. J. 2003. Speaking Stata: Problems with tables, Part II.
Stata Journal 3(4): 420--439. (SJ-3-4 pr0011; no commands)
which reviews three user-written commands (tabcount, makematrix,
and groups) as different approaches to tabulation problems:
http://www.stata-journal.com/sjpdf.html?articlenum=pr0011
-fre- (SSC) is another such approach.
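If the aim is the identically shaped tab-delimited file described in the
question, one possible sketch in official Stata alone is to loop over the
fixed set of codes and count each one, so that empty categories are
written as 0. The file name agecounts.txt is made up for illustration.
The codes 0/6 follow from the seven intervals implied by the at() list
when integer codes are requested. The -tabcount- and -fre- lines in the
closing comments are assumptions about those commands' options and
should be checked against their help files after installation.

```stata
* Sketch only: assumes a numeric variable -age- is in memory.
* agecounts.txt is a hypothetical output file name.
egen agecut = cut(age), at(0 40 50 60 70 80 90 130) icodes

tempname fh
file open `fh' using "agecounts.txt", write text replace
file write `fh' "agecut" _tab "freq" _n

* Eight cutpoints define seven intervals, coded 0 to 6.
forvalues k = 0/6 {
    * -count- returns r(N) = 0 when no observation has this code,
    * so every category gets a row regardless of the data.
    quietly count if agecut == `k'
    file write `fh' "`k'" _tab (r(N)) _n
}
file close `fh'

* Community-contributed alternatives (install first with -ssc install-).
* Option names here are assumptions; check each command's help:
* tabcount agecut, v(0/6) zero
* fre agecut, include(0/6)
```

Because the loop runs over the codes rather than over the values found
in the data, every dataset yields a file with the same seven rows.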
Nick
[email protected]
On 27 February 2014 00:38, Anthony Khawaja <[email protected]> wrote:
> Is it possible to force the "egen cut" command to keep a specified category
> even though there are zero observations within that category. For example,
> I want to write a script that will work on multiple different datasets, and
> I want to categorise age into <40, >=40 <50, >=50 <60, >=60 <70, >=70 <80,
> >=80 <90, >=90. The egen cut command works well unless I have zero
> observation in a category - rather than still creating that as a level of
> the new categorical variable, Stata just doesn't form the category. This
> would usually be fine, but I am using "file write" commands from which I
> want to produce identically shaped tab delimited files to easily overlay
> numbers from multiple studies.
>
> I have searched extensively but not found a simple solution. Of course I
> could manually create the new variable, level by level. But I have many
> such variables, and this would be time consuming (and lack elegance!). Does
> anyone know of an elegant solution?
>
> For example, in one dataset, there are no participants <40 years. So the
> following command yields one less level in the categorical variable produced
> than I wanted:
>
> . egen agecut = cut(age), at (0 40 50 60 70 80 90 130) label
>
>
> . tab agecut
>
>      agecut |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>         40- |         29        0.39        0.39
>         50- |        873       11.73       12.12
>         60- |      3,540       47.56       59.67
>         70- |      2,364       31.76       91.43
>         80- |        629        8.45       99.88
>         90- |          9        0.12      100.00
> ------------+-----------------------------------
>       Total |      7,444      100.00
>