From | "Michael Blasnik" <[email protected]> |
To | <[email protected]> |
Subject | st: Re: RE: why the different? |
Date | Thu, 27 Oct 2005 08:13:04 -0400 |
"Wanli Zhao" <[email protected]> wrote:
I found something interesting & puzzling. Maybe I just miss something. I have a dataset like this:
<snip>
Now, I want to have the number of people older than 68 by each gvkey. So I
do {egen old=count(age) if age>=69,by(gvkey)}. Then I found that the number
is correct but it only shows when the age variable is 69 or bigger. I
thought it would put the same number within gvkey for each age, just as I
experienced a lot of such functions do. Certainly, I did the following:
gsort gvkey -old
by gvkey: replace old=old[_n-1] if old==.
That's OK. But for the outsider, I want the number of 1's within each gvkey
so I did {egen outside=sum(outsider), by(gvkey)}. This time, there is no
missing value. Why the "count" behaves differently? Certainly, I can
generate another dummy for age bigger than 68 and then sum that up. Same
result. But I just wonder why "count" did not fill in all the values?
Cheers,
Wanli Zhao

I think this behavior can be frustrating at times, but it certainly isn't puzzling, and I'd like to know which other Stata commands you have in mind that don't follow this convention. Commands that use -if- clauses usually operate only on observations meeting the qualifier: gen x2=x^2 if x>5 will create missing values in x2 for any cases where x is not greater than 5. -egen- follows this same behavior, and your example with the egen sum doesn't have an -if- clause. I have long thought that there ought to be an egen option for filling in these missing values when a function yields a constant for each by group. Sometimes you can use logical conditions within the function to accomplish this, as in egen sumgt68=sum(x*(age>68)), by(gvkey).
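
As a minimal sketch of the difference between an -if- qualifier and a condition placed inside the egen function, the following can be run on a toy dataset. The data values and the variable names old_if and old_all are hypothetical and chosen only for illustration; total() is used here as the newer name for the sum() function shown in the thread.

```stata
* Hypothetical toy data; these values are made up for illustration
clear
input gvkey age outsider
1 70 1
1 65 0
1 72 1
2 60 0
2 69 1
end

* With -if-, only observations satisfying the qualifier receive a value;
* the remaining observations in each gvkey are left missing
egen old_if = count(age) if age >= 69, by(gvkey)

* With the condition inside the function, every observation in the group
* receives the group total. Missing ages are excluded explicitly because
* Stata treats missing as larger than any number
egen old_all = total(age >= 69 & !missing(age)), by(gvkey)

* No -if- qualifier here, so no missing values are created
egen outside = total(outsider), by(gvkey)

list, sepby(gvkey)
```

In this sketch old_if should be filled in only on the rows where age is 69 or more, while old_all and outside should be constant within each gvkey.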