I found something interesting and puzzling. Maybe I am just missing
something. I have a dataset like this:
Gvkey age outsider
3311 70 1
3311 69 0
3311 65 1
3311 68 1
5455 71 0
5455 60 1
5455 65 1
5455 80 0
...
Now I want the number of people older than 68 within each gvkey, so I ran
{egen old=count(age) if age>=69,by(gvkey)}. The number is correct, but it
is filled in only for observations where age is 69 or more; the others are
missing. I expected the same number to be repeated for every observation
within a gvkey, as I have seen many such functions do. I worked around it
like this:
gsort gvkey -old
by gvkey: replace old=old[_n-1] if old==.
That works. But for outsider, I want the number of 1's within each gvkey,
so I ran {egen outside=sum(outsider), by(gvkey)}. This time there are no
missing values. Why does "count" behave differently? Of course, I can
generate a dummy for age greater than 68 and then sum that up, which gives
the same result. But I still wonder why "count" did not fill in all the
values.
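For concreteness, the dummy approach I mean is roughly the following. The
names over68 and old2 are made up just for this sketch, and I assume
missing ages should not be counted as old:
* flag ages of 69 or more; leave the flag missing when age is missing
gen byte over68 = age>=69 if age<.
* sum the flag within gvkey; there is no if qualifier here, so every
* observation in the group gets a value
egen old2 = sum(over68), by(gvkey)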
Cheers,
Wanli Zhao