Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Paul Novosad <novosad@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: AW: Using egen and by efficiently when some observations are missing |
Date | Thu, 22 Apr 2010 15:21:16 -0400 |
Thanks to Martin and Austin for their excellent advice, which is exactly the kind of solution I was looking for.

Regards,
Paul

On Thu, Apr 22, 2010 at 12:00 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> <>
>
> To solve these things, you have to get creative and note that -egen-
> functions often take expressions as arguments, as duly shown in the relevant
> sections of -help egen-:
>
> *************
> bys country: egen count_i_alt = total((condition==1)*(!mi(i)))
> *************
>
> BTW, the max operator projects the "tmp" value into the rows that end up
> with missings after the first -egen- call, right? Is this really all you
> want out of it, or am I missing anything here?
>
> ********
> clear*
>
> inp byte(country:mylabel i condition), automatic
> A 2 1
> A 3 .
> A 4 0
> A . 1
> A . .
> A . 1
> B 2 1
> B 3 .
> B 4 0
> B 1 .
> B . 1
> B 3 1
> end
>
> bys country: egen tmp = count(i) if condition == 1
> bys country: egen count_i = max(tmp)
>
> bys country: egen count_i_alt = total((condition==1)*(!mi(i)))
>
> li, sepby(country) noo
> ********
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Paul Novosad
> Sent: Thursday, April 22, 2010 17:25
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Using egen and by efficiently when some observations are missing
>
> Dear list,
>
> It often takes me three lines to generate variables based on
> conditional group operations using egen. For example, I want to run
> some egen operation on a subset of the data, such as a count. But I
> want the count to exist even when the condition does not hold. I use
> the following:
>
> by country: egen tmp = count(i) if condition == 1
> by country: egen count_i = max(tmp)
> drop tmp
>
> I write code like this all over the place, and each time it makes my
> heart sink. It feels inefficient but I do not have another solution.
> Can someone recommend a more efficient practice?
>
> Thanks,
>
> Paul
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
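The expression trick Martin describes also extends to -egen- functions other than a count. Below is a minimal sketch, assuming the same toy variables (country, i, condition) as Martin's example. It swaps in the cond() function, which the thread itself does not use. cond(condition == 1, i, .) equals i where condition == 1 and is missing elsewhere. Because -egen- functions such as count() and mean() ignore missing values, each result is computed on the subset but filled in for every row of the group, in one line. The variable names count_i_cond and mean_i_cond are hypothetical, chosen for illustration only.

```stata
* Sketch only: toy data modeled on Martin's example (country stored as a string here)
clear
input str1 country i condition
A 2 1
A 3 .
A 4 0
A . 1
A . .
A . 1
B 2 1
B 3 .
B 4 0
B 1 .
B . 1
B 3 1
end

* cond() keeps i only where condition == 1 and returns missing otherwise;
* -egen- skips missings, so the group result is written to every row of the group
bysort country: egen count_i_cond = count(cond(condition == 1, i, .))

* Martin's version with a logical expression instead of a product
bysort country: egen count_i_alt = total(condition == 1 & !missing(i))

* The same pattern works for other functions, e.g. a conditional group mean
bysort country: egen mean_i_cond = mean(cond(condition == 1, i, .))

list, sepby(country) noobs
```

One difference worth checking against the three-line approach: if a country had no rows with condition == 1, the max(tmp) step would leave count_i missing, whereas count() and total() return 0 for that group.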