Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Daniel Escher <descher@nd.edu> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Counting observations within groups |
Date | Fri, 30 Nov 2012 08:53:10 -0500 |
Also, fwiw, this exercise has taught me the importance of parentheses
in "&" and "|" expressions. Using Austin's code with them gives wildly
different answers. A good lesson for a Stata novice like myself.

On Fri, Nov 30, 2012 at 8:12 AM, Daniel Escher <descher@nd.edu> wrote:
> Austin,
>
> Thank you so much! I had forgotten about using levelsof to create a
> local of all values in a variable. In this case, your third option was
> computationally quickest, but I'll keep the first two options in my
> head for later situations. For some reason, totprod>`m' needed to be
> changed to totprod>r(mean). Thus,
>
> su totprod, mean
> g big=(totprod>r(mean)&totprod<.)&(sic==12110|sic==11110)
> by fips: g sbig=sum(big)
> by fips: replace sbig=sbig[_N]
>
>
> On Thu, Nov 29, 2012 at 6:03 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Daniel Escher <descher@nd.edu>:
>>
>> I sent my prior post a bit prematurely... I meant to go on to say--
>> but one does not need a loop for this particular problem.
>>
>> Make a dummy, sum within county:
>>
>> su totprod, mean
>> g big=(totprod>`m'&totprod<.)&(sic==12110|sic==11110)
>> bys fips: g sbig=sum(big)
>> by fips: replace sbig=sbig[_N]
>>
>> On Thu, Nov 29, 2012 at 5:48 PM, Daniel Escher <descher@nd.edu> wrote:
>>> Hello,
>>>
>>> I am trying to count the number of mines in a county by production.
>>> I.e., I'd like the number of mines in each county that are above the
>>> overall mean of production, and the number that are below. There are
>>> multiple mines per county, which is identified by its FIPS code.
>>> Missing data are marked by . The data are in long format.
>>>
>>> Here's what I have so far:
>>> . *bigmines = # of mines in a county above the overall mean
>>> . *totprod = total production per mine
>>> . *sic = type of mine
>>>
>>> . *ATTEMPT ONE
>>> . sort fips
>>> . su totprod // to get mean
>>> . by fips: egen bigmines = count(inrange(totprod, r(mean), .) &
>>> sic==12110 | sic==11110) // This gives me total number of mines per
>>> FIPS code - not those that meet the criteria
>>> . drop bigmines
>>>
>>> . *ATTEMPT TWO
>>> . su totprod // to get mean
>>> . by fips: egen bigmines = total(mshahrs > r(mean) & sic==12110 |
>>> sic==11110) // This gives me the total number of mines per FIPS code
>>> if any mine exceeds the mean
>>> . drop bigmines
>>>
>>> . *ATTEMPT THREE
>>> . *Then I read Nick Cox's helpful article
>>> (http://www.stata-journal.com/sjpdf.html?articlenum=pr0029) which
>>> clued me in to -count-:
>>> . gen bigmines = 0
>>> . su totprod
>>> . count if inrange(totprod, r(mean), .) & sic==12110 | sic==11110
>>> . replace bigmines = r(N)
>>>
>>> The last attempt is what I want, and it "works." However, I don't know
>>> how to -count- and then store r(N) for each FIPS code. Using -by- does
>>> not seem to work. This probably requires a loop like...
>>>
>>> forvalues j = all values of fips {
>>> count if inrange(mshahrs, r(mean), .) & sic==12110 | sic==11110
>>> replace bigmines_hrs = r(N)
>>> }
>>>
>>> Is this close? Thank you so much for your help and time.
>>>
>>> Gratefully,
>>> Daniel
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
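
For anyone reading this thread later, the point about parentheses in "&" and "|" expressions can be seen on a small example. In Stata, & is evaluated before |, so a condition written as a & b | c is read as (a & b) | c. The sketch below is only an illustration: the six observations and the names big_nopar, big_par, n_nopar and n_par are made up for this note and do not come from the original data.

```stata
* Toy data: hypothetical values, not taken from the thread
clear
input fips sic totprod
1 12110 50
1 11110 5
1 99999 80
2 12110 .
2 11110 90
2 11110 10
end

* Overall mean of the non-missing values, saved in a local macro
summarize totprod, meanonly
local m = r(mean)

* No parentheses: & binds before |, so this is read as
* (totprod > m & totprod < . & sic == 12110) | (sic == 11110)
* and every sic == 11110 mine is flagged regardless of production
generate byte big_nopar = totprod > `m' & totprod < . & sic == 12110 | sic == 11110

* Parentheses as in Austin's code: above the mean, non-missing,
* AND one of the two sic codes
generate byte big_par = (totprod > `m' & totprod < .) & (sic == 12110 | sic == 11110)

* Per-county counts under each reading
bysort fips: egen n_nopar = total(big_nopar)
bysort fips: egen n_par = total(big_par)

list, sepby(fips)
```

On this toy data the mean is 47. The unparenthesized version counts 2 mines in each county, because the low-production sic==11110 mines slip in through the "|", while the parenthesized version counts 1 in each. The totprod<. term matters as well: Stata treats a missing value as larger than any number, so totprod > `m' alone would be true for the missing observation.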