Re: st: Counting observations within groups
From: Daniel Escher <[email protected]>
To: [email protected]
Subject: Re: st: Counting observations within groups
Date: Fri, 30 Nov 2012 08:53:10 -0500

Also, fwiw, this exercise has taught me the importance of parentheses
in "&" and "|" expressions. Austin's code, which uses them, gives wildly
different answers from my attempts without them.
A good lesson for a Stata novice like myself.
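
The reason is operator precedence: Stata evaluates "&" before "|", so a & b | c is read as (a & b) | c. A minimal sketch on made-up data (the variables a, b, and c are hypothetical and have nothing to do with the mine dataset):

```stata
* Toy data, invented purely to show how & and | are grouped
clear
input a b c
0 0 1
0 1 1
1 0 0
end

* Without parentheses, & is evaluated first: (a & b) | c
gen no_parens = a & b | c

* With parentheses, the | is evaluated first: a & (b | c)
gen with_parens = a & (b | c)

list
```

The two versions disagree in the first two rows. The same thing happens when sic==11110 sits outside the parentheses: every mine of that type satisfies the condition regardless of its production.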
On Fri, Nov 30, 2012 at 8:12 AM, Daniel Escher <[email protected]> wrote:
> Austin,
>
> Thank you so much! I had forgotten about using levelsof to create a
> local of all values in a variable. In this case, your third option was
> computationally quickest, but I'll keep the first two options in my
> head for later situations. For some reason, totprod>`m' needed to be
> changed to totprod>r(mean). Thus,
>
> su totprod, mean
> g big=(totprod>r(mean)&totprod<.)&(sic==12110|sic==11110)
> by fips: g sbig=sum(big)
> by fips: replace sbig=sbig[_N]
>
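
A likely reason for the `m' problem, offered as a guess: the snippet as posted never defines the local macro m, and an undefined local expands to nothing, which leaves the invalid expression totprod>&totprod<. behind. The sketch below stores the mean in a local first and, as an alternative to the sum()/[_N] pair, uses egen's total(). The toy data are invented for illustration and are not from the real dataset.

```stata
* Toy data (hypothetical values): two counties, seven mines
clear
input fips sic totprod
1 12110 10
1 12110 50
1 11110 60
1 99999 70
2 12110 5
2 11110 .
2 12110 40
end

* Save the overall mean in a local right away, before another command overwrites r()
summarize totprod, meanonly
local m = r(mean)

* 1 if production is above the mean, not missing, and the mine is of the right type
gen byte big = (totprod > `m' & totprod < .) & (sic == 12110 | sic == 11110)

* Sum the indicator within each county; every row of a county gets the county total
bysort fips: egen sbig = total(big)

list, sepby(fips)
```

Using r(mean) directly in the generate line, as in the code above, also works, provided nothing that resets r() runs between summarize and generate.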
>
> On Thu, Nov 29, 2012 at 6:03 PM, Austin Nichols <[email protected]> wrote:
>> Daniel Escher <[email protected]>:
>>
>> I sent my prior post a bit prematurely... I meant to go on to say--
>> but one does not need a loop for this particular problem.
>>
>> Make a dummy, sum within county:
>>
>> su totprod, mean
>> g big=(totprod>`m'&totprod<.)&(sic==12110|sic==11110)
>> bys fips: g sbig=sum(big)
>> by fips: replace sbig=sbig[_N]
>>
>> On Thu, Nov 29, 2012 at 5:48 PM, Daniel Escher <[email protected]> wrote:
>>> Hello,
>>>
>>> I am trying to count the number of mines in a county by production.
>>> I.e., I'd like the number of mines in each county that are above the
>>> overall mean of production, and the number that are below. There are
>>> multiple mines per county, and each county is identified by its FIPS code.
>>> Missing data are marked by "." and the data are in long format.
>>>
>>> Here's what I have so far:
>>> . *bigmines = # of mines in a county above the overall mean
>>> . *totprod = total production per mine
>>> . *sic = type of mine
>>>
>>> . *ATTEMPT ONE
>>> . sort fips
>>> . su totprod // to get mean
>>> . * The next line gives me the total number of mines per FIPS code - not those that meet the criteria
>>> . by fips: egen bigmines = count(inrange(totprod, r(mean), .) & sic==12110 | sic==11110)
>>> . drop bigmines
>>>
>>> . *ATTEMPT TWO
>>> . su totprod // to get mean
>>> . * The next line gives me the total number of mines per FIPS code if any mine exceeds the mean
>>> . by fips: egen bigmines = total(mshahrs > r(mean) & sic==12110 | sic==11110)
>>> . drop bigmines
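
One way to see why the first two attempts behave differently: egen's count() counts the observations for which the expression is nonmissing, and a true/false expression evaluates to 0 or 1, never to missing, so count() returns the group size. total() adds up the 0/1 values, which gives the number of matches. A small sketch with a hypothetical variable x:

```stata
* Toy data (hypothetical): county 1 has two rows, county 2 has one
clear
input fips x
1 1
1 5
2 9
end

* count() counts nonmissing results; (x > 4) is always 0 or 1, so this is the group size
bysort fips: egen n_count = count(x > 4)

* total() sums the 0/1 results, so this is the number of rows where x > 4
bysort fips: egen n_total = total(x > 4)

list, sepby(fips)
```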
>>>
>>> . *ATTEMPT THREE
>>> . *Then I read Nick Cox's helpful article
>>> . *(http://www.stata-journal.com/sjpdf.html?articlenum=pr0029), which
>>> . *clued me in to -count-:
>>> . gen bigmines = 0
>>> . su totprod
>>> . count if inrange(totprod, r(mean), .) & sic==12110 | sic==11110
>>> . replace bigmines = r(N)
>>>
>>> The last attempt is what I want, and it "works." However, I don't know
>>> how to -count- and then store r(N) for each FIPS code. Using -by- does
>>> not seem to work. This probably requires a loop like...
>>>
>>> forvalues j = all values of fips {
>>> count if inrange(mshahrs, r(mean), .) & sic==12110 | sic==11110
>>> replace bigmines_hrs = r(N)
>>> }
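
For reference, the loop idea can be sketched with levelsof, which puts the distinct values of a variable into a local macro. This assumes fips is numeric (a string fips would need compound double quotes around `c') and runs on toy data invented for illustration. The mean is saved in a local before the loop because count overwrites r().

```stata
* Toy data (hypothetical values): two counties, seven mines
clear
input fips sic totprod
1 12110 10
1 12110 50
1 11110 60
1 99999 70
2 12110 5
2 11110 .
2 12110 40
end

* Save the overall mean first; -count- inside the loop replaces the r() results
summarize totprod, meanonly
local m = r(mean)

gen bigmines = .

* Collect the distinct county codes, then count qualifying mines county by county
levelsof fips, local(counties)
foreach c of local counties {
    quietly count if fips == `c' & totprod > `m' & totprod < . & (sic == 12110 | sic == 11110)
    quietly replace bigmines = r(N) if fips == `c'
}

list, sepby(fips)
```

Prefixing count with by fips: does run, but r(N) then holds only the result for the last group, which is why that route cannot fill in a per-county variable.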
>>>
>>> Is this close? Thank you so much for your help and time.
>>>
>>> Gratefully,
>>> Daniel
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/