
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: summarize by different levels/groups with -egen- ?


From   Joerg Luedicke <[email protected]>
To   [email protected]
Subject   Re: st: RE: summarize by different levels/groups with -egen- ?
Date   Fri, 11 Jan 2013 12:25:00 -0500

Consider the following:

// Data
clear

input str2 Class str1 Pathogen
A1 H
A1 S
A1 T
A2 S
A2 K
A3 H
A3 D
B1 H
B1 S
end

// Flag classes with at least one H (pat2 is 1 on every row of such a class)
bys Class: egen pat2=max(Pathogen=="H")

// Keep one observation per class to analyze at the class level
bys Class: gen tag=_n==1
keep if tag
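The code above stops once the data are at class level. Below is a minimal sketch of the final count, re-entering the example data so it runs on its own. As a variant (my choice, not Joerg's), it marks one row per class with -egen-'s tag() function instead of dropping rows. The names anyH, tag, n_H and n_classes are illustrative only.

```stata
// Re-enter the example data so the sketch runs on its own
clear
input str2 Class str1 Pathogen
A1 H
A1 S
A1 T
A2 S
A2 K
A3 H
A3 D
B1 H
B1 S
end

// anyH = 1 on every row of a class with at least one H, else 0
bysort Class: egen byte anyH = max(Pathogen == "H")

// tag = 1 on exactly one row per class, 0 elsewhere (no rows are dropped)
egen byte tag = tag(Class)

// Count classes with H, and all classes, using one row per class
count if tag & anyH
local n_H = r(N)
count if tag
local n_classes = r(N)
display "`n_H' of `n_classes' classes have pathogen H"
```

With these data the count is 3 of 4 classes, which matches the answer Patricia expects. Using tag rather than -keep if tag- leaves the child-level rows intact for other analyses.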

Joerg
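Nick Cox points out in the quoted thread that -egen-'s mean() function accepts an expression, so Lovisa's per-pathogen dummies are not needed. The class mean of (Pathogen == "H") is the share of children in the class with H, and a class has H exactly when that share is above zero. This is a sketch on the same example data; the names propH, anyH and tag are illustrative.

```stata
clear
input str2 Class str1 Pathogen
A1 H
A1 S
A1 T
A2 S
A2 K
A3 H
A3 D
B1 H
B1 S
end

// Share of children in each class with pathogen H; no dummy variable needed
egen double propH = mean(Pathogen == "H"), by(Class)

// A class has H if and only if that share is above zero
generate byte anyH = propH > 0

// Show one row per class
egen byte tag = tag(Class)
list Class propH anyH if tag, noobs
```

The threshold is zero because the mean of a 0/1 indicator always lies between 0 and 1.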

On Fri, Jan 11, 2013 at 11:39 AM, Patricia Biedermann
<[email protected]> wrote:
> Hello,
> Thank you Lovisa & Nick.
> I've tried your commands, but they don't seem to work the way I want
> (pathogen is a string variable).
>
> The issue is that when I create the dummy variable at the end (as
> described by Lovisa), I get a "1" for each H in a class. When I then
> summarize it, I get the total number of H. But I want the total
> number of classes that are affected by H (regardless of how many
> children in the class were affected by the pathogen).
>
> e.g.
> Class         Pathogen
> A1                H
> A1                S
> A1                T
> A2                S
> A2                K
> A3                H
> A3                D
> B1                H
> B1                S
>
> Result: 3 (out of 4) classes are affected by "H". (I don't care how
> many individuals in a class are affected!)
>
> Maybe I have to rethink this and approach it differently.
> Cheers.
>
> On Fri, Jan 11, 2013 at 1:46 PM, Nick Cox <[email protected]> wrote:
>> You don't need a dummy or indicator variable. Assuming that -pathogen-
>> is a string variable,
>>
>> ... mean(pathogen == "H")
>>
>> will work fine as the -mean()- function of -egen- takes expressions.
>> If it's a numeric variable, the same principle applies, but you need a
>> different expression.
>>
>> Nick
>>
>> On Fri, Jan 11, 2013 at 12:01 PM, Lovisa Persson
>> <[email protected]> wrote:
>>
>>> First create a dummy variable for each pathogen, pathogeni.
>>> Then generate the mean for each class and each pathogen(i) by writing:
>>>
>>> egen meanpathogeni=mean(pathogeni), by(class)
>>>
>>> Every class that has a given pathogen will now have a value of
>>> meanpathogeni greater than zero, and every class that does not have it
>>> will have a value of zero.
>>> The value is the same for all observations within a class: it is the
>>> proportion of observations in that class with the pathogen.
>>>
>>> Now generate a new dummy variable that equals 1 if the value of
>>> meanpathogeni is greater than zero.
>>> Each class will then have the same value, 1 or 0, depending on
>>> whether the class had at least one observation of this
>>> particular pathogen.
>>
>> Patricia Biedermann
>>
>>> I want to summarize the following:
>>>
>>> School          Class           Pathogen
>>> A                       A1                      H
>>> A                       A1                      T
>>> A                       A1                      H
>>> A                       A2                      S
>>> A                       A2                      H
>>> A                       A3                      K
>>> A                       A3                      I
>>> B                       B1                      S
>>> B                       B1                      T
>>> B                       B2                      H
>>>
>>> I've visited different classes in different schools. In each class I checked
>>> if the children were infected with some kind of pathogen.
>>> -       I found, e.g., that in class A1 two children were infected with
>>> pathogen H.
>>> -       Now I want to record only that pathogen H was found in class A1,
>>> WITHOUT the number of cases (2 in this case);
>>> basically "Was pathogen H found in class A1?" = yes or no. Finally, the
>>> information should be presented at the school level ("In how many classes
>>> in school A was pathogen H found?").
>>>
>>> So far I have tried -egen-, -bysort-, and _n==_N, and I also created dummy
>>> variables for each pathogen. It never worked out the right way.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
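Patricia's original question asked how many classes in each school had pathogen H. Below is a minimal sketch that extends the class-level flag to that case, using her School/Class/Pathogen example. It flags each class, keeps one row per class, and then uses -collapse- to get one row per school. The names anyH, classes_H and classes are illustrative.

```stata
clear
input str1 School str2 Class str1 Pathogen
A A1 H
A A1 T
A A1 H
A A2 S
A A2 H
A A3 K
A A3 I
B B1 S
B B1 T
B B2 H
end

// anyH = 1 on every row of a class with at least one H, else 0
bysort School Class: egen byte anyH = max(Pathogen == "H")

// Keep one row per class so each class is counted once
bysort School Class: keep if _n == 1

// One row per school: classes with H, and total classes visited
collapse (sum) classes_H = anyH (count) classes = anyH, by(School)
list, noobs
```

With these data, school A has H in 2 of its 3 classes and school B in 1 of its 2.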


© Copyright 1996–2018 StataCorp LLC