Honorati Masanja--
I prefer Scott Merryman's (and Michael Blasnik's second) solution to
the others for one simple reason: it accounts for the *possibility*
that Infected is missing. The other solutions hide potential problems
with missing values by coding households that contain missing values
as one or zero. You may have a good reason for coding such households
as "infected" or "not" even when values are missing at the individual
level (for example, a skip pattern of questions that ensures that
anyone for whom Infected is missing is sure to be uninfected). But the
decision about what to do with missing values at the individual level
should be coded explicitly, or at least checked to see whether it
matters, IMHO:
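* egen's max() ignores missing values within each household
egen anyone_coded_infected = max(Infected), by(HouseholdID)
* missing sorts last, so Infected[_N] is missing if any member is missing
bys HouseholdID (Infected): g HHinfected = Infected[_N]
tab HHinfected anyone_coded_infected, mi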
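
If the two variables disagree, one way to code the extra step explicitly might look like the sketch below. It assumes Infected is a numeric 0/1 variable. The names n_missing, any_infected, and HHinfected_explicit are made up for illustration, and the rule itself is only one possible choice: a household is "infected" if any member is known to be infected, "not infected" only if every member is known to be uninfected, and missing otherwise.

```stata
* Sketch only: variable names and the coding rule are illustrative assumptions

* number of household members with a missing value of Infected
egen n_missing = total(missing(Infected)), by(HouseholdID)

* largest nonmissing value of Infected in the household (missing if all are missing)
egen any_infected = max(Infected), by(HouseholdID)

* 1 if at least one member is known to be infected, 0 otherwise
gen byte HHinfected_explicit = any_infected == 1

* no known infection but incomplete information: leave the household missing
replace HHinfected_explicit = . if any_infected != 1 & n_missing > 0

tab HHinfected_explicit n_missing, mi
```

If a skip pattern guarantees that missing means uninfected, you could instead recode the missing values of Infected to 0 before building the household variable, which keeps that assumption visible in the do-file.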
On 11/30/06, Scott Merryman wrote:
In addition to the suggestions by Maarten and Philipp, another way would be:
clear
input str6 householdid str8 personid infected
010101 01010101 1
010101 01010102 1
010102 01010201 0
010102 01010202 1
010102 01010203 1
010103 01010301 0
010103 01010302 0
010103 01010303 0
010104 01010401 0
010104 01010402 1
end
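* flag individuals coded as infected
gen hinfect = infected == 1
* sort within household so the largest flag is last, then copy it to every member
bys householdid (hinfect): replace hinfect = hinfect[_N]
list, sepby(householdid)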
Scott
> -----Original Message-----
> From: Honorati Masanja
> I have a dataset with individuals in households. Each individual has a
> unique identifier. Some individuals in the households are infected and
> some are not. My problem is how do I tell Stata to create a new variable
> which will have 1 for households with at least one infected person and
> 0 for households without infected persons.