A few extra comments:
-egen- can be used in conjunction with -by:- whenever
it makes sense to do that. (If not, StataCorp would
no doubt like to know of specific exceptions.) What's
more, although it is no longer documented, the same functionality
is typically available through -by()- options, as Michael's
code exemplifies.
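
To make the two forms concrete, here is a minimal sketch on a small
made-up dataset. The data values and the variable names n1 and n2 are
invented for illustration only; count() is used simply because it is a
standard -egen- function that makes sense by group.

```stata
clear
input long hhid long persid
12345 1234501
12345 1234501
12345 1234502
12346 1234601
end

* prefix form: the data must be sorted by hhid, which -bysort- takes care of
bysort hhid: egen n1 = count(persid)

* option form: no prior sort is needed
egen n2 = count(persid), by(hhid)

* the two forms should agree in every observation
assert n1 == n2
```

If the -assert- passes silently, the prefix and the option gave the
same result for every observation.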
The flagging technique used by Michael here is
also available (although under the label "tagging")
through -egen, tag()-.
The extra command -groups- from SSC could also be
useful here.
Showing these alternatives, Michael's code
can be translated as below. (This is not better
code, just different. The logic is exactly
equivalent. This is the standard "first principles"
versus "canned functions" issue.)
*flag each person just once
egen perperson = tag(hhid persid)
* calculate number of persons per household
egen totpeople = sum(perperson), by(hhid)
* flag each household once, to avoid duplicates in list commands
egen taghh = tag(hhid)
groups totpeople hhid if taghh
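
If -groups- is not installed, official commands should answer both
questions too. The following is a sketch only, typed in from the
example records in the original question. It assumes a version of
Stata in which -egen- offers total() (on older versions, sum() as
used above plays the same role), and the choice of 3 persons in the
last line is arbitrary.

```stata
clear
input long hhid long persid
12345 1234501
12345 1234501
12345 1234501
12345 1234501
12345 1234502
12345 1234502
12345 1234503
12346 1234601
end

* flag each person just once
egen perperson = tag(hhid persid)
* calculate number of persons per household
egen totpeople = total(perperson), by(hhid)
* flag each household once
egen taghh = tag(hhid)

* (1) how many households have 1, 2, 3, ... persons
tabulate totpeople if taghh

* (2) which households have a given number of persons, here 3
list hhid if taghh & totpeople == 3
```

In this toy data household 12345 has three distinct persons and
household 12346 has one, so the -list- should show 12345 only.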
Nick
Michael Blasnik replied to Donnell Butler:
> *flag each person just once
> bysort hhid persid: gen byte perperson=(_n==1)
> * calculate number of persons per household
> egen totpeople=sum(perperson), by(hhid)
> * flag each household once, to avoid duplicates in list commands
> bysort hhid: gen byte taghh=(_n==1)
> l hhid if taghh==1 & totpeople==1
> l hhid if taghh==1 & totpeople==2
> .. etc..
> > Here is a simplified version of my dilemma:
> >
> > I have a data set with multiple id numbers. There is always one
> > primary id (hhid), but sometimes there is more than one subsidiary id
> > (persid). The persid is simply two digits longer than the hhid. For
> > example, hhid=12345 and persid=1234501 (or, in the cases where there is
> > more than one, persid=1234501, 1234502, 1234503, etc.). The records are
> > structured such that for every action on a given date, there is a
> > record. For example:
> >
> > HHID PERSID ACTION DATE
> > 12345 1234501 EAT 1/1/2003
> > 12345 1234501 DRINK 1/2/2003
> > 12345 1234501 DRINK 1/3/2003
> > 12345 1234501 BE MERRY 1/4/2003
> > 12345 1234502 DRINK 1/1/2003 <-Note new person id, but same hhid
> > 12345 1234502 EAT 1/3/2003
> > 12345 1234503 BE MERRY 1/2/2003 <-Note new person id, but same hhid
> > 12346 1234601 BE MERRY 1/1/2003 <-Note new hhid
> >
> > ... and so on.
> >
> > So, here is my dilemma: I am trying to find a command or commands that
> > will do two things:
> > (1) For the entire data set, across all households, how many times are
> > there 1,2,3,...N unique PERSIDs within a household? That is,
> > how many households have 1,2,3,...N persons?
> > (2) Display the HHID for households that have X persons. That
> > is, for households with X unique PERSIDs within a household,
> > list the HHIDs.
> >
> > It seems so simple, but the count command can't count within variables.
> > The egen command can't work with by commands.
>