Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Robert Picard <picard@netbox.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Generating dummy variable with information of household survey from different observations |
Date | Mon, 7 May 2012 10:59:17 -0400 |
Here's another way to do this using -merge-:

*----------- begin example -------------
clear
inp household_id individual_id father_row
1011 1 .
1011 2 .
1011 3 1
1011 4 1
1012 1 2
1012 2 .
1013 1 .
1013 2 .
1013 3 2
1013 4 1
1013 5 1
end
* save the full data so it can be merged back in
tempfile f
save "`f'"
* keep one record per (household, father id) that someone pointed to
drop if mi(father_row)
keep household_id father_row
rename father_row individual_id
sort household_id individual_id
by household_id individual_id: keep if _n == 1
* matched records are the individuals named as a father
merge 1:1 household_id individual_id using "`f'"
gen father = _merge == 3
sort household_id individual_id
list, noobs sepby(household_id)
*------------ end example --------------

On Mon, May 7, 2012 at 10:11 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> I'll focus on one point which intersects with my own suggestions.
>
> It's usually very worthwhile getting an individual identifier that
> just goes 1 up within families or households. It need not supplant or
> replace whatever was recorded. The first FAQ mentioned earlier in my
> postings to this thread explains how to do that.
>
> Nick
>
> On Mon, May 7, 2012 at 2:35 PM, Eric Booth <eric.a.booth@gmail.com> wrote:
>>
>> On May 7, 2012, at 1:13 AM, Sumiko Hayasaka wrote:
>>
>>> Everything works out until I get to the "foreach" command. It says the
>>> expression is too long [r(130)]. What should I do?
>>> Thank you again!
>>
>> The r(130) error comes from the -inlist()- part of the -generate- command I showed because, at some point, it has too many elements.
>>
>> This means you have a lot of father_row* variables after the initial -reshape-, probably because you don't have individual_id's like {1,2,3…} like you show, but individual id's like {99998,99917,…} that are unique to all (or most) individual_id's.
>> One way to get around this would be to generate individual_id's within the household using the -egen- function 'group()' or:
>>
>> bys household_id (individual_id): g i = _n
>>
>> and then using "i" in place of individual_id in my example (but you'd need to remember to carry 'individual_id' through the -reshape-).
>>
>> That will get around the too-many-values issue, assuming you don't have many hundreds of people in a household (inlist()'s limit appears to be 250, though it's not in -help limits-, so I don't know if that limit is the same across all versions/flavors of Stata; I've got MP, and 250 is the limit I've encountered).
>>
>> Of course, NJC's examples with looping over individuals are resilient against this type of issue with my code, but I wanted to follow up to explain where/why my example failed.
>>
>> - Eric
>>
>> __
>> Eric A. Booth
>> Public Policy Research Institute
>> Texas A&M University
>> ebooth@ppri.tamu.edu
>> Office: +979.845.6754
>>
>>
>> On May 7, 2012, at 1:13 AM, Sumiko Hayasaka wrote:
>>
>>> Thanks Eric!
>>>
>>> Everything works out until I get to the "foreach" command. It says the
>>> expression is too long [r(130)]. What should I do?
>>>
>>> Thank you again!
>>>
>>>
>>> On Sun, May 6, 2012 at 11:44 PM, Eric Booth <eric.a.booth@gmail.com> wrote:
>>>> <>
>>>>
>>>> ***************!
>>>> clear
>>>> inp household_id individual_id father_row
>>>> 1011 1 .
>>>> 1011 2 .
>>>> 1011 3 1
>>>> 1011 4 1
>>>>
>>>> 1012 1 2
>>>> 1012 2 .
>>>>
>>>> 1013 1 .
>>>> 1013 2 .
>>>> 1013 3 2
>>>> 1013 4 1
>>>> 1013 5 1
>>>> end
>>>>
>>>>
>>>> levelsof individual_id, loc(a)
>>>> reshape wide father_row, i(household_id) j(individual_id)
>>>> ds father_row*
>>>> loc checklist `r(varlist)'
>>>> loc checklist:subinstr loc checklist " " ", " , all
>>>> foreach n in `a' {
>>>> g father`n' = cond(inlist(`n', `checklist'), 1, 0, .)
>>>> }
>>>> reshape long father_row father, i(household_id) j(individual_id)
>>>>
>>>>
>>>> ***************!
>>>> - Eric
>>>>
>>>> __
>>>> Eric A. Booth
>>>> Public Policy Research Institute
>>>> Texas A&M University
>>>> ebooth@ppri.tamu.edu
>>>> +979.845.6754
>>>>
>>>> On May 6, 2012, at 10:34 PM, Sumiko Hayasaka wrote:
>>>>
>>>>> I am trying to generate a dummy variable, with information from a
>>>>> household survey, which can tell if a member of the household is a
>>>>> father or not. I have a household id, an individual id (per
>>>>> household), and a variable that tells me which individual id is marked
>>>>> as being a father (members of the family are asked if their father
>>>>> lives in the household and to give their father's individual id).
>>>>> Therefore, I need to assign a 1 at the row in which someone at the
>>>>> household said that was a father. To illustrate this, the data is
>>>>> something like this (I am trying to get the "father" variable):
>>>>>
>>>>> household_id   individual_id   father_row   father
>>>>> ------------------------------------------------------------------------
>>>>> 1011           1               .            1
>>>>> 1011           2               .            0
>>>>> 1011           3               1            0
>>>>> 1011           4               1            0
>>>>>
>>>>> 1012           1               2            0
>>>>> 1012           2               .            1
>>>>>
>>>>> 1013           1               .            1
>>>>> 1013           2               .            1
>>>>> 1013           3               2            0
>>>>> 1013           4               1            0
>>>>> 1013           5               1            0
>>>>>
>>>>>
>>>>> So, for example, members number 3 and 4 of household number 1011
>>>>> stated that their father is individual number 1 in that household.
>>>>> This means that I have to put the 1 of "father" (meaning the household
>>>>> member is a father) at the row where father_row indicates (no matter
>>>>> how many times this is done).
>>>>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
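
A note on Eric's suggestion above to number individuals 1, 2, 3, ... within each household: if father_row stores the father's original individual_id, it probably needs to be translated onto the same within-household scale before "i" can stand in for individual_id. The following is only a minimal sketch under that assumption. The large ID values are made up for illustration, and the variable name father_i is hypothetical (it does not appear anywhere in the thread).

```stata
clear
* illustrative data only: IDs are large and not 1, 2, 3, ...
input household_id individual_id father_row
1011 99998 .
1011 99917 99998
1012 50001 50002
1012 50002 .
end

* sequence number within household, as in Eric's one-liner
bysort household_id (individual_id): gen i = _n

* assumption: father_row refers to the original individual_id,
* so map it to the matching within-household sequence number
gen father_i = .
summarize i, meanonly
forvalues k = 1/`r(max)' {
    * under by:, individual_id[`k'] is the k-th member of the household
    by household_id: replace father_i = `k' if father_row == individual_id[`k'] & !missing(father_row)
}

list, noobs sepby(household_id)
```

With this in place, i and father_i could be tried in place of individual_id and father_row in the -reshape- example, which keeps the number of father_row* variables down to the size of the largest household. Robert's -merge- approach should not need this step, since it never builds a wide list of values.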