Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Generating dummy variable with information of household survey from different observations
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Generating dummy variable with information of household survey from different observations
Date: Mon, 7 May 2012 15:11:03 +0100
I'll focus on one point that intersects with my own suggestions.
It is usually very worthwhile to create an individual identifier that
simply runs 1, 2, 3, ... within each family or household. It need not
replace whatever identifier was recorded. The first FAQ mentioned earlier
in my postings to this thread explains how to do that.
Nick
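
A minimal sketch of that suggestion, using the example data quoted further down in this thread. The variable names household_id, individual_id and father_row come from that example; the names member and father are illustrative choices, not anything prescribed in the thread. The loop is only one way to flag fathers without -inlist()-, and it assumes father_row holds the father's individual_id within the same household.

```stata
* Sketch only: example data copied from the original question
clear
input household_id individual_id father_row
1011 1 .
1011 2 .
1011 3 1
1011 4 1
1012 1 2
1012 2 .
1013 1 .
1013 2 .
1013 3 2
1013 4 1
1013 5 1
end

* Identifier running 1, 2, 3, ... within each household;
* individual_id is kept as recorded
bysort household_id (individual_id): generate member = _n

* Largest household size sets the number of passes
summarize member, meanonly
local maxsize = r(max)

* Pass j compares each person's id with the father_row reported
* by the j-th member of the same household
generate byte father = 0
forvalues j = 1/`maxsize' {
    by household_id: replace father = 1 ///
        if `j' <= _N & father_row[`j'] == individual_id
}

list, sepby(household_id)
```

Because the loop runs once per household position rather than once per distinct identifier, the number of passes depends on the largest household, not on how many different values individual_id takes.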
On Mon, May 7, 2012 at 2:35 PM, Eric Booth <[email protected]> wrote:
>
> On May 7, 2012, at 1:13 AM, Sumiko Hayasaka wrote:
>
>> Everything works out until I get to the "foreach" command. It says the
>> expression is too long [r(130)]. What should I do?
>> Thank you again!
>
> The r(130) error comes from the -inlist()- part of the -generate- command I showed because, at some point, it has too many elements.
>
> This means you have a lot of father_row* variables after the initial -reshape-, probably because your individual ids are not {1,2,3…} within each household, as in your example, but values like {99998,99917,…} that are unique across all (or most) individuals. One way around this is to generate individual ids within the household using the -egen- function group() or:
>
> bys household_id (individual_id): g i = _n
>
> and then use "i" in place of individual_id in my example (but you'd need to remember to carry individual_id through the -reshape-).
>
> That will get around the too-many-values issue, assuming you don't have many hundreds of people in a household (inlist()'s limit appears to be 250, though it's not in -help limits-, so I don't know whether that limit is the same across all versions/flavors of Stata; I've got MP, and 250 is the limit I've encountered).
>
> Of course, NJC's examples, which loop over individuals, are not affected by this problem in my code, but I wanted to follow up to explain where and why my example failed.
>
> - Eric
>
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> [email protected]
> Office: +979.845.6754
>
>
> On May 7, 2012, at 1:13 AM, Sumiko Hayasaka wrote:
>
>> Thanks Eric!
>>
>> Everything works out until I get to the "foreach" command. It says the
>> expression is too long [r(130)]. What should I do?
>>
>> Thank you again!
>>
>>
>> On Sun, May 6, 2012 at 11:44 PM, Eric Booth <[email protected]> wrote:
>>> <>
>>>
>>> ***************!
>>> clear
>>> inp household_id individual_id father_row
>>> 1011 1 .
>>> 1011 2 .
>>> 1011 3 1
>>> 1011 4 1
>>>
>>> 1012 1 2
>>> 1012 2 .
>>>
>>> 1013 1 .
>>> 1013 2 .
>>> 1013 3 2
>>> 1013 4 1
>>> 1013 5 1
>>> end
>>>
>>>
>>> levelsof individual_id, loc(a)
>>> reshape wide father_row, i(household_id) j(individual_id)
>>> ds father_row*
>>> loc checklist `r(varlist)'
>>> loc checklist:subinstr loc checklist " " ", " , all
>>> foreach n in `a' {
>>> g father`n' = cond(inlist(`n', `checklist'), 1, 0, .)
>>> }
>>> reshape long father_row father, i(household_id) j(individual_id)
>>>
>>>
>>> ***************!
>>> - Eric
>>>
>>> __
>>> Eric A. Booth
>>> Public Policy Research Institute
>>> Texas A&M University
>>> [email protected]
>>> +979.845.6754
>>>
>>> On May 6, 2012, at 10:34 PM, Sumiko Hayasaka wrote:
>>>
>>>> I am trying to generate a dummy variable, with information from a
>>>> household survey, which can tell if a member of the household is a
>>>> father or not. I have a household id, an individual id (per
>>>> household), and a variable that tells me which individual id is marked
>>>> as being a father (members of the family are asked if their father
>>>> lives in the household and to give their father's individual id).
>>>> Therefore, I need to assign a 1 at the row in which someone at the
>>>> household said that was a father. To illustrate this, the data is
>>>> something like this (I am trying to get the "father" variable):
>>>>
>>>> household_id individual_id father_row father
>>>> ------------------------------------------------------------------------
>>>> 1011 1 . 1
>>>> 1011 2 . 0
>>>> 1011 3 1 0
>>>> 1011 4 1 0
>>>>
>>>> 1012 1 2 0
>>>> 1012 2 . 1
>>>>
>>>> 1013 1 . 1
>>>> 1013 2 . 1
>>>> 1013 3 2 0
>>>> 1013 4 1 0
>>>> 1013 5 1 0
>>>>
>>>>
>>>> So, for example, members number 3 and 4 of household number 1011
>>>> stated that their father is individual number 1 in that household.
>>>> This means that I have to put the 1 of "father" (meaning the household
>>>> member is a father) at the row where father_row indicates (no matter
>>>> how many times this is done).
>>>>