Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Generating dummy variable with information of household survey from different observations

From	Robert Picard <[email protected]>
To	[email protected]
Subject	Re: st: Generating dummy variable with information of household survey from different observations
Date	Mon, 7 May 2012 10:59:17 -0400

Here's another way to do this using -merge-: make a list of everyone who is named as a father, then match it back to the full data.

*----------- begin example -------------
clear
inp household_id   individual_id   father_row
 1011     1    .
 1011     2    .
 1011     3    1
 1011     4    1
 1012     1     2
 1012     2     .
 1013      1    .
 1013      2    .
 1013      3    2
 1013      4    1
 1013      5    1
 end

tempfile f
* keep a copy of the full data
save "`f'"

* one record for each person named as someone's father
drop if mi(father_row)
keep household_id father_row
rename father_row individual_id
sort household_id individual_id
by household_id individual_id: keep if _n == 1

* match back to the full data: matched persons are fathers
merge 1:1 household_id individual_id using "`f'"
gen father = _merge == 3
sort household_id individual_id
list, noobs sepby(household_id)
*------------ end example --------------
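For comparison, here is a minimal loop-based sketch that needs neither -merge- nor -reshape-. It assumes, as in the example data, that father_row holds the father's individual_id within the same household. Under -by-, the subscript father_row[`k'] refers to the k-th member of the current household, so each pass flags every row whose individual_id that member names as father. The names hhsize and maxsize are illustrative, and this is not necessarily the looping code posted earlier in the thread.

```stata
clear
input household_id individual_id father_row
1011 1 .
1011 2 .
1011 3 1
1011 4 1
1012 1 2
1012 2 .
1013 1 .
1013 2 .
1013 3 2
1013 4 1
1013 5 1
end

* size of the largest household
bysort household_id (individual_id): gen hhsize = _N
summarize hhsize, meanonly
local maxsize = r(max)

* under -by-, father_row[`k'] is the k-th member's answer; flag the row
* whose individual_id that member names (skipping households smaller than k)
gen byte father = 0
forvalues k = 1/`maxsize' {
    by household_id: replace father = 1 if `k' <= _N & father_row[`k'] == individual_id
}
drop hhsize
list, noobs sepby(household_id)
```

The loop runs once per position up to the largest household size, which is small in household data, and there is no expression whose length grows with the number of distinct ids.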


On Mon, May 7, 2012 at 10:11 AM, Nick Cox <[email protected]> wrote:
> I'll focus on one point which intersects with my own suggestions.
>
> It's usually very worthwhile creating an individual identifier that
> just counts up from 1 within families or households. It need not supplant or
> replace whatever was recorded. The first FAQ mentioned earlier in my
> postings to this thread explains how to do that.
>
> Nick
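A minimal sketch of the kind of identifier Nick describes, using hypothetical recorded ids like the {99998, 99917, …} Eric mentions below. The recorded individual_id is kept, and i counts 1, 2, 3, … within each household. Assuming father_row stores recorded ids rather than positions (the thread does not settle this), father_i translates it into the father's position i; the names i and father_i and the data values are illustrative only.

```stata
clear
* hypothetical data with large, sample-wide ids
input household_id individual_id father_row
1011 99998 .
1011 99917 99998
1012 40213 .
1012 40788 40213
end

* keep the recorded id; add one that counts 1, 2, 3, ... within household
bysort household_id (individual_id): gen i = _n

* translate father_row (a recorded id) into the father's position i;
* the `k' <= _N guard skips households with fewer than k members
gen father_i = .
summarize i, meanonly
forvalues k = 1/`r(max)' {
    by household_id: replace father_i = `k' if `k' <= _N & father_row == individual_id[`k']
}
list, noobs sepby(household_id)
```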
>
> On Mon, May 7, 2012 at 2:35 PM, Eric Booth <[email protected]> wrote:
>>
>> On May 7, 2012, at 1:13 AM, Sumiko Hayasaka wrote:
>>
>>> Everything works out until I get to the "foreach" command. It says the
>>> expression is too long [r(130)]. What should I do?
>>> Thank you again!
>>
>> The r(130) error comes from the -inlist()- part of the -generate- command I showed: at some point, it is given too many arguments.
>>
>> This means you have a lot of father_row* variables after the initial -reshape-, probably because your individual_ids are not like {1,2,3…} as you show, but like {99998,99917,…}, unique across all (or most) individuals. One way to get around this would be to generate individual ids within the household, using the -egen- function group() or:
>>
>>        bys household_id (individual_id): g i = _n
>>
>> and then use "i" in place of individual_id in my example (but you'd need to remember to carry individual_id through the -reshape-).
>>
>> That will get around the too-many-values issue, assuming you don't have many hundreds of people in a household (inlist()'s limit appears to be 250, though it's not in -help limits-, so I don't know if that limit is the same across all versions/flavors of Stata; I've got MP, and 250 is the limit I've encountered).
>>
>> Of course, NJC's examples that loop over individuals are resilient against this type of issue, but I wanted to follow up to explain where and why my example failed.
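A minimal sketch of the reshape approach with the long -inlist()- call replaced by a loop over the father_row* variables, following the diagnosis that r(130) comes from that expression: each -replace- makes a single comparison, so no expression grows with the number of ids. The dummies are built as isfather# so that no -reshape- stub is a prefix of another, and the marker present (an illustrative name) lets the extra rows that -reshape long- creates for ids a household does not have (3 to 5 in household 1012) be dropped. The number of wide variables still grows with the number of distinct ids, so the within-household i suggested above remains worthwhile.

```stata
clear
input household_id individual_id father_row
1011 1 .
1011 2 .
1011 3 1
1011 4 1
1012 1 2
1012 2 .
1013 1 .
1013 2 .
1013 3 2
1013 4 1
1013 5 1
end

* mark real rows, so rows created by -reshape long- can be dropped later
gen byte present = 1
levelsof individual_id, local(ids)
reshape wide father_row present, i(household_id) j(individual_id)

* one comparison per father_row* variable instead of one long inlist()
foreach n of local ids {
    gen byte isfather`n' = 0
    foreach v of varlist father_row* {
        replace isfather`n' = 1 if `v' == `n'
    }
}

reshape long father_row isfather present, i(household_id) j(individual_id)
drop if missing(present)
drop present
rename isfather father
list, noobs sepby(household_id)
```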
>>
>> - Eric
>>
>> __
>> Eric A. Booth
>> Public Policy Research Institute
>> Texas A&M University
>> [email protected]
>> Office: +979.845.6754
>>
>>
>> On May 7, 2012, at 1:13 AM, Sumiko Hayasaka wrote:
>>
>>> Thanks Eric!
>>>
>>> Everything works out until I get to the "foreach" command. It says the
>>> expression is too long [r(130)]. What should I do?
>>>
>>> Thank you again!
>>>
>>>
>>> On Sun, May 6, 2012 at 11:44 PM, Eric Booth <[email protected]> wrote:
>>>> <>
>>>>
>>>> ***************!
>>>> clear
>>>> inp household_id   individual_id   father_row
>>>>  1011     1    .
>>>>  1011     2    .
>>>>  1011     3    1
>>>>  1011     4    1
>>>>
>>>>  1012     1     2
>>>>  1012     2     .
>>>>
>>>>  1013      1    .
>>>>  1013      2    .
>>>>  1013      3    2
>>>>  1013      4    1
>>>>  1013      5    1
>>>>  end
>>>>
>>>>
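>>>> * present marks real rows; -reshape long- creates extra ones
>>>> gen byte present = 1
>>>> levelsof individual_id, loc(a)
>>>> reshape wide father_row present, i(household_id) j(individual_id)
>>>> ds father_row*
>>>> loc checklist `r(varlist)'
>>>> loc checklist:subinstr loc checklist " " ", " , all
>>>> foreach n in `a' {
>>>>         g father`n' = cond(inlist(`n', `checklist'), 1, 0, .)
>>>>        }
>>>> reshape long father_row father present, i(household_id) j(individual_id)
>>>> * drop rows for ids a household does not have (e.g. 3-5 in 1012)
>>>> drop if missing(present)
>>>> drop present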
>>>>
>>>>
>>>> ***************!
>>>> - Eric
>>>>
>>>> On May 6, 2012, at 10:34 PM, Sumiko Hayasaka wrote:
>>>>
>>>>> I am trying to generate a dummy variable, with information from a
>>>>> household survey, that tells whether a member of the household is a
>>>>> father. I have a household id, an individual id (per
>>>>> household), and a variable that tells me which individual id is marked
>>>>> as being a father (members of the family are asked if their father
>>>>> lives in the household and to give their father's individual id).
>>>>> Therefore, I need to assign a 1 to the row of anyone whom another
>>>>> household member named as their father. To illustrate this, the data
>>>>> look something like this (I am trying to get the "father" variable):
>>>>>
>>>>> household_id   individual_id   father_row   father
>>>>> --------------------------------------------------
>>>>> 1011           1               .            1
>>>>> 1011           2               .            0
>>>>> 1011           3               1            0
>>>>> 1011           4               1            0
>>>>>
>>>>> 1012           1               2            0
>>>>> 1012           2               .            1
>>>>>
>>>>> 1013           1               .            1
>>>>> 1013           2               .            1
>>>>> 1013           3               2            0
>>>>> 1013           4               1            0
>>>>> 1013           5               1            0
>>>>>
>>>>>
>>>>> So, for example, members 3 and 4 of household 1011
>>>>> stated that their father is individual 1 in that household.
>>>>> This means that I have to put the 1 of "father" (meaning the household
>>>>> member is a father) in the row that father_row points to (no matter
>>>>> how many members point to it).
>>>>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/