Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: how to generate parent variables matched to their children in household level data set?
From: Haena Lee <[email protected]>
To: [email protected]
Subject: st: how to generate parent variables matched to their children in household level data set?
Date: Sat, 23 Feb 2013 02:36:50 -0600
Nick,
I would love to merge the fathers' and mothers' data with the
children's. That was my first choice.
As you may have noticed, however, my data don't have a single clear
indicator of who is a mother, father, child or grandparent. There are
ID_F and ID_M, but what confuses me is that they sit on the children's
rows. I see that "fid" and "mid" from your previous answer are also
located on the children's rows. So how do I tell Stata to generate a
new indicator of "mothers" and to treat it as a property of mothers,
not children? Then I could eventually extract the mothers from the raw
data (e.g., keep ID BMI_M EMP_M if mom==1) and merge them (1:many)
with the children's data on a key variable (ID_fam).
Assuming a loop would do this, I tried:
gen mom=.
unab Y: ID
unab Z: ID_M
forevar x of newlist mom
replace `x' ==1 if Y==Z
}
Please note that I am not familiar with the concept of looping. I only
started teaching myself today, so I am not sure whether the commands
above make sense. If they don't, let me know. I'd be happy to explain
it again.
Haena
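
For reference, a minimal sketch of the -merge- route discussed in this
thread. It is not code posted by either correspondent. It assumes the
example variables (ID, ID_fam, ID_F, ID_M, BMI, Emp) are in memory,
that ID is unique and never missing, and that the ID variables are
strings as in the example. The tempfile names mothers and fathers are
made up for illustration. No separate "mom" indicator is needed:
renaming ID to ID_M in a copy of the data turns every person into a
potential match for a child's ID_M.

```stata
* Lookup file for mothers: one row per person, keyed by that person's
* own ID, renamed so that it matches the children's ID_M.
preserve
keep ID BMI Emp
rename (ID BMI Emp) (ID_M BMI_M Emp_M)
tempfile mothers
save `mothers'
restore

* Lookup file for fathers, built the same way (employment only).
preserve
keep ID Emp
rename (ID Emp) (ID_F Emp_F)
tempfile fathers
save `fathers'
restore

* Assumption: a "child" is anyone with a recorded mother or father ID.
keep if !missing(ID_M) | !missing(ID_F)

* Many children can share one parent, hence m:1.
* keep(master match) drops people who are nobody's parent.
merge m:1 ID_M using `mothers', keep(master match) nogenerate
merge m:1 ID_F using `fathers', keep(master match) nogenerate

list ID ID_fam BMI BMI_M Emp_M Emp_F
```

In a three-generation household a parent may also have ID_M or ID_F
recorded, so the "children only" rule above would need an extra
condition (age, say) if only minors are wanted.
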
On Fri, Feb 22, 2013 at 7:54 PM, Nick Cox <[email protected]> wrote:
> Note that I wrote that FAQ some years ago. Now I think why didn't I
> approach that as a -merge- problem? Create a dataset with fathers'
> data, one with mothers' data, and -merge- using those. There is still
> some fiddling around. This all goes with the simple idea that we have
> favourite tools.
>
> Nick
>
> On Sat, Feb 23, 2013 at 1:50 AM, Nick Cox <[email protected]> wrote:
>> That's an allusion to my FAQ
>>
>> FAQ . . Creating variables recording prop. of the other members of a group
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>> 4/05 How do I create variables summarizing for each
>> individual properties of the other members of a
>> group?
>>
>> http://www.stata.com/support/faqs/data-management/creating-variables-recording-properties/
>>
>> I don't know why you report problems. The code suggested there works
>> as intended. Here it is again run on your example data:
>>
>> . by ID_fam (ID), sort: gen pid = _n
>>
>> . gen byte fid = .
>> (7 missing values generated)
>>
>> . gen byte mid = .
>> (7 missing values generated)
>>
>> . summarize pid, meanonly
>>
>> . forval i = 1 / `r(max)' {
>> 2. by ID_fam: replace fid = `i' if ID_F == ID[`i'] & !missing(ID_F)
>> 3. by ID_fam: replace mid = `i' if ID_M == ID[`i'] & !missing(ID_M)
>> 4. }
>> (3 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (3 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>>
>> . l
>>
>>      +----------------------------------------------------------------------------------+
>>      |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid   fid   mid |
>>      |----------------------------------------------------------------------------------|
>>   1. |                           26.501   A901963701   A9019637     1     1     .     . |
>>   2. |                           20.483   A901963702   A9019637     1     2     .     . |
>>   3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3     1     2 |
>>   4. |                           27.209   A901963801   A9019638     1     1     .     . |
>>   5. |                           31.733   A901963802   A9019638     .     2     .     . |
>>      |----------------------------------------------------------------------------------|
>>   6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3     1     2 |
>>   7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4     1     2 |
>>      +----------------------------------------------------------------------------------+
>>
>> Using the same logic, we copy parents' employment and mothers' BMI as desired:
>>
>> . gen BMI_M = .
>> (7 missing values generated)
>>
>> . gen Emp_M = .
>> (7 missing values generated)
>>
>> . gen Emp_F = .
>> (7 missing values generated)
>>
>> . summarize pid, meanonly
>>
>> . forval i = 1 / `r(max)' {
>> 2. by ID_fam: replace BMI_M = BMI[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>> 3. by ID_fam: replace Emp_M = Emp[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>> 4. by ID_fam: replace Emp_F = Emp[`i'] if ID_F == ID[`i'] & !missing(ID_F)
>> 5. }
>> (0 real changes made)
>> (0 real changes made)
>> (3 real changes made)
>> (3 real changes made)
>> (1 real change made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>> (0 real changes made)
>>
>>
>> Here are the results:
>>
>> . l
>>
>>      +-----------------------------------------------------------------------------------------------+
>>      |       ID_F         ID_M      BMI           ID     ID_fam   Emp   pid    BMI_M   Emp_M   Emp_F |
>>      |-----------------------------------------------------------------------------------------------|
>>   1. |                           26.501   A901963701   A9019637     1     1        .       .       . |
>>   2. |                           20.483   A901963702   A9019637     1     2        .       .       . |
>>   3. | A901963701   A901963702   20.924   A901963703   A9019637     .     3   20.483       1       1 |
>>   4. |                           27.209   A901963801   A9019638     1     1        .       .       . |
>>   5. |                           31.733   A901963802   A9019638     .     2        .       .       . |
>>      |-----------------------------------------------------------------------------------------------|
>>   6. | A901963801   A901963802   18.018   A901963803   A9019638     .     3   31.733       .       1 |
>>   7. | A901963801   A901963802   19.054   A901963804   A9019638     .     4   31.733       .       1 |
>>      +-----------------------------------------------------------------------------------------------+
>>
>> Nick
>>
>> On Fri, Feb 22, 2013 at 10:45 PM, Haena Lee <[email protected]> wrote:
>>
>>> I am investigating the relationship between maternal employment
>>> status and the prevalence of childhood obesity using nationally
>>> representative data (KNHANES). Suppose I have ID (all observations,
>>> including both children and parents), ID_fam (household indicator),
>>> ID_F (father's ID), ID_M (mother's ID), BMI (body mass index) and
>>> finally Emp (employment status: 1 if employed; 0 if not employed),
>>> as follows:
>>>
>>> ID_F         ID_M         BMI      ID           ID_fam     Emp
>>>                           26.501   A901963701   A9019637   1
>>>                           20.483   A901963702   A9019637   1
>>> A901963701   A901963702   20.924   A901963703   A9019637   .
>>>                           27.209   A901963801   A9019638   1
>>>                           31.733   A901963802   A9019638   .
>>> A901963801   A901963802   18.018   A901963803   A9019638   .
>>> A901963801   A901963802   19.054   A901963804   A9019638   .
>>>
>>> Ultimately, I would like to have a data set like the following:
>>>
>>> ID (children)   ID_fam     BMI      Mom's BMI   Mom's Emp   Dad's Emp
>>> A901963703      A9019637   20.924   20.483      1           1
>>> A901963803      A9019638   18.018   31.733      .           1
>>> A901963804      A9019638   19.054   31.733      .           1
>>>
>>> Given this, my question is 1) how to map the properties of other
>>> family members to children within each household, using a loop, or
>>> 2) how to generate an indicator of mother (1 if ID == ID_M; 0
>>> otherwise)? I found Nick Cox's helpful example and imitated it as
>>> follows:
>>>
>>> by ID_fam (ID), sort: gen pid = _n
>>> gen byte fid = .
>>> gen byte mid = .
>>> summarize pid, meanonly
>>> forval i = 1 / `r(max)' {
>>> by ID_fam: replace fid = `i'
>>> if ID_F == ID[`i'] & !missing(ID_F)
>>> by ID_fam: replace mid = `i'
>>> if ID_M == ID[`i'] & !missing(ID_M)
>>> }
>>>
>>> It produced nothing but missing values. Please advise. Thank you
>>> so much in advance for any help.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
--
--------------------------------------
Haena Lee
Ph.D Student
Sociology Department
The University of Chicago
312 - 405 - 3223