Re: st: how to generate parent variables matched to their children in household level data set?
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: how to generate parent variables matched to their children in household level data set?
Date: Sat, 23 Feb 2013 09:33:13 +0000

I am at a loss to understand what you are asking. My previous posts
showed that with your sample data the code I used does work. It
remains a mystery why you first reported otherwise, and also why you
imply that the problem you stated is still unsolved. I just did that
for you. It seems that you have not studied my code and its results.
The absence of a single clear indicator variable is immaterial here.
You want to copy data from mothers' and fathers' observations to
children's; for that, being able to link mother and father identifiers
to children is necessary and sufficient, and the two links are made
separately.
My mention of -merge- just hints at a different method, but I have
given a method that works. I was not stating or implying that you need
to -merge-; that's merely a good alternative.
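A minimal sketch of the -merge- alternative follows, assuming the example variables ID, ID_F, ID_M, BMI and Emp, and assuming that ID is unique. The tempfile names mothers and fathers are illustrative choices, and this is not the code run in the earlier posts.

```stata
* Sketch only: rebuild the example data from the thread
clear
input str10 ID_F str10 ID_M float BMI str10 ID str8 ID_fam byte Emp
""           ""           26.501 "A901963701" "A9019637" 1
""           ""           20.483 "A901963702" "A9019637" 1
"A901963701" "A901963702" 20.924 "A901963703" "A9019637" .
""           ""           27.209 "A901963801" "A9019638" 1
""           ""           31.733 "A901963802" "A9019638" .
"A901963801" "A901963802" 18.018 "A901963803" "A9019638" .
"A901963801" "A901963802" 19.054 "A901963804" "A9019638" .
end

* Lookup file keyed on the mother identifier:
* every person's own ID, BMI and Emp, renamed to the mother variables
preserve
keep ID BMI Emp
rename (ID BMI Emp) (ID_M BMI_M Emp_M)
tempfile mothers
save `mothers'
restore

* Same idea for fathers (only employment is wanted here)
preserve
keep ID Emp
rename (ID Emp) (ID_F Emp_F)
tempfile fathers
save `fathers'
restore

* Many children can share one mother or father, hence m:1.
* keep(master match) retains every original observation and
* drops lookup rows that nobody points to.
merge m:1 ID_M using `mothers', keep(master match) nogenerate
merge m:1 ID_F using `fathers', keep(master match) nogenerate

sort ID_fam ID
list ID ID_fam BMI BMI_M Emp_M Emp_F if !missing(ID_M), sepby(ID_fam)
```

Observations with empty ID_M or ID_F find no match in the lookup files, so the copied variables stay missing for them, just as with the loop.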
If you want to know why my method works you need to study not only
discussion of loops as in
SJ-2-2 pr0005 . . . . . . Speaking Stata: How to face lists with fortitude
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q2/02 SJ 2(2):202--222 (no commands)
demonstrates the usefulness of for, foreach, forvalues, and
local macros for interactive (non programming) tasks
but also the use of -by:- as in
SJ-2-1 pr0004 . . . . . . . . . . Speaking Stata: How to move step by: step
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q1/02 SJ 2(1):86--102 (no commands)
explains the use of the by varlist : construct to tackle
a variety of problems with group structure, ranging from
simple calculations for each of several groups to more
My code relies on the fact that under the aegis of -by:- subscripts
(42 in -foo[42]- is a subscript) are numbered within groups, so the
subscript [1] refers to the first observation in each group.
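To see subscripts being numbered within groups, here is a small sketch on the auto data shipped with Stata (not the poster's data). The variable names cheapest and rank are made up for illustration.

```stata
* Sketch only: subscripts and _n under -by:- restart in each group
sysuse auto, clear

* Within each value of foreign, sort on price; price[1] is then
* the lowest price in that group, not in the whole dataset
bysort foreign (price): gen cheapest = price[1]

* _n likewise counts observations within each group
by foreign: gen rank = _n

list foreign price cheapest rank if rank <= 2, sepby(foreign)
```

The same mechanism is what lets ID[`i'] in the loop pick out the i-th person within each family.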
As said, I don't see that you need any further code, so I have not
studied your code beyond noticing that -forevar- is not a Stata
command.
Nick
On Sat, Feb 23, 2013 at 8:36 AM, Haena Lee <[email protected]> wrote:
> Nick,
>
> I would love to merge father's and mother's data with children. That
> was my first choice.
> As you may have noticed, however, my data doesn't have one clear
> indicator variable of who is mother/father/child/grandparent. Although
> there are ID_F and ID_M, what makes me confused is, ID_F and ID_M are
> on the same row of children. I see "fid and mid" from your previous
> answer is also located on children's row. So how do I tell stata to
> generate a new indicator of "mothers" and to treat it as a property of
> mothers, not children? So that eventually I would extract moms from
> this raw data (e.g., keep ID BMI_M EMP_M if mom==1) and merge (1:many)
> it based on key variable (ID_fam) with children's data?
>
> Assuming looping would do this work,
>
> gen mom=.
> unab Y: ID
> unab Z: ID_M
> forevar x of newlist mom
> replace `x' ==1 if Y==Z
> }
>
> Please note that I am not familiar with the concept of looping. Just
> taught myself today for a little bit so I am not sure if those
> commands above would make sense. If not, let me know. I'd be happy to
> explain it again.
>
> Haena
>
> On Fri, Feb 22, 2013 at 7:54 PM, Nick Cox <[email protected]> wrote:
>> Note that I wrote that FAQ some years ago. Now I think why didn't I
>> approach that as a -merge- problem? Create a dataset with fathers'
>> data, one with mothers' data, and -merge- using those. There is still
>> some fiddling around. This all goes with the simple idea that we have
>> favourite tools.
>>
>> Nick
>>
>> On Sat, Feb 23, 2013 at 1:50 AM, Nick Cox <[email protected]> wrote:
>>> That's an allusion to my FAQ
>>>
>>> FAQ . . Creating variables recording prop. of the other members of a group
>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>> 4/05 How do I create variables summarizing for each
>>> individual properties of the other members of a
>>> group?
>>>
>>> http://www.stata.com/support/faqs/data-management/creating-variables-recording-properties/
>>>
>>> I don't know why you report problems. The code suggested there works
>>> as intended. Here it is again run on your example data:
>>>
>>> . by ID_fam (ID), sort: gen pid = _n
>>>
>>> . gen byte fid = .
>>> (7 missing values generated)
>>>
>>> . gen byte mid = .
>>> (7 missing values generated)
>>>
>>> . summarize pid, meanonly
>>>
>>> . forval i = 1 / `r(max)' {
>>> 2. by ID_fam: replace fid = `i' if ID_F == ID[`i'] & !missing(ID_F)
>>> 3. by ID_fam: replace mid = `i' if ID_M == ID[`i'] & !missing(ID_M)
>>> 4. }
>>> (3 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>> (3 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>>
>>> . l
>>>
>>>      +--------------------------------------------------------------------------+
>>>      |       ID_F        ID_M     BMI          ID    ID_fam  Emp  pid  fid  mid |
>>>      |--------------------------------------------------------------------------|
>>>   1. |                         26.501  A901963701  A9019637    1    1    .    . |
>>>   2. |                         20.483  A901963702  A9019637    1    2    .    . |
>>>   3. | A901963701  A901963702  20.924  A901963703  A9019637    .    3    1    2 |
>>>   4. |                         27.209  A901963801  A9019638    1    1    .    . |
>>>   5. |                         31.733  A901963802  A9019638    .    2    .    . |
>>>      |--------------------------------------------------------------------------|
>>>   6. | A901963801  A901963802  18.018  A901963803  A9019638    .    3    1    2 |
>>>   7. | A901963801  A901963802  19.054  A901963804  A9019638    .    4    1    2 |
>>>      +--------------------------------------------------------------------------+
>>>
>>> Using the same logic, we copy parents' employment and mothers' BMI as desired:
>>>
>>> . gen BMI_M = .
>>> (7 missing values generated)
>>>
>>> . gen Emp_M = .
>>> (7 missing values generated)
>>>
>>> . gen Emp_F = .
>>> (7 missing values generated)
>>>
>>> . summarize pid, meanonly
>>>
>>> . forval i = 1 / `r(max)' {
>>> 2. by ID_fam: replace BMI_M = BMI[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>>> 3. by ID_fam: replace Emp_M = Emp[`i'] if ID_M == ID[`i'] & !missing(ID_M)
>>> 4. by ID_fam: replace Emp_F = Emp[`i'] if ID_F == ID[`i'] & !missing(ID_F)
>>> 5. }
>>> (0 real changes made)
>>> (0 real changes made)
>>> (3 real changes made)
>>> (3 real changes made)
>>> (1 real change made)
>>> (0 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>> (0 real changes made)
>>>
>>>
>>> Here are the results:
>>>
>>> . l
>>>
>>>      +--------------------------------------------------------------------------------------+
>>>      |       ID_F        ID_M     BMI          ID    ID_fam  Emp  pid   BMI_M  Emp_M  Emp_F |
>>>      |--------------------------------------------------------------------------------------|
>>>   1. |                         26.501  A901963701  A9019637    1    1       .      .      . |
>>>   2. |                         20.483  A901963702  A9019637    1    2       .      .      . |
>>>   3. | A901963701  A901963702  20.924  A901963703  A9019637    .    3  20.483      1      1 |
>>>   4. |                         27.209  A901963801  A9019638    1    1       .      .      . |
>>>   5. |                         31.733  A901963802  A9019638    .    2       .      .      . |
>>>      |--------------------------------------------------------------------------------------|
>>>   6. | A901963801  A901963802  18.018  A901963803  A9019638    .    3  31.733      .      1 |
>>>   7. | A901963801  A901963802  19.054  A901963804  A9019638    .    4  31.733      .      1 |
>>>      +--------------------------------------------------------------------------------------+
>>>
>>> Nick
>>>
>>> On Fri, Feb 22, 2013 at 10:45 PM, Haena Lee <[email protected]> wrote:
>>>
>>>> I am investigating the relationship between maternal
>>>> employment status and the prevalence of childhood obesity using a
>>>> nationally representative data set (KNHANES). Suppose I have ID (all
>>>> observations, including both children and parents), ID_fam (household
>>>> indicator), ID_F (father's ID), ID_M (mother's ID), BMI (body mass
>>>> index) and finally Emp (employment status: 1 if employed, 0 if not
>>>> employed), as follows:
>>>>
>>>> ID_F        ID_M        BMI     ID          ID_fam    Emp
>>>>                         26.501  A901963701  A9019637  1
>>>>                         20.483  A901963702  A9019637  1
>>>> A901963701  A901963702  20.924  A901963703  A9019637  .
>>>>                         27.209  A901963801  A9019638  1
>>>>                         31.733  A901963802  A9019638  .
>>>> A901963801  A901963802  18.018  A901963803  A9019638  .
>>>> A901963801  A901963802  19.054  A901963804  A9019638  .
>>>>
>>>> Ultimately, I would like to have a data set like the following:
>>>>
>>>> ID (children)  ID_fam    BMI     Mom's BMI  Mom's Emp  Dad's Emp
>>>> A901963703     A9019637  20.924  20.483     1          1
>>>> A901963803     A9019638  18.018  31.733     .          1
>>>> A901963804     A9019638  19.054  31.733     .          1
>>>>
>>>> Given this, my question is 1) how to map the properties of other
>>>> family members to children within each household, using a loop, or 2)
>>>> how to generate an indicator of mother (1 if ID == ID_M; 0 otherwise)?
>>>> I found Nick Cox's helpful example and imitated it as follows:
>>>>
>>>> by ID_fam (ID), sort: gen pid = _n
>>>> gen byte fid = .
>>>> gen byte mid = .
>>>> summarize pid, meanonly
>>>> forval i = 1 / `r(max)' {
>>>> by ID_fam: replace fid = `i'
>>>> if ID_F == ID[`i'] & !missing(ID_F)
>>>> by ID_fam: replace mid = `i'
>>>> if ID_M == ID[`i'] & !missing(ID_M)
>>>> }
>>>>
>>>> And it didn't produce any meaningful values but missing. Please
>>>> advise. Thank you so much for any help in advance.