A double loop is very inefficient with several thousand observations, since
it issues N x N -replace- commands. As far as I can see, a simple merge would
do the trick more efficiently. For that you would need a family identifier as
well as an individual identifier (it seems from the data below that the first
part of the person ID is a family ID, but I'm not quite sure).
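As a rough sketch of that split, assuming the IDs are stored as 12-character strings and that the first 10 characters really are the family ID (both unconfirmed guesses), the identifiers could be separated as below. The names famid, pcode and mcode are made up for illustration, and the three records are taken from the sample data.

```stata
* Sketch only: assumes string IDs whose first 10 characters are the family ID
clear
input str12 person_id str12 mother_id
"042208201006" ""
"042208201097" "042208201001"
"042208201001" "042208201002"
end

* famid, pcode and mcode are hypothetical variable names
gen str10 famid = substr(person_id, 1, 10)
gen str2  pcode = substr(person_id, 11, 2)
* a blank mother_id yields a blank mcode
gen str2  mcode = substr(mother_id, 11, 2)
list famid pcode mcode
```

If the IDs are held as numbers rather than strings, they would first need converting to strings before taking substrings.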
If your data is in the following form:
Family ID    Person code   Mother's code
0422082010   06
0422082010   14
...
0422082010   97            01
0422082010   98            01
you would:
1. drop the person code variable
2. rename the Mother's code as person code. Drop any cases where this code
is missing, and keep only one record per family ID and code, since several
children can share a mother.
3. save this new data set under a new name (say temp.dta)
4. merge the *original* data set with this new data set (temp.dta) on the
family ID and person code
5. Any mother's code that fails to match a person code within its family
(a record found only in temp.dta) tells you that you have a problem.
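The steps above could be sketched roughly as follows. This is only an illustration: famid, pcode and mcode are assumed variable names, the six records are a subset of the posted sample, temporary files stand in for temp.dta, and the -merge 1:1- syntax is that of Stata 11 or later (older versions need both files sorted and use -merge varlist using filename-).

```stata
* Sketch only: run as a do-file; variable names are assumptions
clear
input str10 famid str2 pcode str2 mcode
"0422082010" "06" ""
"0422082010" "05" ""
"0422082010" "97" "01"
"0422082010" "98" "01"
"0422082010" "01" "02"
"0422082010" "02" "05"
end
tempfile original mothers
save `original'

* Steps 1-3: turn the mothers' codes into a lookup file
drop pcode
rename mcode pcode
drop if pcode == ""
* several children can share a mother, so keep one record per code
duplicates drop famid pcode, force
save `mothers'

* Step 4: merge the original data with the lookup file
use `original', clear
merge 1:1 famid pcode using `mothers'

* Step 5: _merge == 2 marks a mother's code with no person record
list famid pcode if _merge == 2
```

Records with _merge == 1 are simply people who are nobody's mother in the data, so only the _merge == 2 records need attention.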
Hope this helps.
Martin
----- Original Message -----
From: "Zhiqiang Wang" <[email protected]>
To: <[email protected]>
Sent: Monday, November 25, 2002 9:29 AM
Subject: st: Re: A question
> Jisheng
> I am not sure how long double loops will take for your data. It works for
> the small sample you gave.
> ---------
> qui gen _find=0
> local N=_N
> foreach y of numlist 1/`N' {
>     foreach x of numlist 1/`N' {
>         qui replace _find=1 if mother_id[`y']==person_id[`x'] & `y'==_n
>     }
> }
> ---------
> Cheers
>
> Zhiqiang
> Menzies School of Health Research
>
>
> ----- Original Message -----
> From: Jisheng Cui
> To: [email protected]
> Sent: Monday, November 25, 2002 2:37 PM
> Subject: st: A question
>
>
> Following is a sample data set with two columns giving each person's ID and
> the mother's ID within a family. I would like to find the best way to check
> whether each mother's ID is one of the person IDs; if it is not, something
> is wrong with the data. Please note:
> (1) We do not need to check blank mother's IDs.
> (2) There are some duplicate mother's IDs in the family. If a mother's ID is
> one of the person IDs, then we skip its duplicates.
> (3) There are thousands of such families, so the program has to be
> efficient. Beware that -foreach- does not seem to work with the -by- command.
>
> With best wishes,
>
> Jisheng.
>
>
> person_id mother_id
>
> 042208201006
> 042208201014
> 042208201008
> 042208201099
> 042208201005
> 042208201007
> 042208201097 042208201001
> 042208201098 042208201001
> 042208201001 042208201002
> 042208201094 042208201005
> 042208201093 042208201005
> 042208201002 042208201005
> 042208201095 042208201005
> 042208201096 042208201005
> 042208201003 042208201007
> 042208201009 042208201007
> 042208201010 042208201007
> 042208201011 042208201007
> 042208201013 042208201011
> 042208201012 042208201011
>
>
>
> ----------------------------------------------------------------------
> Dr. Jisheng Cui
> Senior Research Fellow
> Centre for Genetic Epidemiology
> School of Population Health
> The University of Melbourne
> Parkville, Victoria 3010, Australia
> Tel: +61 3 8344-0641, Fax: +61 3 9349-5815
> URL: http://ariel.its.unimelb.edu.au/~jisc
> ----------------------------------------------------------------------
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>