Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Script searching data in a kinship matrix
From
Ginevra Biino <[email protected]>
To
[email protected]
Subject
st: Script searching data in a kinship matrix
Date
Wed, 14 Nov 2012 16:50:12 +0100
Hello everybody,
In a case-control study I have already sampled the cases, stratifying by sex
(0,1) and age (<62y, >=62y). I need to sample a group of controls with the
same characteristics (which I can easily do with the sample command) plus
one more: the level of relatedness. Controls should therefore be matched to
cases on sex (2 strata), age (2 strata), and relatedness (below a
certain level). In particular, I need controls to be as little related as
possible to cases; for example, each control should have a kinship coefficient
below 0.0156 (i.e. 1/64, as for second cousins) with its matched case. The
example data set below contains the sampled cases [20 cases (disease==1),
5 subjects per stratum] and 200 possible controls (disease==0) that I have
already sampled, stratifying by age and sex (50 subjects per stratum).
id disease age sex
109530 0 65 M
109398 0 65 M
109494 0 65 M
110077 0 71 M
109601 0 66 M
109585 0 66 M
114262 0 73 M
109311 0 63 M
109355 0 64 M
110756 0 78 M
111090 0 81 M
110806 0 78 M
110222 0 72 M
110955 0 80 M
110310 0 73 M
109829 0 68 M
109434 0 64 M
110006 0 70 M
109286 0 63 M
109298 0 63 M
109721 0 67 M
110234 0 72 M
110143 0 71 M
133061 0 78 M
110021 0 69 M
110296 0 73 M
109719 0 67 M
110198 0 72 M
110115 0 71 M
110092 0 70 M
109296 0 63 M
109540 0 66 M
109791 0 68 M
109227 0 62 M
109807 0 68 M
109934 0 69 M
125715 0 73 M
109577 0 66 M
110677 0 76 M
111792 0 89 M
110414 0 74 M
109505 0 65 M
111257 0 82 M
109651 0 66 M
109552 0 66 M
109356 0 64 M
110641 0 76 M
109866 0 69 M
110749 0 78 M
110923 0 79 M
109316 0 63 F
105263 0 70 F
110843 0 78 F
109878 0 68 F
110941 0 79 F
111008 0 79 F
109403 0 64 F
110083 0 70 F
109778 0 68 F
109783 0 68 F
109325 0 63 F
109726 0 67 F
109958 0 69 F
110049 0 70 F
110736 0 77 F
114290 0 74 F
110791 0 78 F
111315 0 83 F
109431 0 64 F
114096 0 75 F
109784 0 68 F
110656 0 77 F
114678 0 74 F
32255 0 88 F
109253 0 63 F
133094 0 62 F
111251 0 82 F
109851 0 68 F
109221 0 62 F
109271 0 63 F
110264 0 72 F
109615 0 66 F
110557 0 75 F
110082 0 71 F
110278 0 72 F
110925 0 79 F
110347 0 73 F
109636 0 67 F
110271 0 72 F
109635 0 66 F
109621 0 66 F
110496 0 75 F
109295 0 63 F
110781 0 78 F
109281 0 62 F
110289 0 73 F
111491 0 85 F
109753 0 67 F
109181 0 62 F
110353 0 73 F
104532 0 36 M
105965 0 51 M
105866 0 50 M
105916 0 51 M
106358 0 56 M
103664 0 23 M
105618 0 48 M
109082 0 61 M
104572 0 36 M
105897 0 50 M
105090 0 41 M
108918 0 59 M
104758 0 38 M
103330 0 19 M
104390 0 34 M
109086 0 61 M
105198 0 43 M
104781 0 39 M
109128 0 61 M
105002 0 40 M
108946 0 60 M
133131 0 51 M
106058 0 52 M
115009 0 48 M
104740 0 38 M
132995 0 37 M
103309 0 18 M
103943 0 28 M
105747 0 49 M
103850 0 26 M
104824 0 38 M
104516 0 35 M
106423 0 56 M
105266 0 44 M
105117 0 41 M
104803 0 39 M
105642 0 48 M
108940 0 59 M
104982 0 41 M
105235 0 43 M
104839 0 39 M
104207 0 32 M
105097 0 42 M
104948 0 40 M
104218 0 31 M
104604 0 36 M
105565 0 47 M
104134 0 41 M
105059 0 41 M
104784 0 39 M
105131 0 42 F
115126 0 33 F
105417 0 45 F
103831 0 26 F
133091 0 35 F
106238 0 54 F
104724 0 38 F
105511 0 46 F
105438 0 46 F
103422 0 29 F
105142 0 42 F
103388 0 18 F
104384 0 34 F
105203 0 42 F
105023 0 41 F
105076 0 41 F
105691 0 48 F
104844 0 39 F
104674 0 37 F
104345 0 34 F
104736 0 39 F
103892 0 27 F
103909 0 27 F
1537 0 33 F
104562 0 36 F
108828 0 24 F
103814 0 26 F
105558 0 47 F
103556 0 23 F
109108 0 61 F
105179 0 43 F
104230 0 32 F
133036 0 45 F
104419 0 36 F
105475 0 46 F
103931 0 28 F
113829 0 48 F
133026 0 32 F
104542 0 35 F
104221 0 31 F
104510 0 36 F
105803 0 50 F
106489 0 57 F
105671 0 48 F
137828 0 24 F
104021 0 29 F
106195 0 54 F
133105 0 43 F
105425 0 45 F
104524 0 35 F
106258 1 55 M
105270 1 44 M
106661 1 59 M
104363 1 33 M
108982 1 60 M
106046 1 52 F
103359 1 18 F
105939 1 51 F
104152 1 31 F
105351 1 44 F
110363 1 73 M
114307 1 77 M
110790 1 78 M
109486 1 65 M
109317 1 64 M
114643 1 74 F
114057 1 75 F
109895 1 69 F
110775 1 77 F
110178 1 71 F
What I do not know is how to handle the relatedness requirement. I have already
computed the kinship coefficient matrix of the extended pedigree to which
the cases and controls in the example data belong. I do not provide it here
because it is a 190x190 matrix. As a small example, such a kinship matrix
for 5 subjects (IDs 51, 59, 119, 156 and 178) looks like this:
        51     59    119    156    178
 51  0.500  0.000  0.000  0.000  0.000
 59  0.000  0.500  0.250  0.000  0.250
119  0.000  0.250  0.500  0.000  0.250
156  0.000  0.000  0.000  0.500  0.000
178  0.000  0.250  0.250  0.000  0.500
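To make the relatedness check concrete, here is a minimal sketch of the lookup on the small matrix above. It is written in Python rather than Stata, purely to spell out the logic; the names `kinship`, `THRESHOLD` and `unrelated` are made up for illustration, and ID 999 is a hypothetical subject that is not in the matrix.

```python
# Kinship matrix for the 5 example subjects, stored as a nested dict.
ids = [51, 59, 119, 156, 178]
rows = [
    [0.500, 0.000, 0.000, 0.000, 0.000],
    [0.000, 0.500, 0.250, 0.000, 0.250],
    [0.000, 0.250, 0.500, 0.000, 0.250],
    [0.000, 0.000, 0.000, 0.500, 0.000],
    [0.000, 0.250, 0.250, 0.000, 0.500],
]
kinship = {i: dict(zip(ids, r)) for i, r in zip(ids, rows)}

THRESHOLD = 0.0156  # cut-off from the post (about 1/64, second cousins)

def unrelated(a, b, threshold=THRESHOLD):
    """True/False for a pair found in the matrix, None if either ID is absent."""
    if a not in kinship or b not in kinship[a]:
        return None
    return kinship[a][b] < threshold

print(unrelated(59, 119))  # kinship 0.250: too closely related
print(unrelated(51, 59))   # kinship 0.000: acceptable pair
print(unrelated(51, 999))  # 999 is not in the matrix
```

The three calls print False, True and None respectively, which correspond to the "condition not satisfied", "condition satisfied" and "ID not in the kinship matrix" situations.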
In conclusion, I need a script that searches such a kinship matrix
for controls satisfying the relatedness condition and adds this
information to my data set.
The information could be reported in several ways (whichever is simplest to obtain).
For example, the script could add to the original data set as many columns as
the maximum number of controls satisfying the relatedness condition for
their matched cases. For each case (rows for which disease==1), the new
columns should contain the ID of a matched control satisfying the condition
case-control kinship < 0.0156, a zero if the condition is not satisfied,
or a missing value if the subject's ID is not in the kinship matrix.
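A rough sketch of this "extra columns" layout, again in Python only to make the logic explicit (not a Stata solution). It assumes one reading of the rule: each candidate control in the case's sex/age stratum yields its ID, a zero, or a missing value. The ages, sexes and disease status assigned to the five example IDs, the extra ID 999, and all function names are hypothetical.

```python
THRESHOLD = 0.0156  # cut-off from the post

# Kinship values from the 5-subject example, keyed by (row ID, column ID).
ids = [51, 59, 119, 156, 178]
rows = [
    [0.500, 0.000, 0.000, 0.000, 0.000],
    [0.000, 0.500, 0.250, 0.000, 0.250],
    [0.000, 0.250, 0.500, 0.000, 0.250],
    [0.000, 0.000, 0.000, 0.500, 0.000],
    [0.000, 0.250, 0.250, 0.000, 0.500],
]
kinship = {(a, b): k for a, r in zip(ids, rows) for b, k in zip(ids, r)}

# Hypothetical records: (id, disease, age, sex). ID 999 is not in the matrix.
subjects = [
    (59, 1, 70, "M"),
    (51, 0, 65, "M"),
    (119, 0, 68, "M"),
    (156, 0, 66, "M"),
    (999, 0, 64, "M"),
    (178, 0, 70, "F"),
]

def stratum(age, sex):
    """Sex by age group (<62 vs >=62), as in the sampling design."""
    return (sex, age >= 62)

def wide_row(case, controls):
    """One entry per same-stratum control: ID, 0, or None (missing)."""
    case_id, _, case_age, case_sex = case
    out = []
    for ctrl_id, _, ctrl_age, ctrl_sex in controls:
        if stratum(ctrl_age, ctrl_sex) != stratum(case_age, case_sex):
            continue  # different sex/age stratum: not a candidate
        k = kinship.get((case_id, ctrl_id))
        if k is None:
            out.append(None)      # pair not in the kinship matrix
        elif k < THRESHOLD:
            out.append(ctrl_id)   # condition satisfied
        else:
            out.append(0)         # too closely related
    return out

cases = [s for s in subjects if s[1] == 1]
controls = [s for s in subjects if s[1] == 0]
wide = {c[0]: wide_row(c, controls) for c in cases}
print(wide)
```

With these toy records, case 59 gets the entries 51, 0, 156 and a missing value: controls 51 and 156 qualify, 119 is too closely related, 999 is absent from the matrix, and 178 is skipped because she is in a different stratum.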
Alternatively, the script could add to the original data set
as many rows as there are controls satisfying the relatedness
condition for their matched cases. Each new row (for each matched case)
would report the control's ID in the ID column, missing values in the disease,
age and sex columns, and the ID of the matched case in a new column.
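The "extra rows" layout could be sketched along the same lines (Python, for illustration only). The case's age and sex, the list of same-stratum candidates, and the column name `matched_case` are hypothetical; the kinship values are taken from the 5-subject example.

```python
THRESHOLD = 0.0156  # cut-off from the post

# Kinship of hypothetical case 59 with other subjects (from the example matrix).
kin_with_case = {59: {51: 0.0, 119: 0.25, 156: 0.0, 178: 0.25}}

# Original data set: here just one case, with the new column already present.
data = [{"id": 59, "disease": 1, "age": 70, "sex": "M", "matched_case": None}]

# Candidate controls already restricted to the case's sex/age stratum (assumed).
same_stratum_controls = {59: [51, 119, 156]}

for case_id, candidates in same_stratum_controls.items():
    for ctrl_id in candidates:
        k = kin_with_case[case_id].get(ctrl_id)
        if k is not None and k < THRESHOLD:
            # New row: control ID, missing covariates, ID of the matched case.
            data.append({"id": ctrl_id, "disease": None, "age": None,
                         "sex": None, "matched_case": case_id})

for row in data:
    print(row)
```

Here two rows are appended, for controls 51 and 156, each pointing back to case 59; control 119 is dropped because its kinship with the case is 0.25.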
Any alternative solution is welcome!
Can anybody help me?
Ginevra
Ginevra Biino, PhD
Institute of Molecular Genetics, CNR
Via Abbiategrasso, 207
27100 Pavia, Italy
Tel +39 382 546363
Fax +39 382 422286
http://www.igm.cnr.it/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/