Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Brent McSharry (ADHB)" <BrentM@adhb.govt.nz> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: pairing unpaired data [was: Re: st: any idea?] |
Date | Wed, 8 Jan 2014 09:22:21 +1300 |
A program thrown together to do what I believe you want (although not automatically):

program closestvalues, byable(recall) sortpreserve
    version 10.1
    syntax varname [if] [in], Matchon(varlist min=1 max=1) Id(varlist min=1 max=1)
    marksample touse
    * also exclude observations with a missing match or id variable (strok: side is a string)
    markout `touse' `matchon' `id', strok
    tempvar absdif
    qui gen `absdif' = .
    qui count if `touse'
    local obs = r(N)
    if (`obs' > 0) {
        forvalues i = 1/`obs' {
            * in-sample observations first, in id order, so that observation i is the i-th bone
            gsort -`touse' `id'
            * absolute difference from bone i to every bone on the other side
            qui replace `absdif' = cond(_n != `i' & `matchon' != `matchon'[`i'], abs(`varlist'[`i'] - `varlist'), .) if `touse'
            di as res "closest variables to `=`id'[`i']'(`=`varlist'[`i']')"
            * sort by that difference and list the two nearest bones
            gsort -`touse' `absdif'
            li `id' `varlist' `absdif' in 1/2 if `touse', noo
        }
    }
end

Save this as closestvalues.ado in your ado/personal folder and then type

bysort bone: closestvalues length, id(id) match(side)

You will then get output for each bone like this:

-> bone = femur
closest variables to 1(18)

  +------------------------+
  | id   length   __00000A |
  |------------------------|
  | 11       28         10 |
  | 12       80         62 |
  +------------------------+

It will list the two closest matches for each bone. You will then have to make a table of which matches are acceptable to you (or modify the program to assign a match automatically when prespecified criteria are met, e.g. a single record within 1%). This program is ugly and slow, but will hopefully speed up what you are trying to do.

Brent McSharry MBBS BSc(med) FCICM(paed)
Paediatric Intensivist
Starship Children's Hospital
Private Bag 92024
Auckland 1142
New Zealand

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Wednesday, 8 January 2014 8:28 a.m.
To: statalist@hsphsun2.harvard.edu
Subject: Re: pairing unpaired data [was: Re: st: any idea?]

Thanks for the details of your problem.

I can't see that you have a method that is translatable into Stata code: your procedure is too vaguely specified. That need not stop other people suggesting methods.

Nick
njcoxstata@gmail.com

On 7 January 2014 19:20, Y.R.E. Retamal <yrer2@cam.ac.uk> wrote:
> Dear Nick
>
> Thanks a lot for your prompt response. The method is no more than what I
> showed. I have to add other variables like width and height for the same
> bone. So, if three variables match, both bones are probably from the same
> skeleton. I would expect that many bones would not match each other, so I
> could rule out their being from the same skeleton. Problems would appear if,
> e.g., a right bone matches more than one left bone. But at least I could
> simplify the work and afterwards focus on the problematic cases.
>
> Rodrigo
>
> On 2014-01-07 18:49, Nick Cox wrote:
>>
>> I changed the thread title, which was not informative.
>>
>> You need a method. Some predictable pitfalls are that for some bones
>> there is no acceptable match and that for others there could be two or
>> more acceptable matches. I don't think there is a canned solution
>> independent of your spelling out what the method is.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>> On 7 January 2014 18:20, Y.R.E. Retamal <yrer2@cam.ac.uk> wrote:
>>>
>>> Thank you very much, Eric and Nick, for the advice.
>>>
>>> I will try to give a clearer idea of what I want to do:
>>> For example, I have the following database of human bones.
>>> I removed missing values of length for clarity:
>>>
>>> id  type     side   length    id  type     side   length
>>>  1  femur    left   18        21  humerus  left   13
>>>  2  femur    left   65.85     22  humerus  left   56
>>>  3  femur    left   69.1      23  humerus  left   92
>>>  4  femur    left   130       24  humerus  left   126
>>>  5  femur    left   131.2     25  humerus  left   154
>>>  6  femur    left   143       26  humerus  left   170
>>>  7  femur    left   145       27  humerus  left   198
>>>  8  femur    left   160       28  humerus  left   228
>>>  9  femur    left   183       29  humerus  left   230
>>> 10  femur    left   200       30  humerus  left   232
>>> 11  femur    right  28        31  humerus  right  238
>>> 12  femur    right  80        32  humerus  right  10
>>> 13  femur    right  96.5      33  humerus  right  66
>>> 14  femur    right  126       34  humerus  right  123
>>> 15  femur    right  127       35  humerus  right  128
>>> 16  femur    right  128       36  humerus  right  143
>>> 17  femur    right  138       37  humerus  right  200
>>> 18  femur    right  146       38  humerus  right  228
>>> 19  femur    right  148       39  humerus  right  230
>>> 20  femur    right  200       40  humerus  right  241
>>>
>>> These data belong to a commingled skeletal collection, and some right
>>> bones (femurs and humeri) should each match a left bone of the same type,
>>> but I do not know which bones match. Following the idea that a right bone
>>> should have (approximately) the same length as the left bone from the
>>> same skeleton, I want to subtract the length of each left femur from that
>>> of each right femur, with the aim of finding which right femur matches a
>>> left femur, i.e. has the same or almost the same length, so that the
>>> difference would be zero or near zero. The same procedure applies to the
>>> humerus (and other bones).
>>>
>>> If you have any idea how to do this, please let me know.
>>>
>>> Best wishes
>>>
>>> Rodrigo
>>>
>>> On 2014-01-05 23:54, Nick Cox wrote:
>>>>
>>>> <>
>>>>
>>>> Eric Booth gives very good advice.
>>>>
>>>> Your problem with the link to the Stata Journal file I directed you to
>>>> may be just that you didn't step past the standard material bundled
>>>> with every reprint file.
>>>>
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>> On 5 January 2014 21:03, Eric Booth <eric.a.booth@gmail.com> wrote:
>>>>>
>>>>> <>
>>>>>
>>>>> The Stata Journal link you mention that Nick sent you works for me. The
>>>>> title of the article is "Stata tip 71: The problem of split identity,
>>>>> or how to group dyads" by Nick J. Cox, so maybe you can google that
>>>>> title if your browser isn't navigating to it properly.
>>>>>
>>>>> Your example dataset doesn't align with your desired dataset.
>>>>>
>>>>> How do we know what is x and what is j in the first 20 obs of your
>>>>> example data (see below)? (Also note the Statalist FAQ about not
>>>>> sending attachments.)
>>>>>
>>>>> You need some kind of identifier that ties, for example, obs or id 1
>>>>> (even though it's missing) to the other right-side femur observation of
>>>>> interest (is it id 7 or id 9 or ??).
>>>>>
>>>>> **your example data:
>>>>>
>>>>> id  type   side   length
>>>>>  1  femur  right
>>>>>  2  femur  left
>>>>>  3  femur  right
>>>>>  4  femur  left
>>>>>  5  femur  right  373
>>>>>  6  femur  left   416
>>>>>  7  femur  right  138
>>>>>  8  femur  left
>>>>>  9  femur  right  270
>>>>> 10  femur  left
>>>>> 11  femur  left
>>>>> 12  femur  right
>>>>> 13  femur  left
>>>>> 14  femur  right
>>>>> 15  femur  left   281
>>>>> 16  femur  right
>>>>> 17  femur  left   160
>>>>> 18  femur  left
>>>>> 19  femur  right
>>>>> 20  femur  left
>>>>>
>>>>> We can't just sort by 'type' and 'side' to get a dataset of the same
>>>>> structure as you presented initially, so I think you need to provide
>>>>> more information about this.
>>>>> (Also, if the rule is, as you imply, to sort by type and side and then
>>>>> subtract every third observation from each other, then what do we do
>>>>> with missing 'length' and missing 'side'?)
>>>>>
>>>>> If the rule is that id 1 and id 2 are a pair, then why does the
>>>>> left/right ordering suddenly change starting around id 17?
>>>>>
>>>>> - Eric
>>>>>
>>>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <yrer2@cam.ac.uk> wrote:
>>>>>
>>>>>> Dear Guys
>>>>>>
>>>>>> Some weeks ago, Red Owl and Nick helped me with some loops for my work.
>>>>>> I have tried to run some of the suggestions on my dataset, but I had
>>>>>> some difficulties. Here is the basic structure of my dataset and my
>>>>>> question:
>>>>>>
>>>>>> I want to create some new variables containing the difference between
>>>>>> the lengths of two individuals from different groups:
>>>>>>
>>>>>> id  side   length  newvar1  newvar2  newvar3
>>>>>>  1  right  x       x-j      x-k      x-l
>>>>>>  2  right  y       y-j      y-k      y-l
>>>>>>  3  right  z       z-j      z-k      z-l
>>>>>>  4  left   j       j-x      j-y      j-z
>>>>>>  5  left   k       k-x      k-y      k-z
>>>>>>  6  left   l       l-x      l-y      l-z
>>>>>>
>>>>>> Red Owl suggested I follow this example:
>>>>>>
>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>> * Build demo data set.
>>>>>>>>> clear
>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>> input id str5(side) Length
>>>>>>>>> 1 right 10
>>>>>>>>> 2 right 15
>>>>>>>>> 3 right 11
>>>>>>>>> 4 left 13
>>>>>>>>> 5 left 10
>>>>>>>>> 6 left 12
>>>>>>>>> end
>>>>>>>>> gen byte newvar1 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>> }
>>>>>>>>> gen byte newvar2 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>> }
>>>>>>>>> gen byte newvar3 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>> }
>>>>>>>>> list, noobs sep(0)
>>>>>>>>> *** END CODE ***
>>>>>>
>>>>>> However, my dataset is much longer, which makes this difficult to do.
>>>>>> I hope you can help me with more ideas.
>>>>>> I am sending you an extract of my dataset in .xlsx format.
>>>>>> Also, the webpage suggested by Nick to review the discussion about the
>>>>>> topic (http://www.stata-journal.com/sjpdf.html?articlenum=dm0043)
>>>>>> redirects me to a nonsense file to download. Please give me the number
>>>>>> of the journal so that I can read the discussion.
>>>>>>
>>>>>> Happy new year to all of you
>>>>>>
>>>>>> Rodrigo
>>>>>>
>>>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>>>
>>>>>>> Dear Red Owl and Nick
>>>>>>> Thank you very much for your response. The code works perfectly, just
>>>>>>> as I need.
>>>>>>> Best wishes
>>>>>>> Rodrigo
>>>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>>>
>>>>>>>> In addition to Red's helpful suggestions, note that the technique for
>>>>>>>> such paired data was discussed in
>>>>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
>>>>>>>> which is publicly accessible. The problem is that the identifiers in
>>>>>>>> Rodrigo's example appear to make little sense. How is Stata expected
>>>>>>>> to know that 1 and 4, 2 and 5, 3 and 6 are paired? Perhaps the
>>>>>>>> structure of the dataset is clearer in practice. If so, basic
>>>>>>>> calculations are just a couple of lines or so.
>>>>>>>> Nick
>>>>>>>> njcoxstata@gmail.com
>>>>>>>> On 14 December 2013 15:33, Red Owl <rh.redowl@liu.edu> wrote:
>>>>>>>>>
>>>>>>>>> Rodrigo,
>>>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>>>> It could be made more efficient with a different loop
>>>>>>>>> structure, but this approach may be more informative.
>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>> * Build demo data set.
>>>>>>>>> clear
>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>> input id str5(side) Length
>>>>>>>>> 1 right 10
>>>>>>>>> 2 right 15
>>>>>>>>> 3 right 11
>>>>>>>>> 4 left 13
>>>>>>>>> 5 left 10
>>>>>>>>> 6 left 12
>>>>>>>>> end
>>>>>>>>> gen byte newvar1 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>> }
>>>>>>>>> gen byte newvar2 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>> }
>>>>>>>>> gen byte newvar3 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>> }
>>>>>>>>> list, noobs sep(0)
>>>>>>>>> *** END CODE ***
>>>>>>>>> Good luck.
>>>>>>>>> Red Owl
>>>>>>>>> redowl@liu.edu
>>>>>>>>>>
>>>>>>>>>> "Y.R.E. Retamal" <yrer2@cam.ac.uk> Sat, 14 Dec 2013 12:08:42:
>>>>>>>>>> Dear list
>>>>>>>>>> I am having great difficulty trying to perform an analysis using
>>>>>>>>>> Stata, and I cannot find the way. Maybe you could help me.
>>>>>>>>>> I want to create some new variables containing the difference
>>>>>>>>>> between the lengths of two individuals from different groups:
>>>>>>>>>>
>>>>>>>>>> id  side   length  newvar1  newvar2  newvar3
>>>>>>>>>>  1  right  x       x-j      x-k      x-l
>>>>>>>>>>  2  right  y       y-j      y-k      y-l
>>>>>>>>>>  3  right  z       z-j      z-k      z-l
>>>>>>>>>>  4  left   j       j-x      j-y      j-z
>>>>>>>>>>  5  left   k       k-x      k-y      k-z
>>>>>>>>>>  6  left   l       l-x      l-y      l-z
>>>>>>>>>>
>>>>>>>>>> I do not know if I am explaining myself clearly: the individuals are
>>>>>>>>>> bones (clavicles, for example), so it is possible that some right
>>>>>>>>>> clavicles pair-match with left clavicles, following the idea that an
>>>>>>>>>> individual has bones of similar length.
>>>>>>>>>>
>>>>>>>>>> Any help would shed some light!
>>>>>>>>>> Best wishes
>>>>>>>>>> Rodrigo
>>>>>>>>>
>>>>>>>>> *
>>>>>>>>> * For searches and help try:
>>>>>>>>> * http://www.stata.com/help.cgi?search
>>>>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>> * http://www.ats.ucla.edu/stat/stata/
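Red Owl's loops in this thread hard-code three right and three left bones, and Rodrigo notes that his real dataset is much longer. The following is a minimal sketch, not taken from any message in the thread, of how the same right-minus-left layout (newvar1, newvar2, ...) might be generalized to any number of bones. It assumes, as in Red Owl's demo data, a single bone type, a string variable side holding "right" or "left", a numeric variable Length, and at least one bone on each side; with several bone types one would repeat it within each type.

```stata
* Hypothetical generalization of Red Owl's loops to any number of bones.
* Assumes one bone type, side coded "right"/"left", numeric Length,
* and at least one bone on each side.
clear
input id str5 side Length
1 right 10
2 right 15
3 right 11
4 left 13
5 left 10
6 left 12
end

* Right bones first (a descending string sort puts "right" before "left"),
* in id order within each side
gsort -side id

count if side == "right"
local nright = r(N)
local nleft = _N - `nright'
local firstleft = `nright' + 1
local nvars = max(`nright', `nleft')

forvalues k = 1/`nvars' {
    generate newvar`k' = .
    * right bones: own length minus the length of the k-th left bone
    if `k' <= `nleft' {
        replace newvar`k' = Length - Length[`nright' + `k'] in 1/`nright'
    }
    * left bones: own length minus the length of the k-th right bone
    if `k' <= `nright' {
        replace newvar`k' = Length - Length[`k'] in `firstleft'/l
    }
}
list, noobs sep(0)
```

On Red Owl's demo data this reproduces his newvar1 to newvar3, but the number of new variables now follows from the data rather than being typed in by hand.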
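Brent suggests the program could be modified to assign a match automatically when prespecified criteria are met, e.g. a single record within 1%. A minimal sketch of one way to do that, which is neither Brent's program nor a method proposed in the thread, is to form every right-left pair of the same bone type with joinby, compute the length difference, and flag pairs within a tolerance. The 1% tolerance (measured here, as an assumption, relative to the mean of the two lengths) and the new variable names (id_right, id_left, reldiff, candidate, unique, and so on) are illustrative choices; the twelve bones are a subset of Rodrigo's listing.

```stata
* Hypothetical sketch (not Brent's program): pair every right bone with
* every left bone of the same type and flag candidate matches.
* The 1% tolerance, relative to the mean of the two lengths, is an
* assumption for illustration only.
clear
input id str7 type str5 side length
 4 femur   left  130
 7 femur   left  145
10 femur   left  200
16 femur   right 128
18 femur   right 146
20 femur   right 200
28 humerus left  228
29 humerus left  230
30 humerus left  232
38 humerus right 228
39 humerus right 230
40 humerus right 241
end
drop if missing(length)

* Save the left bones under new names so that each pair can share one row
preserve
keep if side == "left"
rename id id_left
rename length length_left
drop side
tempfile lefts
save `lefts'
restore

* Keep the right bones and form all right-left pairs within each type
keep if side == "right"
rename id id_right
rename length length_right
drop side
joinby type using `lefts'

* Absolute and relative differences; flag pairs within the tolerance
gen absdiff = abs(length_right - length_left)
gen reldiff = absdiff / ((length_right + length_left) / 2)
gen byte candidate = reldiff <= 0.01

* Count candidates per right bone and per left bone; a pair is treated
* as unambiguous only if each bone has exactly one candidate partner
bysort type id_right: egen ncand_right = total(candidate)
bysort type id_left: egen ncand_left = total(candidate)
gen byte unique = candidate & ncand_right == 1 & ncand_left == 1

sort type id_right absdiff
list type id_right length_right id_left length_left reldiff candidate unique, ///
    sepby(type id_right) noobs
```

In this subset, right femurs 18 and 20 each have exactly one candidate partner (left femurs 7 and 10), while right humeri 38 and 39 each fall within 1% of more than one left humerus, which is the kind of ambiguity Nick anticipated. Pairs not flagged as unique would still need the manual review Brent describes; additional measurements such as width and height could be screened the same way.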