Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: pairing unpaired data [was: Re: st: any idea?]

From	"Brent McSharry (ADHB)" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: pairing unpaired data [was: Re: st: any idea?]
Date	Wed, 8 Jan 2014 09:22:21 +1300
A program thrown together to do what I believe you want (although not automatically):

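program closestvalues, byable(recall) sortpreserve
        version 10.1
        syntax varname [if] [in], Matchon(varlist min=1 max=1) Id(varlist min=1 max=1)

        * mark the observations to use (respects if, in and the by-group)
        marksample touse
        tempvar absdif
        qui gen `absdif' = .

        qui count if `touse'
        local obs `r(N)'
        if (`obs'>0) {
                forvalues i=1(1)`obs' {
                        * put the marked observations first, in id order
                        gsort -`touse' `id'
                        * absolute difference from observation i, but only for
                        * observations whose matchon value (e.g. side) differs
                        qui replace `absdif' = cond(_n!=`i' & `matchon'!=`matchon'[`i'], abs(`varlist'[`i']-`varlist'),.) if `touse'
                        di as res "closest variables to `=`id'[`i']'(`=`varlist'[`i']')"
                        * smallest differences first (missing values sort last)
                        gsort -`touse' `absdif'
                        li `id' `varlist' `absdif' in 1/2 if `touse', noo
                }
        }
end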

Save this as closestvalues.ado in your ado/personal folder and then type
bysort bone: closestvalues length, id(id) match(side)

You will then get output for each bone like
-> bone = femur
closest variables to 1(18)

  +------------------------+
  | id   length   __00000A |
  |------------------------|
  | 11       28         10 |
  | 12       80         62 |
  +------------------------+

It will list the 2 closest matches for each bone. You will then have to make a table of which matches are acceptable to you (or modify the program to assign a match automatically when prespecified criteria are met, e.g. a single record within 1%).

This program is ugly/slow, but will hopefully speed up what you are trying to do.
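
A loop-free alternative, offered only as a rough sketch: form every left-right pair within a bone with -joinby- and compute the differences in one pass. The variable names bone, side, id and length follow the command line above; the six rows are a small subset of the femur data posted in this thread; the new variable names, the 1% cut-off in the local macro tol and the rule of accepting only one-to-one pairs are illustrative assumptions, not a recommended method.

```stata
* Small demo: a subset of the femur rows from the listing in this thread.
clear
input id str7 bone str5 side length
 1 femur left   18
 4 femur left  130
10 femur left  200
11 femur right  28
14 femur right 126
20 femur right 200
end

* Store the right bones in a temporary file under distinct names.
preserve
keep if side == "right"
rename id id_right
rename length length_right
drop side
tempfile right
save `right'
restore

* Keep the left bones and pair each with every right bone of the same type.
keep if side == "left"
rename id id_left
rename length length_left
drop side
joinby bone using `right'

* Absolute and relative difference for every candidate pair.
gen absdif = abs(length_left - length_right)
gen reldif = absdif / ((length_left + length_right) / 2)

* tol is a hypothetical cut-off (1%); set it from your own criteria.
local tol 0.01
gen byte close = reldif <= `tol'

* Count close candidates per bone; accept a pair only if it is one-to-one.
bysort bone id_left:  egen n_left  = total(close)
bysort bone id_right: egen n_right = total(close)
gen byte match = close & n_left == 1 & n_right == 1

sort bone id_left absdif
list bone id_left length_left id_right length_right absdif match, noobs sepby(id_left)
```

Bones with n_left or n_right greater than 1 are the ambiguous cases that would still need checking by hand, and further measurements such as width and height could be compared in the same way before a pair is accepted.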

Brent McSharry MBBS BSc(med) FCICM(paed)
Paediatric Intensivist
Starship Children's Hospital
Private Bag 92024
Auckland 1142
New Zealand
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, 8 January 2014 8:28 a.m.
To: [email protected]
Subject: Re: pairing unpaired data [was: Re: st: any idea?]

Thanks for the details of your problem. I can't see that you have a
method that is translatable into Stata code: your procedure is too
vaguely specified. That need not stop other people suggesting methods.
Nick
[email protected]


On 7 January 2014 19:20, Y.R.E. Retamal <[email protected]> wrote:
> Dear Nick
>
> Thanks a lot for your prompt response. The method is no more than what I
> showed. I will have to add other variables, such as width and height, for
> the same bone. So, if all three variables match, both bones probably come
> from the same skeleton. I would expect many bones not to match any other,
> so I could rule those out as coming from the same skeleton. Problems would
> appear if, e.g., a right bone matches more than one left bone. But at least
> I could simplify the work and then focus on the problematic cases.
>
> Rodrigo
>
>
>
>
>
>
>
> On 2014-01-07 18:49, Nick Cox wrote:
>>
>> I changed the thread title, which was not informative.
>>
>> You need a method. Some predictable pitfalls are that for some bones
>> there is no acceptable match and that for others there could be two or
>> more acceptable matches. I don't think there is a canned solution
>> independent of your spelling out what the method is.
>>
>> Nick
>> [email protected]
>>
>>
>> On 7 January 2014 18:20, Y.R.E. Retamal <[email protected]> wrote:
>>>
>>> Thank you very much, Eric and Nick, for the advice.
>>>
>>> I will try to give a clearer idea of what I want to do.
>>> For example, I have the following dataset of human bones. I removed
>>> missing values of length to make it easier to follow:
>>>
>>> id      type    side    length          id      type    side    length
>>> 1       femur   left    18              21      humerus left    13
>>> 2       femur   left    65.85           22      humerus left    56
>>> 3       femur   left    69.1            23      humerus left    92
>>> 4       femur   left    130             24      humerus left    126
>>> 5       femur   left    131.2           25      humerus left    154
>>> 6       femur   left    143             26      humerus left    170
>>> 7       femur   left    145             27      humerus left    198
>>> 8       femur   left    160             28      humerus left    228
>>> 9       femur   left    183             29      humerus left    230
>>> 10      femur   left    200             30      humerus left    232
>>> 11      femur   right   28              31      humerus right   238
>>> 12      femur   right   80              32      humerus right   10
>>> 13      femur   right   96.5            33      humerus right   66
>>> 14      femur   right   126             34      humerus right   123
>>> 15      femur   right   127             35      humerus right   128
>>> 16      femur   right   128             36      humerus right   143
>>> 17      femur   right   138             37      humerus right   200
>>> 18      femur   right   146             38      humerus right   228
>>> 19      femur   right   148             39      humerus right   230
>>> 20      femur   right   200             40      humerus right   241
>>>
>>> These data belong to a commingled skeletal collection, and some right
>>> bones (femurs and humeri, respectively) should match a left bone, but I
>>> do not know which bones match. Assuming that a right bone has
>>> (approximately) the same length as the left bone from the same skeleton,
>>> I want to subtract the length of each right femur from the length of each
>>> left femur, with the aim of finding which right femur matches which left
>>> femur, i.e. has the same or almost the same length, so that the
>>> difference is zero or near zero.
>>> The same procedure applies to the humeri (and other bones).
>>>
>>> If you have any idea how to do this, please let me know.
>>>
>>> Best wishes
>>>
>>> Rodrigo
>>>
>>>
>>>
>>>
>>>
>>> On 2014-01-05 23:54, Nick Cox wrote:
>>>>
>>>>
>>>> <>
>>>>
>>>> Eric Booth gives very good advice.
>>>>
>>>> Your problem with the link to the Stata Journal file I directed you
>>>> to may be just that you didn't step past the standard material
>>>> bundled with every reprint file.
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 5 January 2014 21:03, Eric Booth <[email protected]> wrote:
>>>>>
>>>>>
>>>>> <>
>>>>>
>>>>> The Stata Journal link you mention that Nick sent you works for me.
>>>>> The title of the article is "Stata tip 71: The problem of split
>>>>> identity, or how to group dyads" by Nick J. Cox, so maybe you can
>>>>> google that title if your browser isn't navigating to it properly.
>>>>>
>>>>>
>>>>>
>>>>> Your example dataset doesn't align with your desired dataset.
>>>>>
>>>>> How do we know what is x and what is j in the first 20 obs of your
>>>>> example data (see below) (also note the Statalist FAQ about not sending
>>>>> attachments) ?
>>>>>
>>>>> You need some kind of identifier that ties, for example, obs or id 1
>>>>> (even though it's missing) to the other right side femur observation of
>>>>> interest (is it id 7 or id 9 or ??).
>>>>>
>>>>>
>>>>> **your example data:
>>>>>
>>>>> id      type    side    length
>>>>> 1       femur   right
>>>>> 2       femur   left
>>>>> 3       femur   right
>>>>> 4       femur   left
>>>>> 5       femur   right   373
>>>>> 6       femur   left    416
>>>>> 7       femur   right   138
>>>>> 8       femur   left
>>>>> 9       femur   right   270
>>>>> 10      femur   left
>>>>> 11      femur   left
>>>>> 12      femur   right
>>>>> 13      femur   left
>>>>> 14      femur   right
>>>>> 15      femur   left    281
>>>>> 16      femur   right
>>>>> 17      femur   left    160
>>>>> 18      femur   left
>>>>> 19      femur   right
>>>>> 20      femur   left
>>>>>
>>>>>
>>>>> We can't just sort by 'type' and 'side' to get a dataset of the same
>>>>> structure as you presented initially, so I think you need to provide
>>>>> more information about this.  (Also, if the rule is, as you imply, to
>>>>> sort by type and side and then subtract every third observation from
>>>>> each other, then what do we do with missing 'length' and missing
>>>>> 'side'?)
>>>>>
>>>>> If the rule is that id 1 and id 2 are a pair, then why does the
>>>>> left/right ordering suddenly change starting around id 17?
>>>>>
>>>>> - Eric
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <[email protected]> wrote:
>>>>>
>>>>>> Dear Guys
>>>>>>
>>>>>> Some weeks ago, Red Owl and Nick helped me with some loops for my
>>>>>> work. I have tried to run the suggested code on my dataset, but I had
>>>>>> some difficulties.
>>>>>> Here are the basic structure of my dataset and my question:
>>>>>>
>>>>>> I want to create some new variables containing the difference between
>>>>>> the length of two individuals from different groups:
>>>>>>
>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>
>>>>>> Red Owl suggested following this example:
>>>>>>
>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>> * Build demo data set.
>>>>>>>>> clear
>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>> input id str5(side) Length
>>>>>>>>> 1 right 10
>>>>>>>>> 2 right 15
>>>>>>>>> 3 right 11
>>>>>>>>> 4 left  13
>>>>>>>>> 5 left  10
>>>>>>>>> 6 left  12
>>>>>>>>> end
>>>>>>>>> gen byte newvar1 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>  }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>  }
>>>>>>>>> gen byte newvar2 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>  }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>  }
>>>>>>>>> gen byte newvar3 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>  }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>  }
>>>>>>>>> list, noobs sep(0)
>>>>>>>>> *** END CODE ***
>>>>>>
>>>>>>
>>>>>>
>>>>>> However, my dataset is much longer, so this approach is difficult to
>>>>>> apply. I hope you can help by giving me more ideas.
>>>>>> I am sending you an extract of my dataset in .xlsx format.
>>>>>> Also, the webpage suggested by Nick for reviewing the discussion of the
>>>>>> topic (http://www.stata-journal.com/sjpdf.html?articlenum=dm0043)
>>>>>> redirects me to a nonsense file to download. Please give me the issue
>>>>>> number of the journal so that I can read the discussion.
>>>>>>
>>>>>> Happy new year to all of you
>>>>>>
>>>>>> Rodrigo
>>>>>>
>>>>>>
>>>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear Red Owl and Nick
>>>>>>> Thank you very much for your response. The code works perfectly, just
>>>>>>> as I need.
>>>>>>> Best wishes
>>>>>>> Rodrigo
>>>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> In addition to Red's helpful suggestions, note that a technique for
>>>>>>>> such paired data was discussed in
>>>>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
>>>>>>>> which is publicly accessible. The problem is that the identifiers in
>>>>>>>> Rodrigo's example appear to make little sense. How is Stata expected
>>>>>>>> to know that 1 and 4, 2 and 5, 3 and 6 are paired? Perhaps the
>>>>>>>> structure of the dataset is clearer in practice. If so, basic
>>>>>>>> calculations are just a couple of lines or so.
>>>>>>>> Nick
>>>>>>>> [email protected]
>>>>>>>> On 14 December 2013 15:33, Red Owl <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Rodrigo,
>>>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>>>> It could be made more efficient with a different loop
>>>>>>>>> structure, but this approach may be more informative.
>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>> * Build demo data set.
>>>>>>>>> clear
>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>> input id str5(side) Length
>>>>>>>>> 1 right 10
>>>>>>>>> 2 right 15
>>>>>>>>> 3 right 11
>>>>>>>>> 4 left  13
>>>>>>>>> 5 left  10
>>>>>>>>> 6 left  12
>>>>>>>>> end
>>>>>>>>> gen byte newvar1 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>  }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>  }
>>>>>>>>> gen byte newvar2 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>  }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>  }
>>>>>>>>> gen byte newvar3 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>  }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>  }
>>>>>>>>> list, noobs sep(0)
>>>>>>>>> *** END CODE ***
>>>>>>>>> Good luck.
>>>>>>>>> Red Owl
>>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> "Y.R.E. Retamal" <[email protected]> Sat, 14 Dec 2013 12:08:42:
>>>>>>>>>> Dear list
>>>>>>>>>> I am having great difficulty performing an analysis using Stata and
>>>>>>>>>> I cannot find the way. Maybe you could help me. I want to create
>>>>>>>>>> some new variables containing the difference between the length of
>>>>>>>>>> two individuals from different groups:
>>>>>>>>>>
>>>>>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>>>>>
>>>>>>>>>> I do not know if I am explaining myself clearly. The individuals are
>>>>>>>>>> bones (clavicles, for example), so it is possible that some right
>>>>>>>>>> clavicles pair-match with left clavicles, following the idea that an
>>>>>>>>>> individual has bones of similar length.
>>>>>>>>>>
>>>>>>>>>> Any help would shed some light!
>>>>>>>>>> Best wishes
>>>>>>>>>> Rodrigo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *
>>>>>>>>> *   For searches and help try:
>>>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>>> <example.xlsx>