Notice: On April 23, 2014, Statalist moved from an email list to a forum.

Re: pairing unpaired data [was: Re: st: any idea?]

From   Fernando Rios Avila <[email protected]>
To   [email protected]
Subject   Re: pairing unpaired data [was: Re: st: any idea?]
Date   Tue, 7 Jan 2014 14:37:56 -0500

One direction you could follow is a nearest-match method.
Since you can separate the information into two datasets (namely left
and right), you can do so and then "merge" them using the user-written
program -nearmrg-.
That will give you a starting point for matching up your data, but you might
need to make further revisions to ensure that there are no duplicate
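
As a rough illustration of the pairing idea, here is a minimal sketch that uses only built-in commands. It is not -nearmrg- itself, whose syntax is not shown in this thread. It forms every left-right combination within a bone type with -joinby- and keeps the pairs whose lengths are close. The variable names (id, type, side, length) follow the example data quoted below, the demo rows are a few of the posted femur lengths, and the tolerance of 2 units is an arbitrary assumption, as are the new variable names.

```stata
* Demo data: a few of the femur lengths posted in this thread.
clear
input id str7 type str5 side length
4  femur left  130
7  femur left  145
8  femur left  160
10 femur left  200
16 femur right 128
17 femur right 138
18 femur right 146
20 femur right 200
end

* Save the right-side bones in a temporary file.
preserve
keep if side == "right"
drop side
rename (id length) (id_right length_right)
tempfile right
save `right'
restore

* Keep the left-side bones in memory.
keep if side == "left"
drop side
rename (id length) (id_left length_left)

* Form every left-right combination within bone type.
joinby type using `right'

* Absolute difference in length for each candidate pair.
gen diff = abs(length_left - length_right)

* Keep pairs within an assumed tolerance (the value 2 is arbitrary).
local tol = 2
keep if diff <= `tol'

* Count candidates per bone; values above 1 flag ambiguous matches.
bysort id_left: gen n_right = _N
bysort id_right: gen n_left = _N

sort type diff
list type id_left id_right length_left length_right diff n_right n_left, noobs sepby(type)
```

Pairs with n_left or n_right above 1 are the ambiguous cases that would need a closer look, and bones that drop out have no candidate within the tolerance. Adding width and height would mean computing analogous differences and tightening the -keep if- condition accordingly.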

On Tue, Jan 7, 2014 at 2:27 PM, Nick Cox <[email protected]> wrote:
> Thanks for the details of your problem. I can't see that you have a
> method that is translatable into Stata code: your procedure is too
> vaguely specified. That need not stop other people suggesting methods.
> Nick
> [email protected]
> On 7 January 2014 19:20, Y.R.E. Retamal <[email protected]> wrote:
>> Dear Nick
>> Thanks a lot for your quick response. The method is no more than what I
>> showed. I have to add other variables, like width and height, for the same
>> bone. So, if three variables match, both bones would probably be from the
>> same skeleton. I would expect that many bones would not match each other,
>> so I could rule out their being from the same skeleton. Problems would
>> appear if, e.g., a right bone matches more than one left bone. But at least
>> I could simplify the work and afterwards focus on the problematic cases.
>> Rodrigo
>> On 2014-01-07 18:49, Nick Cox wrote:
>>> I changed the thread title, which was not informative.
>>> You need a method. Some predictable pitfalls are that for some bones
>>> there is no acceptable match and that for others there could be two or
>>> more acceptable matches. I don't think there is a canned solution
>>> independent of your spelling out what the method is.
>>> Nick
>>> [email protected]
>>> On 7 January 2014 18:20, Y.R.E. Retamal <[email protected]> wrote:
>>>> Thank you very much, Eric and Nick, for the advice.
>>>> I will try to give a clearer idea of what I want to do:
>>>> For example, I have the following database of human bones. I removed
>>>> missing values of length for easier reading:
>>>> id      type    side    length          id      type    side    length
>>>> 1       femur   left    18              21      humerus left    13
>>>> 2       femur   left    65.85           22      humerus left    56
>>>> 3       femur   left    69.1            23      humerus left    92
>>>> 4       femur   left    130             24      humerus left    126
>>>> 5       femur   left    131.2           25      humerus left    154
>>>> 6       femur   left    143             26      humerus left    170
>>>> 7       femur   left    145             27      humerus left    198
>>>> 8       femur   left    160             28      humerus left    228
>>>> 9       femur   left    183             29      humerus left    230
>>>> 10      femur   left    200             30      humerus left    232
>>>> 11      femur   right   28              31      humerus right   238
>>>> 12      femur   right   80              32      humerus right   10
>>>> 13      femur   right   96.5            33      humerus right   66
>>>> 14      femur   right   126             34      humerus right   123
>>>> 15      femur   right   127             35      humerus right   128
>>>> 16      femur   right   128             36      humerus right   143
>>>> 17      femur   right   138             37      humerus right   200
>>>> 18      femur   right   146             38      humerus right   228
>>>> 19      femur   right   148             39      humerus right   230
>>>> 20      femur   right   200             40      humerus right   241
>>>> These data belong to a commingled skeletal collection, and some right
>>>> bones (femurs and humeri respectively) should match a left bone, but I do
>>>> not know which bones match. Following the idea that a right bone should
>>>> have approximately the same length as the left bone from the same
>>>> skeleton, I want to subtract the length of each right femur from that of
>>>> each left femur, with the aim of finding which right femur matches which
>>>> left femur, i.e. has the same or almost the same length, so that the
>>>> difference would be zero or near zero.
>>>> The same procedure applies to the humerus (and other bones).
>>>> If you have any idea how to do this, please let me know.
>>>> Best wishes
>>>> Rodrigo
>>>> On 2014-01-05 23:54, Nick Cox wrote:
>>>>> <>
>>>>> Eric Booth gives very good advice.
>>>>> Your problem with the link to the Stata Journal file you were directed
>>>>> to may be just that you didn't step past the standard material
>>>>> bundled with every reprint file.
>>>>> Nick
>>>>> [email protected]
>>>>> On 5 January 2014 21:03, Eric Booth <[email protected]> wrote:
>>>>>> <>
>>>>>> The Stata Journal link you mention that Nick sent you works for me.
>>>>>> The title of the article is "Stata tip 71: The problem of split
>>>>>> identity, or how to group dyads" by Nick J. Cox, so maybe you can google
>>>>>> that title if your browser isn’t navigating to it properly.
>>>>>> Your example dataset doesn’t align with your desired dataset.
>>>>>> How do we know what is x and what is j in the first 20 obs of your
>>>>>> example data (see below) (also note the Statalist FAQ about not sending
>>>>>> attachments) ?
>>>>>> You need some kind of identifier that ties, for example, obs or id 1
>>>>>> (even though it’s missing) to the other right side femur observation of
>>>>>> interest (is it id 7 or id 9 or ??).
>>>>>> **your example data:
>>>>>> id      type    side    length
>>>>>> 1       femur   right
>>>>>> 2       femur   left
>>>>>> 3       femur   right
>>>>>> 4       femur   left
>>>>>> 5       femur   right   373
>>>>>> 6       femur   left    416
>>>>>> 7       femur   right   138
>>>>>> 8       femur   left
>>>>>> 9       femur   right   270
>>>>>> 10      femur   left
>>>>>> 11      femur   left
>>>>>> 12      femur   right
>>>>>> 13      femur   left
>>>>>> 14      femur   right
>>>>>> 15      femur   left    281
>>>>>> 16      femur   right
>>>>>> 17      femur   left    160
>>>>>> 18      femur   left
>>>>>> 19      femur   right
>>>>>> 20      femur   left
>>>>>> We can’t just sort by ‘type’ and ‘side’ to get a dataset of the same
>>>>>> structure as you presented initially, so I think you need to provide
>>>>>> more information about this.  (Also, if the rule is, as you imply, to
>>>>>> sort by type and side and then subtract every third observation from
>>>>>> each other, then what do we do with missing ‘length’ and missing ‘side’?)
>>>>>> If the rule is that id 1 and id 2 are a pair, then why does the
>>>>>> left/right ordering suddenly change starting around id 17?
>>>>>> - Eric
>>>>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <[email protected]> wrote:
>>>>>>> Dear Guys
>>>>>>> Some weeks ago, Red Owl and Nick helped me with some loops for my
>>>>>>> work. I have tried to run some of the suggestions on my dataset, but I
>>>>>>> had some difficulties.
>>>>>>> Here is the basic structure of my dataset and my question:
>>>>>>> I want to create some new variables containing the difference between
>>>>>>> the length of two individuals from different groups:
>>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>> Red Owl suggested that I follow this example:
>>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>>> * Build demo data set.
>>>>>>>>>> clear
>>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>>> input id str5(side) Length
>>>>>>>>>> 1 right 10
>>>>>>>>>> 2 right 15
>>>>>>>>>> 3 right 11
>>>>>>>>>> 4 left  13
>>>>>>>>>> 5 left  10
>>>>>>>>>> 6 left  12
>>>>>>>>>> end
>>>>>>>>>> gen byte newvar1 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar2 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar3 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>>  }
>>>>>>>>>> list, noobs sep(0)
>>>>>>>>>> *** END CODE ***
>>>>>>> However, my dataset is much longer, and it is difficult to apply this
>>>>>>> approach to it.
>>>>>>> I hope you can help by giving me more ideas.
>>>>>>> I am sending you an extract of my dataset in .xlsx format.
>>>>>>> Also, the webpage suggested by Nick to review the discussion about the
>>>>>>> topic (
>>>>>>> redirects me to a nonsense file to download. Please give me the number
>>>>>>> of the journal issue so that I can read the discussion.
>>>>>>> Happy new year to all of you
>>>>>>> Rodrigo
>>>>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>>>> Dear Red Owl and Nick
>>>>>>>> Thank you very much for your response. The code works perfectly, just
>>>>>>>> as I need.
>>>>>>>> Best wishes
>>>>>>>> Rodrigo
>>>>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>>>> In addition to Red's helpful suggestions, note that technique for
>>>>>>>>> such paired data was discussed in
>>>>>>>>> which is publicly accessible. The problem is that the identifiers in
>>>>>>>>> Rodrigo's example appear to make little sense. How is Stata expected
>>>>>>>>> to know that 1 and 4, 2 and 5, 3 and 6 are paired? Perhaps the
>>>>>>>>> structure of the dataset is clearer in practice. If so, basic
>>>>>>>>> calculations are just a couple of lines or so.
>>>>>>>>> Nick
>>>>>>>>> [email protected]
>>>>>>>>> On 14 December 2013 15:33, Red Owl <[email protected]> wrote:
>>>>>>>>>> Rodrigo,
>>>>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>>>>> It could be made more efficient with a different loop
>>>>>>>>>> structure, but this approach may be more informative.
>>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>>> * Build demo data set.
>>>>>>>>>> clear
>>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>>> input id str5(side) Length
>>>>>>>>>> 1 right 10
>>>>>>>>>> 2 right 15
>>>>>>>>>> 3 right 11
>>>>>>>>>> 4 left  13
>>>>>>>>>> 5 left  10
>>>>>>>>>> 6 left  12
>>>>>>>>>> end
>>>>>>>>>> gen byte newvar1 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar2 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar3 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>>  }
>>>>>>>>>> list, noobs sep(0)
>>>>>>>>>> *** END CODE ***
>>>>>>>>>> Good luck.
>>>>>>>>>> Red Owl
>>>>>>>>>> [email protected]
>>>>>>>>>>> "Y.R.E. Retamal" <[email protected]> Sat, 14 Dec 2013 12:08:42:
>>>>>>>>>>> Dear list
>>>>>>>>>>> I am having great difficulty performing an analysis in Stata and
>>>>>>>>>>> cannot find a way. Maybe you could help me. I want to create some
>>>>>>>>>>> new variables containing the difference between the length of two
>>>>>>>>>>> individuals from different groups:
>>>>>>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>>>>>> I do not know if I am explaining myself clearly. The individuals are
>>>>>>>>>>> bones (clavicles, for example), so it is possible that some right
>>>>>>>>>>> clavicles pair-match with left clavicles, following the idea that
>>>>>>>>>>> an individual has bones of similar length.
>>>>>>>>>>> Any help would shed some light!
>>>>>>>>>>> Best wishes
>>>>>>>>>>> Rodrigo
>>>>>>>>>> *
>>>>>>>>>> *   For searches and help try:
>>>>>>>>>> *
>>>>>>>>>> *
>>>>>>>>>> *
>>>>>>> <example.xlsx>
