Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: pairing unpaired data [was: Re: st: any idea?]

From	"Sarah Edgington" <sedging@ucla.edu>
To	<statalist@hsphsun2.harvard.edu>
Subject	RE: pairing unpaired data [was: Re: st: any idea?]
Date	Tue, 7 Jan 2014 12:36:48 -0800
Rodrigo,
This is a complicated problem because it requires doing a calculation for
each possible pair of left/right bones.  Depending on how many bones you
have, this could turn out to be quite cumbersome.

The near matching method Fernando suggests could work, but the fact that
you'll ultimately need to match on more than one dimension seems like it
might create problems.

How many bones of each type do you actually have?  If it's a relatively
small number (for example a few hundred each of left and right for each type
of bone) you may be able to just use a brute force method by creating a
 dataset with each possible combination of left and right bones.  You'd want
to do this separately by bone type.

For example, you might create a dataset of left femur measurements and a
dataset of right femur measurements.  You could then use joinby to create
all the possible combinations between the two.

This might look something like the code below (note that I've only input the
femur data here, but this code assumes you have other types as well).  Keep
in mind that this creates a dataset with N_right x N_left observations, one
for each pairing of a right bone with a left bone (100 for the 10 left and
10 right femurs here).  For large datasets this likely won't be feasible.


clear
input id str10 type str5 side length 
 1 femur left 18 
 2 femur left 65.85 
 3 femur left 69.1 
 4 femur left 130 
 5 femur left 131.2 
 6 femur left 143 
 7 femur left 145 
 8 femur left 160 
 9 femur left 183 
 10 femur left 200 
 11 femur right 28 
 12 femur right 80 
 13 femur right 96.5 
 14 femur right 126 
 15 femur right 127 
 16 femur right 128 
 17 femur right 138 
 18 femur right 146 
 19 femur right 148 
 20 femur right 200 
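end

* Work on one bone type at a time
keep if type=="femur"

* Save the left femurs to a temporary file
preserve
keep if side=="left"
rename length left_length
rename id left_id
drop side
tempfile leftfemur
save `leftfemur'

* Return to the full femur data and keep the right femurs
restore
keep if side=="right"
rename length right_length
rename id right_id
drop side

* Form all pairwise combinations of right and left femurs
joinby type using `leftfemur'

* You now have every possible pair of measurements
gen lengthdiff=abs(right_length-left_length)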
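
If the data hold several bone types, the same steps can probably be run once for all of them, because joinby only pairs observations that share the same value of type. The following is a minimal sketch under that assumption, using a handful of the femur and humerus rows from Rodrigo's table. The tempfile name is arbitrary.

```stata
* Sketch: pair left and right bones within each bone type in one pass.
* The rows are a small subset of the table posted earlier in the thread.
clear
input id str10 type str5 side length
 4 femur   left  130
 7 femur   left  145
16 femur   right 128
18 femur   right 146
19 femur   right 148
24 humerus left  126
28 humerus left  228
34 humerus right 123
38 humerus right 228
end

* Save the left bones of every type to a temporary file
preserve
keep if side == "left"
rename (id length) (left_id left_length)
drop side
tempfile left
save `left'
restore

* Keep the right bones of every type in memory
keep if side == "right"
rename (id length) (right_id right_length)
drop side

* joinby forms all pairwise combinations within each value of type,
* so femurs are never paired with humeri
joinby type using `left'

gen lengthdiff = abs(right_length - left_length)

* Pairs per type should equal N_right x N_left for that type
tabulate type
```

With 3 right and 2 left femurs and 2 right and 2 left humeri, this gives 6 + 4 = 10 pairs, which is an easy way to check that nothing was paired across types.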

At this point you'll need very exact rules about what is "close enough" to
count as a possible match.  Once you've specified those, that is still not
the end of the task.  From there you'll have to see how often a bone matches
multiple other bones.  Then you'll need to come up with rules for
disambiguation.
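
As a rough sketch of that step, the code below flags candidate pairs and counts how many candidates each bone has. The tolerance of 3 units is purely a hypothetical rule chosen for illustration, and the variable names candidate, n_for_left, n_for_right and unambiguous are made up here.

```stata
* Sketch only: the 3-unit tolerance is an assumed rule, not one from the thread.
clear
input id str10 type str5 side length
 1 femur left 18
 2 femur left 65.85
 3 femur left 69.1
 4 femur left 130
 5 femur left 131.2
 6 femur left 143
 7 femur left 145
 8 femur left 160
 9 femur left 183
10 femur left 200
11 femur right 28
12 femur right 80
13 femur right 96.5
14 femur right 126
15 femur right 127
16 femur right 128
17 femur right 138
18 femur right 146
19 femur right 148
20 femur right 200
end

* Build every left/right pair, as in the code above
preserve
keep if side == "left"
rename (id length) (left_id left_length)
drop side
tempfile left
save `left'
restore
keep if side == "right"
rename (id length) (right_id right_length)
drop side
joinby type using `left'
gen lengthdiff = abs(right_length - left_length)

* Flag pairs whose lengths differ by no more than the tolerance
local tol 3
gen byte candidate = (lengthdiff <= `tol')

* Count the candidate partners of each left bone and each right bone
bysort left_id:  egen n_for_left  = total(candidate)
bysort right_id: egen n_for_right = total(candidate)

* A pair is unambiguous when each bone has exactly one candidate partner
gen byte unambiguous = candidate & (n_for_left == 1) & (n_for_right == 1)

sort left_id right_id
list left_id right_id left_length right_length lengthdiff unambiguous ///
    if candidate, noobs sepby(left_id)
```

Pairs that are flagged as candidates but not as unambiguous are the ones that would need a disambiguation rule, for example preferring the smallest lengthdiff.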

This is not an elegant solution and if you have a lot of data it may not
work.  However, if you have few enough cases for this to work it has the
advantage of making it pretty easy to specify matching rules for multiple
measurements.
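
To illustrate the point about multiple measurements, here is a small sketch in which a pair counts as a candidate only if both length and width are close. The width values are invented purely for illustration (the thread gives no width data), and both tolerances are hypothetical.

```stata
* Sketch only: width values and both tolerances are made up for illustration.
clear
input id str10 type str5 side length width
 4 femur left  130 24.0
 7 femur left  145 27.5
16 femur right 128 24.5
18 femur right 146 27.0
19 femur right 148 31.0
end

* Build every left/right pair, carrying both measurements along
preserve
keep if side == "left"
rename (id length width) (left_id left_length left_width)
drop side
tempfile left
save `left'
restore
keep if side == "right"
rename (id length width) (right_id right_length right_width)
drop side
joinby type using `left'

gen lengthdiff = abs(right_length - left_length)
gen widthdiff  = abs(right_width  - left_width)

* Require every measurement to be within its own tolerance.
* A missing difference compares as larger than any number, so pairs
* with a missing measurement are never flagged.
local tol_length 3
local tol_width  1
gen byte candidate = (lengthdiff <= `tol_length') & (widthdiff <= `tol_width')

sort left_id right_id
list left_id right_id lengthdiff widthdiff candidate, noobs sepby(left_id)
```

In this toy example the pair of ids 7 and 19 passes the length rule but fails the width rule, which is the kind of case a second measurement helps to weed out.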

-Sarah 	




-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Fernando Rios
Avila
Sent: Tuesday, January 07, 2014 11:38 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: pairing unpaired data [was: Re: st: any idea?]

Rodrigo,
Perhaps one direction you could follow is to use a near-matching method.
Since you can separate the information into two datasets (namely left and
right), you can do so, and then "merge" them using the user-written program
-nearmrg-.
That will give you a starting point for matching up your data, but you might
need to make further revisions to ensure that there are no duplicate matches.
Best

On Tue, Jan 7, 2014 at 2:27 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Thanks for the details of your problem. I can't see that you have a 
> method that is translatable into Stata code: your procedure is too 
> vaguely specified. That need not stop other people suggesting methods.
> Nick
> njcoxstata@gmail.com
>
>
> On 7 January 2014 19:20, Y.R.E. Retamal <yrer2@cam.ac.uk> wrote:
>> Dear Nick
>>
>> Thanks a lot for your prompt response. The method is no more than what
>> I showed. I have to add other variables like width and height for the
>> same bone. So, if three variables match, both bones would probably be
>> from the same skeleton.
>> I would expect that many bones would not match each other, so I
>> could rule them out as being from the same skeleton. Problems would
>> appear if e.g. a right bone matches more than one left bone. But
>> at least I could simplify the work, and afterwards I could focus on the
>> problematic cases.
>>
>> Rodrigo
>>
>>
>>
>>
>>
>>
>>
>> On 2014-01-07 18:49, Nick Cox wrote:
>>>
>>> I changed the thread title, which was not informative.
>>>
>>> You need a method. Some predictable pitfalls are that for some bones
>>> there is no acceptable match and that for others there could be two or
>>> more acceptable matches. I don't think there is a canned solution
>>> independent of your spelling out what the method is.
>>>
>>> Nick
>>> njcoxstata@gmail.com
>>>
>>>
>>> On 7 January 2014 18:20, Y.R.E. Retamal <yrer2@cam.ac.uk> wrote:
>>>>
>>>> Thank you very much Eric and Nick for the advice.
>>>>
>>>> I will try to give a clearer idea of what I want to do:
>>>> For example I have the following database of human bones. I removed
>>>> missing values of length for a better understanding:
>>>>
>>>> id      type    side    length          id      type    side    length
>>>> 1       femur   left    18              21      humerus left    13
>>>> 2       femur   left    65.85           22      humerus left    56
>>>> 3       femur   left    69.1            23      humerus left    92
>>>> 4       femur   left    130             24      humerus left    126
>>>> 5       femur   left    131.2           25      humerus left    154
>>>> 6       femur   left    143             26      humerus left    170
>>>> 7       femur   left    145             27      humerus left    198
>>>> 8       femur   left    160             28      humerus left    228
>>>> 9       femur   left    183             29      humerus left    230
>>>> 10      femur   left    200             30      humerus left    232
>>>> 11      femur   right   28              31      humerus right   238
>>>> 12      femur   right   80              32      humerus right   10
>>>> 13      femur   right   96.5            33      humerus right   66
>>>> 14      femur   right   126             34      humerus right   123
>>>> 15      femur   right   127             35      humerus right   128
>>>> 16      femur   right   128             36      humerus right   143
>>>> 17      femur   right   138             37      humerus right   200
>>>> 18      femur   right   146             38      humerus right   228
>>>> 19      femur   right   148             39      humerus right   230
>>>> 20      femur   right   200             40      humerus right   241
>>>>
>>>> These data belong to a commingled skeletal collection and some
>>>> right bones (femurs and humeri respectively) should match a
>>>> left bone, but I do not know which bones match. Following the idea
>>>> that a right bone should have (approximately) the same length
>>>> as the left bone from the same skeleton, I want to take the
>>>> difference in length between each right femur and each left femur,
>>>> with the aim of finding which right femur matches a left femur,
>>>> i.e. has the same or almost the same length, so that the difference
>>>> would be zero or near zero.
>>>> The same procedure applies to the humerus (and other bones).
>>>>
>>>> If you have any idea to perform this, please let me know.
>>>>
>>>> Rodrigo
>>>>
>>>>
>>>>
>>>> Best wishes
>>>>
>>>> Rodrigo
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 2014-01-05 23:54, Nick Cox wrote:
>>>>>
>>>>>
>>>>> <>
>>>>>
>>>>> Eric Booth gives very good advice.
>>>>>
>>>>> Your problem with the link to the Stata Journal file that I
>>>>> directed you to may be just that you didn't step past the standard
>>>>> material bundled with every reprint file.
>>>>>
>>>>> Nick
>>>>> njcoxstata@gmail.com
>>>>>
>>>>>
>>>>> On 5 January 2014 21:03, Eric Booth <eric.a.booth@gmail.com> wrote:
>>>>>>
>>>>>>
>>>>>> <>
>>>>>>
>>>>>> The Stata Journal link you mention that Nick sent you works for me.
>>>>>> The title of the article is "Stata tip 71: The problem of split
>>>>>> identity, or how to group dyads" by Nick J. Cox, so maybe you can 
>>>>>> google that title if your browser isn't navigating to it 
>>>>>> properly.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Your example dataset doesn't align with your desired dataset.
>>>>>>
>>>>>> How do we know what is x and what is j in the first 20 obs of
>>>>>> your example data (see below)? (Also note the Statalist FAQ about
>>>>>> not sending attachments.)
>>>>>>
>>>>>> You need some kind of identifier that ties, for example, obs or 
>>>>>> id 1 (even though it's missing) to the other right side femur 
>>>>>> observation of interest (is it id 7 or id 9 or ??).
>>>>>>
>>>>>>
>>>>>> **your example data:
>>>>>>
>>>>>> id      type    side    length
>>>>>> 1       femur   right
>>>>>> 2       femur   left
>>>>>> 3       femur   right
>>>>>> 4       femur   left
>>>>>> 5       femur   right   373
>>>>>> 6       femur   left    416
>>>>>> 7       femur   right   138
>>>>>> 8       femur   left
>>>>>> 9       femur   right   270
>>>>>> 10      femur   left
>>>>>> 11      femur   left
>>>>>> 12      femur   right
>>>>>> 13      femur   left
>>>>>> 14      femur   right
>>>>>> 15      femur   left    281
>>>>>> 16      femur   right
>>>>>> 17      femur   left    160
>>>>>> 18      femur   left
>>>>>> 19      femur   right
>>>>>> 20      femur   left
>>>>>>
>>>>>>
>>>>>> We can't just sort by 'type' and 'side' to get a dataset of the 
>>>>>> same structure as you presented initially, so I think you need to 
>>>>>> provide more information about this.  (also, if the rule is, as 
>>>>>> you imply, to sort by type and side and then subtract every third 
>>>>>> observation from each other then what do we do with missing 
>>>>>> 'length' and missing 'side'?)
>>>>>>
>>>>>> If the rule is that id 1 and id 2 are a pair then why does the
>>>>>> left/right ordering suddenly change starting around id 17?
>>>>>>
>>>>>> - Eric
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <yrer2@cam.ac.uk> wrote:
>>>>>>
>>>>>>> Dear Guys
>>>>>>>
>>>>>>> Some weeks ago, Red Owl and Nick helped me with some loops for
>>>>>>> my work.
>>>>>>> I have tried to run some of the suggestions on my dataset, but I
>>>>>>> had some difficulties.
>>>>>>> Here is the basic structure of my dataset and my question:
>>>>>>>
>>>>>>> I want to create some new variables containing the difference 
>>>>>>> between the length of two individuals from different groups:
>>>>>>>
>>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>>
>>>>>>> Red Owl suggested that I follow this example:
>>>>>>>
>>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>>> * Build demo data set.
>>>>>>>>>> clear
>>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>>> input id str5(side) Length
>>>>>>>>>> 1 right 10
>>>>>>>>>> 2 right 15
>>>>>>>>>> 3 right 11
>>>>>>>>>> 4 left  13
>>>>>>>>>> 5 left  10
>>>>>>>>>> 6 left  12
>>>>>>>>>> end
>>>>>>>>>> gen byte newvar1 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar2 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar3 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>>  }
>>>>>>>>>> list, noobs sep(0)
>>>>>>>>>> *** END CODE ***
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> However, my dataset is much longer, and it is difficult to
>>>>>>> do this for it.
>>>>>>> I hope you can help by giving me more ideas.
>>>>>>> I am sending you an extract of my dataset in .xlsx format. Also,
>>>>>>> the webpage suggested by Nick to review the discussion about the
>>>>>>> topic
>>>>>>> (http://www.stata-journal.com/sjpdf.html?articlenum=dm0043)
>>>>>>> redirects me to a nonsense file to download. Please give me the
>>>>>>> issue number of the journal so that I can read the discussion.
>>>>>>>
>>>>>>> Happy new year to all of you
>>>>>>>
>>>>>>> Rodrigo
>>>>>>>
>>>>>>>
>>>>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear Red Owl and Nick
>>>>>>>> Thank you very much for your response. The code works 
>>>>>>>> perfectly, just as I need.
>>>>>>>> Best wishes
>>>>>>>> Rodrigo
>>>>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In addition to Red's helpful suggestions, note that technique 
>>>>>>>>> for such paired data was discussed in
>>>>>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
>>>>>>>>> which is publicly accessible. The problem is that the 
>>>>>>>>> identifiers in Rodrigo's example appear to make little sense. 
>>>>>>>>> How is Stata expected to know that 1 and 4, 2 and 5, 3 and 6 
>>>>>>>>> are paired? Perhaps the structure of the dataset is clearer in 
>>>>>>>>> practice. If so, basic calculations are just a couple of lines
>>>>>>>>> or so.
>>>>>>>>> Nick
>>>>>>>>> njcoxstata@gmail.com
>>>>>>>>> On 14 December 2013 15:33, Red Owl <rh.redowl@liu.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Rodrigo,
>>>>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>>>>> It could be made more efficient with a different loop 
>>>>>>>>>> structure, but this approach may be more informative.
>>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>>> * Build demo data set.
>>>>>>>>>> clear
>>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>>> input id str5(side) Length
>>>>>>>>>> 1 right 10
>>>>>>>>>> 2 right 15
>>>>>>>>>> 3 right 11
>>>>>>>>>> 4 left  13
>>>>>>>>>> 5 left  10
>>>>>>>>>> 6 left  12
>>>>>>>>>> end
>>>>>>>>>> gen byte newvar1 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar2 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>>  }
>>>>>>>>>> gen byte newvar3 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>>  }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>  replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>>  }
>>>>>>>>>> list, noobs sep(0)
>>>>>>>>>> *** END CODE ***
>>>>>>>>>> Good luck.
>>>>>>>>>> Red Owl
>>>>>>>>>> redowl@liu.edu
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> "Y.R.E. Retamal" <yrer2@cam.ac.uk> Sat, 14 Dec 2013 12:08:42:
>>>>>>>>>>> Dear list
>>>>>>>>>>> I am having a hard time trying to perform an analysis using
>>>>>>>>>>> Stata and I cannot find the way. Maybe you could help me. I
>>>>>>>>>>> want to create some new variables containing the difference
>>>>>>>>>>> between the length of two individuals from different groups:
>>>>>>>>>>>
>>>>>>>>>>> id     side     length      newvar1       newvar2      newvar3
>>>>>>>>>>> 1      right      x           x-j           x-k          x-l
>>>>>>>>>>> 2      right      y           y-j           y-k          y-l
>>>>>>>>>>> 3      right      z           z-j           z-k          z-l
>>>>>>>>>>> 4      left       j           j-x           j-y          j-z
>>>>>>>>>>> 5      left       k           k-x           k-y          k-z
>>>>>>>>>>> 6      left       l           l-x           l-y          l-z
>>>>>>>>>>>
>>>>>>>>>>> I do not know if I am explaining myself clearly. The
>>>>>>>>>>> individuals are bones (clavicles, for example), so it is
>>>>>>>>>>> possible that some right clavicles pair-match with left
>>>>>>>>>>> clavicles, following the idea that an individual has bones
>>>>>>>>>>> of similar length.
>>>>>>>>>>>
>>>>>>>>>>> Any help could shed some light!
>>>>>>>>>>> Best wishes
>>>>>>>>>>> Rodrigo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *
>>>>>>>>>> *   For searches and help try:
>>>>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>>> *   http://www.ats.ucla.edu/stat/stata/