Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Fernando Rios Avila <[email protected]>
To: [email protected]
Subject: Re: pairing unpaired data [was: Re: st: any idea?]
Date: Tue, 7 Jan 2014 14:37:56 -0500
Rodrigo,
One direction you could follow is a nearest-match method. Since the
data can be separated into two datasets (left and right), you can
split them and then "merge" them on length using the user-written
program -nearmrg-.
That will give you a starting point for pairing up your data, but you
might need to make further revisions to ensure that no bone is matched
more than once.
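
As a rough illustration of the same idea in base Stata (this is not
-nearmrg- itself, whose syntax I do not reproduce here), the sketch below
takes a few femur rows from your example, forms every left-right pair with
-joinby-, and keeps the pairs whose lengths differ by at most 5 units. The
tolerance of 5 and the new variable names (id_left, id_right, absdiff,
n_left, n_right) are assumptions made only for illustration.

```stata
* Sketch only: all left-right femur pairs within an assumed tolerance.
clear
input id str7 type str5 side length
 4 femur left  130
 5 femur left  131.2
 8 femur left  160
10 femur left  200
14 femur right 126
16 femur right 128
20 femur right 200
end

* Store the right bones separately, under names that will not clash.
preserve
keep if side == "right"
rename (id length) (id_right length_right)
keep type id_right length_right
tempfile right
save `right'
restore

* Keep the left bones and form all left-right pairs within bone type.
keep if side == "left"
rename (id length) (id_left length_left)
keep type id_left length_left
joinby type using `right'

* Absolute difference in length; 5 is an arbitrary, assumed tolerance.
gen double absdiff = abs(length_left - length_right)
keep if absdiff <= 5

* Count how many candidate pairs each bone appears in (duplicates > 1).
bysort id_left: gen n_left = _N
bysort id_right: gen n_right = _N
sort type id_left absdiff
list, noobs sepby(id_left)
```

Any bone with n_left or n_right above 1 is a candidate for more than one
pair and would need the kind of manual revision mentioned above. Bones that
drop out entirely have no candidate within the tolerance.
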
Best
On Tue, Jan 7, 2014 at 2:27 PM, Nick Cox <[email protected]> wrote:
> Thanks for the details of your problem. I can't see that you have a
> method that is translatable into Stata code: your procedure is too
> vaguely specified. That need not stop other people suggesting methods.
> Nick
> [email protected]
>
>
> On 7 January 2014 19:20, Y.R.E. Retamal <[email protected]> wrote:
>> Dear Nick
>>
>> Thanks a lot for your prompt response. The method is no more than what I
>> showed. I have to add other variables, like width and height, for the same
>> bone. So, if three variables match, both bones are probably from the same
>> skeleton. I would expect that many bones would not match any other, so I
>> could rule them out as being from the same skeleton. Problems would appear
>> if, e.g., a right bone matches more than one left bone. But at least I could
>> simplify the work and afterwards focus on the problematic cases.
>>
>> Rodrigo
>>
>> On 2014-01-07 18:49, Nick Cox wrote:
>>>
>>> I changed the thread title, which was not informative.
>>>
>>> You need a method. Some predictable pitfalls are that for some bones
>>> there is no acceptable match and that for others there could be two or
>>> more acceptable matches. I don't think there is a canned solution
>>> independent of your spelling out what the method is.
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 7 January 2014 18:20, Y.R.E. Retamal <[email protected]> wrote:
>>>>
>>>> Thank you very much, Eric and Nick, for the advice.
>>>>
>>>> I will try to give a clearer idea of what I want to do.
>>>> For example, I have the following database of human bones. I removed
>>>> missing values of length for easier understanding:
>>>>
>>>> id  type   side   length    id  type     side   length
>>>>  1  femur  left       18    21  humerus  left       13
>>>>  2  femur  left    65.85    22  humerus  left       56
>>>>  3  femur  left     69.1    23  humerus  left       92
>>>>  4  femur  left      130    24  humerus  left      126
>>>>  5  femur  left    131.2    25  humerus  left      154
>>>>  6  femur  left      143    26  humerus  left      170
>>>>  7  femur  left      145    27  humerus  left      198
>>>>  8  femur  left      160    28  humerus  left      228
>>>>  9  femur  left      183    29  humerus  left      230
>>>> 10  femur  left      200    30  humerus  left      232
>>>> 11  femur  right      28    31  humerus  right     238
>>>> 12  femur  right      80    32  humerus  right      10
>>>> 13  femur  right    96.5    33  humerus  right      66
>>>> 14  femur  right     126    34  humerus  right     123
>>>> 15  femur  right     127    35  humerus  right     128
>>>> 16  femur  right     128    36  humerus  right     143
>>>> 17  femur  right     138    37  humerus  right     200
>>>> 18  femur  right     146    38  humerus  right     228
>>>> 19  femur  right     148    39  humerus  right     230
>>>> 20  femur  right     200    40  humerus  right     241
>>>>
>>>> These data belong to a commingled skeletal collection, and some right
>>>> bones (femurs and humeri, respectively) should match a left bone, but I
>>>> do not know which bones match. Following the idea that a right bone
>>>> should have approximately the same length as the left bone from the same
>>>> skeleton, I want to take the difference in length between each right
>>>> femur and each left femur, with the aim of finding which right femur
>>>> matches which left femur, i.e. has the same or almost the same length, so
>>>> that the difference would be zero or near zero. The same procedure
>>>> applies to the humeri (and other bones).
>>>>
>>>> If you have any idea how to do this, please let me know.
>>>>
>>>> Best wishes
>>>>
>>>> Rodrigo
>>>>
>>>> On 2014-01-05 23:54, Nick Cox wrote:
>>>>>
>>>>>
>>>>> <>
>>>>>
>>>>> Eric Booth gives very good advice.
>>>>>
>>>>> Your problem with the link to the Stata Journal file I directed you
>>>>> to may be just that you didn't step past the standard material
>>>>> bundled with every reprint file.
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 5 January 2014 21:03, Eric Booth <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> <>
>>>>>>
>>>>>> The Stata Journal link you mention that Nick sent you works for me.
>>>>>> The title of the article is "Stata tip 71: The problem of split
>>>>>> identity, or how to group dyads" by Nick J. Cox, so maybe you can
>>>>>> google that title if your browser isn’t navigating to it properly.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Your example dataset doesn’t align with your desired dataset.
>>>>>>
>>>>>> How do we know what is x and what is j in the first 20 obs of your
>>>>>> example data (see below) (also note the Statalist FAQ about not sending
>>>>>> attachments) ?
>>>>>>
>>>>>> You need some kind of identifier that ties, for example, obs or id 1
>>>>>> (even though it’s missing) to the other right side femur observation of
>>>>>> interest (is it id 7 or id 9 or ??).
>>>>>>
>>>>>>
>>>>>> **your example data:
>>>>>>
>>>>>> id type side length
>>>>>> 1 femur right
>>>>>> 2 femur left
>>>>>> 3 femur right
>>>>>> 4 femur left
>>>>>> 5 femur right 373
>>>>>> 6 femur left 416
>>>>>> 7 femur right 138
>>>>>> 8 femur left
>>>>>> 9 femur right 270
>>>>>> 10 femur left
>>>>>> 11 femur left
>>>>>> 12 femur right
>>>>>> 13 femur left
>>>>>> 14 femur right
>>>>>> 15 femur left 281
>>>>>> 16 femur right
>>>>>> 17 femur left 160
>>>>>> 18 femur left
>>>>>> 19 femur right
>>>>>> 20 femur left
>>>>>>
>>>>>>
>>>>>> We can’t just sort by ‘type’ and ‘side’ to get a dataset of the same
>>>>>> structure as you presented initially, so I think you need to provide
>>>>>> more information about this. (Also, if the rule is, as you imply, to
>>>>>> sort by type and side and then subtract every third observation from
>>>>>> each other, then what do we do with missing ‘length’ and missing
>>>>>> ‘side’?)
>>>>>>
>>>>>> If the rule is that id 1 and id 2 are a pair, then why does the
>>>>>> left/right ordering suddenly change starting around id 17?
>>>>>>
>>>>>> - Eric
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <[email protected]> wrote:
>>>>>>
>>>>>>> Dear Guys
>>>>>>>
>>>>>>> Some weeks ago, Red Owl and Nick helped me with some loops for my
>>>>>>> work. I have tried to run the suggested code on my dataset, but I had
>>>>>>> some difficulties. Here are the basic structure of my dataset and my
>>>>>>> question:
>>>>>>>
>>>>>>> I want to create some new variables containing the difference between
>>>>>>> the length of two individuals from different groups:
>>>>>>>
>>>>>>> id  side   length  newvar1  newvar2  newvar3
>>>>>>> 1   right  x       x-j      x-k      x-l
>>>>>>> 2   right  y       y-j      y-k      y-l
>>>>>>> 3   right  z       z-j      z-k      z-l
>>>>>>> 4   left   j       j-x      j-y      j-z
>>>>>>> 5   left   k       k-x      k-y      k-z
>>>>>>> 6   left   l       l-x      l-y      l-z
>>>>>>>
>>>>>>> Red Owl suggested that I follow this example:
>>>>>>>
>>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>>> * Build demo data set.
>>>>>>>>>> clear
>>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>>> input id str5(side) Length
>>>>>>>>>> 1 right 10
>>>>>>>>>> 2 right 15
>>>>>>>>>> 3 right 11
>>>>>>>>>> 4 left 13
>>>>>>>>>> 5 left 10
>>>>>>>>>> 6 left 12
>>>>>>>>>> end
>>>>>>>>>> gen byte newvar1 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>     replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>> }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>     replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>> }
>>>>>>>>>> gen byte newvar2 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>     replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>> }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>     replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>> }
>>>>>>>>>> gen byte newvar3 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>     replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>> }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>     replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>> }
>>>>>>>>>> list, noobs sep(0)
>>>>>>>>>> *** END CODE ***
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> However, my dataset is much longer, and this approach is difficult to
>>>>>>> apply to it. I hope you can help by giving me more ideas.
>>>>>>> I am sending you an extract of my dataset in .xlsx format.
>>>>>>> Also, the webpage suggested by Nick for reviewing the discussion of
>>>>>>> this topic (http://www.stata-journal.com/sjpdf.html?articlenum=dm0043)
>>>>>>> redirects me to a file that makes no sense to me. Please give me the
>>>>>>> issue number of the journal so that I can read the discussion.
>>>>>>>
>>>>>>> Happy new year to all of you
>>>>>>>
>>>>>>> Rodrigo
>>>>>>>
>>>>>>>
>>>>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Dear Red Owl and Nick
>>>>>>>> Thank you very much for your response. The code works perfectly, just
>>>>>>>> as I need.
>>>>>>>> Best wishes
>>>>>>>> Rodrigo
>>>>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In addition to Red's helpful suggestions, note that technique for
>>>>>>>>> such paired data was discussed in
>>>>>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
>>>>>>>>> which is publicly accessible. The problem is that the identifiers in
>>>>>>>>> Rodrigo's example appear to make little sense. How is Stata expected
>>>>>>>>> to know that 1 and 4, 2 and 5, 3 and 6 are paired? Perhaps the
>>>>>>>>> structure of the dataset is clearer in practice. If so, basic
>>>>>>>>> calculations are just a couple of lines or so.
>>>>>>>>> Nick
>>>>>>>>> [email protected]
>>>>>>>>> On 14 December 2013 15:33, Red Owl <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Rodrigo,
>>>>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>>>>> It could be made more efficient with a different loop
>>>>>>>>>> structure, but this approach may be more informative.
>>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>>> * Build demo data set.
>>>>>>>>>> clear
>>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>>> input id str5(side) Length
>>>>>>>>>> 1 right 10
>>>>>>>>>> 2 right 15
>>>>>>>>>> 3 right 11
>>>>>>>>>> 4 left 13
>>>>>>>>>> 5 left 10
>>>>>>>>>> 6 left 12
>>>>>>>>>> end
>>>>>>>>>> gen byte newvar1 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>     replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>>> }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>     replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>>> }
>>>>>>>>>> gen byte newvar2 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>     replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>>> }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>     replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>>> }
>>>>>>>>>> gen byte newvar3 = .
>>>>>>>>>> forval i = 1/3 {
>>>>>>>>>>     replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>>> }
>>>>>>>>>> forval i = 4/6 {
>>>>>>>>>>     replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>>> }
>>>>>>>>>> list, noobs sep(0)
>>>>>>>>>> *** END CODE ***
>>>>>>>>>> Good luck.
>>>>>>>>>> Red Owl
>>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> "Y.R.E. Retamal" <[email protected]> Sat, 14 Dec 2013 12:08:42:
>>>>>>>>>>> Dear list
>>>>>>>>>>> I am having great difficulty performing an analysis using Stata, and
>>>>>>>>>>> I cannot find the way. Maybe you could help me. I want to create some
>>>>>>>>>>> new variables containing the difference between the lengths of two
>>>>>>>>>>> individuals from different groups:
>>>>>>>>>>>
>>>>>>>>>>> id  side   length  newvar1  newvar2  newvar3
>>>>>>>>>>> 1   right  x       x-j      x-k      x-l
>>>>>>>>>>> 2   right  y       y-j      y-k      y-l
>>>>>>>>>>> 3   right  z       z-j      z-k      z-l
>>>>>>>>>>> 4   left   j       j-x      j-y      j-z
>>>>>>>>>>> 5   left   k       k-x      k-y      k-z
>>>>>>>>>>> 6   left   l       l-x      l-y      l-z
>>>>>>>>>>>
>>>>>>>>>>> I do not know if I am explaining myself clearly. The individuals are
>>>>>>>>>>> bones (clavicles, for example), so it is possible that some right
>>>>>>>>>>> clavicles pair-match with left clavicles, following the idea that an
>>>>>>>>>>> individual has bones of similar length.
>>>>>>>>>>>
>>>>>>>>>>> Any help would shed some light!
>>>>>>>>>>> Best wishes
>>>>>>>>>>> Rodrigo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *
>>>>>>>>>> * For searches and help try:
>>>>>>>>>> * http://www.stata.com/help.cgi?search
>>>>>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>>>>
>>>>>>>
>>>>>>> <example.xlsx>