Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: pairing unpaired data [was: Re: st: any idea?]
From: "Brent McSharry (ADHB)" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: pairing unpaired data [was: Re: st: any idea?]
Date: Wed, 8 Jan 2014 09:22:21 +1300
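Here is a program thrown together to do what I believe you want (although not automatically):
program closestvalues, byable(recall) sortpreserve
    version 10.1
    * varname: measurement to compare; matchon(): variable that must differ
    * between paired records (e.g. side); id(): record identifier
    syntax varname [if] [in], Matchon(varlist min=1 max=1) Id(varlist min=1 max=1)
    marksample touse
    tempvar absdif
    qui gen `absdif' = .
    qui count if `touse'
    local obs `r(N)'
    * never list more rows than the sample holds
    local nshow = min(2, `obs')
    forvalues i = 1/`obs' {
        * restore id order so that [i] points at the i-th record of the sample
        gsort -`touse' `id'
        * absolute difference from record i, only where the matchon value differs
        qui replace `absdif' = cond(_n!=`i' & `matchon'!=`matchon'[`i'], abs(`varlist'[`i']-`varlist'),.) if `touse'
        di as res "closest variables to `=`id'[`i']'(`=`varlist'[`i']')"
        * smallest differences first (missing values sort last)
        gsort -`touse' `absdif'
        li `id' `varlist' `absdif' in 1/`nshow' if `touse', noo
    }
end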
Save this as closestvalues.ado in your personal ado folder and then type
bysort bone: closestvalues length, id(id) match(side)
You will then get output for each bone like
-> bone = femur
closest variables to 1(18)
  +------------------------+
  | id   length   __00000A |
  |------------------------|
  | 11       28         10 |
  | 12       80         62 |
  +------------------------+
For each record, it will list the 2 closest matches from the opposite side within the same bone. You will then have to make a table of which matches are acceptable to you (or modify the program to automatically assign a match when prespecified criteria are met, e.g. a single record within 1%).
This program is ugly/slow, but will hopefully speed up what you are trying to do.
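As a rough illustration of the automatic assignment mentioned above, here is a minimal sketch rather than a tested solution. It assumes the same variable names as the call above (id, bone, side, length), that side is a string holding "left" or "right", and that a 1% relative difference is an acceptable tolerance; the names reldif, candidate, n_left, n_right and matched are made up for the example. It replaces the data in memory with a list of left-right pairs, so save your dataset first.

```stata
* Sketch only: assumes variables id, bone, side ("left"/"right") and length.
* The 1% tolerance is an arbitrary choice for illustration.
preserve
keep if side == "right" & !missing(length)
keep id bone length
rename id id_right
rename length length_right
tempfile right
save `right'
restore

keep if side == "left" & !missing(length)
keep id bone length
rename id id_left
rename length length_left

* form every left-right pair within the same bone
joinby bone using `right'

* difference in length relative to the mean of the two lengths
gen double reldif = abs(length_left - length_right) / ((length_left + length_right)/2)

* a pair is a candidate if the lengths agree within 1%
gen byte candidate = reldif <= 0.01

* number of candidate partners for each left bone and for each right bone
bysort bone id_left: egen n_left = total(candidate)
bysort bone id_right: egen n_right = total(candidate)

* accept a pair only when each bone has exactly one candidate partner
gen byte matched = candidate & n_left == 1 & n_right == 1

sort bone id_left id_right
list bone id_left length_left id_right length_right reldif if matched, noobs sepby(bone)
```

Pairs with candidate equal to 1 but matched equal to 0 are the ambiguous cases (one bone within tolerance of two or more bones on the other side) and would still need to be resolved by hand or with further variables.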
Brent McSharry MBBS BSc(med) FCICM(paed)
Paediatric Intensivist
Starship Children's Hospital
Private Bag 92024
Auckland 1142
New Zealand
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, 8 January 2014 8:28 a.m.
To: [email protected]
Subject: Re: pairing unpaired data [was: Re: st: any idea?]
Thanks for the details of your problem. I can't see that you have a
method that is translatable into Stata code: your procedure is too
vaguely specified. That need not stop other people suggesting methods.
Nick
[email protected]
On 7 January 2014 19:20, Y.R.E. Retamal <[email protected]> wrote:
> Dear Nick
>
> Thanks a lot for your prompt response. The method is no more than what I showed. I
> will have to add other variables, like width and height, for the same bone. So, if
> three variables match, both bones would probably be from the same skeleton.
> I would expect that many bones would not match each other, so I could
> rule out their being from the same skeleton. Problems would appear if, e.g., a
> right bone matches more than one left bone. But at least I could
> simplify the work, and afterwards I could focus on the problematic cases.
>
> Rodrigo
>
> On 2014-01-07 18:49, Nick Cox wrote:
>>
>> I changed the thread title, which was not informative.
>>
>> You need a method. Some predictable pitfalls are that for some bones
>> there is no acceptable match and that others there could be two or
>> more acceptable matches. I don't think there is a canned solution
>> independent of your spelling out what the method is.
>>
>> Nick
>> [email protected]
>>
>>
>> On 7 January 2014 18:20, Y.R.E. Retamal <[email protected]> wrote:
>>>
>>> Thank you very much, Eric and Nick, for the advice.
>>>
>>> I will try to give a clearer idea of what I want to do.
>>> For example, I have the following database of human bones. I removed
>>> missing values of length for easier reading:
>>>
>>> id  type   side   length    id  type     side   length
>>> 1   femur  left   18        21  humerus  left   13
>>> 2   femur  left   65.85     22  humerus  left   56
>>> 3   femur  left   69.1      23  humerus  left   92
>>> 4   femur  left   130       24  humerus  left   126
>>> 5   femur  left   131.2     25  humerus  left   154
>>> 6   femur  left   143       26  humerus  left   170
>>> 7   femur  left   145       27  humerus  left   198
>>> 8   femur  left   160       28  humerus  left   228
>>> 9   femur  left   183       29  humerus  left   230
>>> 10  femur  left   200       30  humerus  left   232
>>> 11  femur  right  28        31  humerus  right  238
>>> 12  femur  right  80        32  humerus  right  10
>>> 13  femur  right  96.5      33  humerus  right  66
>>> 14  femur  right  126       34  humerus  right  123
>>> 15  femur  right  127       35  humerus  right  128
>>> 16  femur  right  128       36  humerus  right  143
>>> 17  femur  right  138       37  humerus  right  200
>>> 18  femur  right  146       38  humerus  right  228
>>> 19  femur  right  148       39  humerus  right  230
>>> 20  femur  right  200       40  humerus  right  241
>>>
>>> These data belong to a commingled skeletal collection, and some right
>>> bones (femurs and humeri respectively) should match a left bone, but I do
>>> not know which bones match. Following the idea that a right bone should
>>> have (approximately) the same length as the left bone from the same
>>> skeleton, I want to subtract each right femur's length from each left
>>> femur's length, with the aim of finding which right femur matches which
>>> left femur, i.e. has the same or almost the same length, so that the
>>> difference would be zero or near zero.
>>> The same procedure applies to the humeri (and other bones).
>>>
>>> If you have any idea how to do this, please let me know.
>>>
>>> Best wishes
>>>
>>> Rodrigo
>>>
>>> On 2014-01-05 23:54, Nick Cox wrote:
>>>>
>>>>
>>>> <>
>>>>
>>>> Eric Booth gives very good advice.
>>>>
>>>> Your problem with the link to the Stata Journal file you were directed
>>>> to may be just that you didn't step past the standard material
>>>> bundled with every reprint file.
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 5 January 2014 21:03, Eric Booth <[email protected]> wrote:
>>>>>
>>>>>
>>>>> <>
>>>>>
>>>>> The Stata Journal link you mention that Nick sent you works for me.
>>>>> The title of the article is "Stata tip 71: The problem of split
>>>>> identity, or how to group dyads" by Nick J. Cox, so maybe you can
>>>>> google that title if your browser isn't navigating to it properly.
>>>>>
>>>>>
>>>>>
>>>>> Your example dataset doesn't align with your desired dataset.
>>>>>
>>>>> How do we know what is x and what is j in the first 20 obs of your
>>>>> example data (see below) (also note the Statalist FAQ about not sending
>>>>> attachments) ?
>>>>>
>>>>> You need some kind of identifier that ties, for example, obs or id 1
>>>>> (even though it's missing) to the other right side femur observation of
>>>>> interest (is it id 7 or id 9 or ??).
>>>>>
>>>>>
>>>>> **your example data:
>>>>>
>>>>> id type side length
>>>>> 1 femur right
>>>>> 2 femur left
>>>>> 3 femur right
>>>>> 4 femur left
>>>>> 5 femur right 373
>>>>> 6 femur left 416
>>>>> 7 femur right 138
>>>>> 8 femur left
>>>>> 9 femur right 270
>>>>> 10 femur left
>>>>> 11 femur left
>>>>> 12 femur right
>>>>> 13 femur left
>>>>> 14 femur right
>>>>> 15 femur left 281
>>>>> 16 femur right
>>>>> 17 femur left 160
>>>>> 18 femur left
>>>>> 19 femur right
>>>>> 20 femur left
>>>>>
>>>>>
>>>>> We can't just sort by 'type' and 'side' to get a dataset of the same
>>>>> structure as you presented initially, so I think you need to provide
>>>>> more information about this. (Also, if the rule is, as you imply, to
>>>>> sort by type and side and then subtract every third observation from
>>>>> each other, then what do we do with missing 'length' and missing 'side'?)
>>>>>
>>>>> If the rule is that id 1 and id 2 are a pair, then why does the
>>>>> left/right ordering suddenly change starting around id 17?
>>>>>
>>>>> - Eric
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Jan 5, 2014, at 2:46 PM, Y.R.E. Retamal <[email protected]> wrote:
>>>>>
>>>>>> Dear Guys
>>>>>>
>>>>>> Some weeks ago, Red Owl and Nick helped me with some loops for my
>>>>>> work. I have tried to run some of the suggestions on my dataset, but I
>>>>>> had some difficulties.
>>>>>> I give you the basic structure of my dataset and my question:
>>>>>>
>>>>>> I want to create some new variables containing the difference between
>>>>>> the length of two individuals from different groups:
>>>>>>
>>>>>> id  side   length  newvar1  newvar2  newvar3
>>>>>> 1   right  x       x-j      x-k      x-l
>>>>>> 2   right  y       y-j      y-k      y-l
>>>>>> 3   right  z       z-j      z-k      z-l
>>>>>> 4   left   j       j-x      j-y      j-z
>>>>>> 5   left   k       k-x      k-y      k-z
>>>>>> 6   left   l       l-x      l-y      l-z
>>>>>>
>>>>>> Red Owl suggested that I follow this example:
>>>>>>
>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>> * Build demo data set.
>>>>>>>>> clear
>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>> input id str5(side) Length
>>>>>>>>> 1 right 10
>>>>>>>>> 2 right 15
>>>>>>>>> 3 right 11
>>>>>>>>> 4 left 13
>>>>>>>>> 5 left 10
>>>>>>>>> 6 left 12
>>>>>>>>> end
>>>>>>>>> gen byte newvar1 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>> }
>>>>>>>>> gen byte newvar2 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>> }
>>>>>>>>> gen byte newvar3 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>> }
>>>>>>>>> list, noobs sep(0)
>>>>>>>>> *** END CODE ***
>>>>>>
>>>>>>
>>>>>>
>>>>>> However, my dataset is much longer, so this approach is difficult to
>>>>>> apply. I hope you can help by giving me more ideas.
>>>>>> I send you an extract of my dataset in .xlsx format.
>>>>>> Also, the webpage suggested by Nick to review the discussion about the
>>>>>> topic (http://www.stata-journal.com/sjpdf.html?articlenum=dm0043)
>>>>>> redirects me to a nonsense file to download. Please give me the issue
>>>>>> number of the journal so that I can read the discussion.
>>>>>>
>>>>>> Happy new year to all of you
>>>>>>
>>>>>> Rodrigo
>>>>>>
>>>>>>
>>>>>> On 2013-12-15 22:39, Y.R.E. Retamal wrote:
>>>>>>>
>>>>>>>
>>>>>>> Dear Red Owl and Nick
>>>>>>> Thank you very much for your response. The code works perfectly, just
>>>>>>> as I need.
>>>>>>> Best wishes
>>>>>>> Rodrigo
>>>>>>> On 2013-12-14 22:31, Nick Cox wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> In addition to Red's helpful suggestions, note that a technique for
>>>>>>>> such paired data was discussed in
>>>>>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0043
>>>>>>>> which is publicly accessible. The problem is that the identifiers in
>>>>>>>> Rodrigo's example appear to make little sense. How is Stata expected
>>>>>>>> to know that 1 and 4, 2 and 5, 3 and 6 are paired? Perhaps the
>>>>>>>> structure of the dataset is clearer in practice. If so, basic
>>>>>>>> calculations are just a couple of lines or so.
>>>>>>>> Nick
>>>>>>>> [email protected]
>>>>>>>> On 14 December 2013 15:33, Red Owl <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Rodrigo,
>>>>>>>>> The following code demonstrates an approach with basic loops.
>>>>>>>>> It could be made more efficient with a different loop
>>>>>>>>> structure, but this approach may be more informative.
>>>>>>>>> *** BEGIN CODE ***
>>>>>>>>> * Build demo data set.
>>>>>>>>> clear
>>>>>>>>> * Length is capitalized to distinguish from length().
>>>>>>>>> input id str5(side) Length
>>>>>>>>> 1 right 10
>>>>>>>>> 2 right 15
>>>>>>>>> 3 right 11
>>>>>>>>> 4 left 13
>>>>>>>>> 5 left 10
>>>>>>>>> 6 left 12
>>>>>>>>> end
>>>>>>>>> gen byte newvar1 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar1 = Length[`i'] - Length[4] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar1 = Length[`i'] - Length[1] in `i'
>>>>>>>>> }
>>>>>>>>> gen byte newvar2 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar2 = Length[`i'] - Length[5] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar2 = Length[`i'] - Length[2] in `i'
>>>>>>>>> }
>>>>>>>>> gen byte newvar3 = .
>>>>>>>>> forval i = 1/3 {
>>>>>>>>>     replace newvar3 = Length[`i'] - Length[6] in `i'
>>>>>>>>> }
>>>>>>>>> forval i = 4/6 {
>>>>>>>>>     replace newvar3 = Length[`i'] - Length[3] in `i'
>>>>>>>>> }
>>>>>>>>> list, noobs sep(0)
>>>>>>>>> *** END CODE ***
>>>>>>>>> Good luck.
>>>>>>>>> Red Owl
>>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> "Y.R.E. Retamal" <[email protected]> Sat, 14 Dec 2013 12:08:42:
>>>>>>>>>> Dear list
>>>>>>>>>> I am having great difficulty performing an analysis in Stata and I
>>>>>>>>>> cannot find the way. Maybe you could help me. I want to create some
>>>>>>>>>> new variables containing the difference between the length of two
>>>>>>>>>> individuals from different groups:
>>>>>>>>>>
>>>>>>>>>> id  side   length  newvar1  newvar2  newvar3
>>>>>>>>>> 1   right  x       x-j      x-k      x-l
>>>>>>>>>> 2   right  y       y-j      y-k      y-l
>>>>>>>>>> 3   right  z       z-j      z-k      z-l
>>>>>>>>>> 4   left   j       j-x      j-y      j-z
>>>>>>>>>> 5   left   k       k-x      k-y      k-z
>>>>>>>>>> 6   left   l       l-x      l-y      l-z
>>>>>>>>>>
>>>>>>>>>> I do not know if I am explaining myself clearly: the individuals are
>>>>>>>>>> bones (clavicles, for example), so it is possible that some right
>>>>>>>>>> clavicles pair-match with left clavicles, following the idea that an
>>>>>>>>>> individual has bones of similar length.
>>>>>>>>>>
>>>>>>>>>> Any help could bring me a light!
>>>>>>>>>> Best wishes
>>>>>>>>>> Rodrigo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *
>>>>>>>>> * For searches and help try:
>>>>>>>>> * http://www.stata.com/help.cgi?search
>>>>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>>> <example.xlsx>