Re: st: Calculating Euclidean Distance
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: Calculating Euclidean Distance
Date: Thu, 10 Jun 2010 11:50:44 -0400
Anthony Laverty <[email protected]> :
You didn't give more detail on your problem--what are you going to use
the matches for? Why use the sum of squared differences in each
month, as opposed to, say, the Mahalanobis distance over all months
(-reshape- to have T variables measuring # of patients in each month,
and find the closest 15 obs in the standard deviation metric)? That
would match not only on levels but on seasonal patterns, for example.
Is there a regression you plan to run after matching? You may want to
-findit nnmatch- in that case.
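
A minimal sketch of the -reshape- idea above, using plain Euclidean distance rather than the Mahalanobis variant, on the toy data from this thread. The reference hospital A, the cutoff k = 2 (standing in for 15), and the names d2, dist and refval are illustrative assumptions, not part of the original posts:

```stata
* Sketch only: toy data copied from the thread
clear
input str1 hospital time patients
A 1 456
A 2 759
A 3 236
B 1 214
B 2 854
B 3 325
C 1 250
C 2 321
C 3 852
end

* One row per hospital, one column per month (patients1 patients2 patients3)
reshape wide patients, i(hospital) j(time)

* Accumulate squared differences from the reference hospital over all months
local ref "A"
gen double d2 = 0
foreach v of varlist patients* {
    quietly summarize `v' if hospital == "`ref'", meanonly
    scalar refval = r(mean)
    quietly replace d2 = d2 + (`v' - refval)^2
}
gen double dist = sqrt(d2)

* Drop the reference hospital itself and list the k nearest of the rest
local k = 2
drop if hospital == "`ref'"
sort dist
list hospital dist in 1/`k', noobs
```

A hospital with a missing month ends up with a missing distance and sorts last. On the full data, set k to 15.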
On Thu, Jun 10, 2010 at 11:30 AM, Anthony Laverty
<[email protected]> wrote:
> Hi Austin
>
> That's helpful, thanks, and good points about my memory considerations
> and perhaps using a log scale.
>
> Unfortunately, what I really want to be able to do is choose a group
> of hospitals (say 15) which are closest in Euclidean distance terms to
> hospital A over all months, rather than just the one closest hospital.
> I was planning to aggregate these for the whole of the time period at
> the end, if that makes things any easier.
>
> In terms of more detail, I'm not sure if it helps to say that this was
> relatively easy to work out in Excel, using a different column for
> each time period, a row for each hospital, and the number of patients
> for each time period in a table like this. Then, it was quite easy to
> work out the distances with the equation subtracting different
> hospitals' numbers from each other, using if statements to match on
> time. The new data I have is too big for Excel to do this, which is
> why I have turned to Stata (and Statalist).
>
> Thanks for your consideration
>
> Anthony
>
>
> On Thu, Jun 10, 2010 at 2:59 PM, Austin Nichols <[email protected]> wrote:
>> Anthony Laverty <[email protected]> :
>> If you have N hospitals at T points in time, then you will have NTxN
>> squared distances in your variables, and if they are doubles you may
>> well run out of memory long before that, but if all you want is the
>> nearest hospital, then you want one variable per hospital giving the
>> identity of the nearest (over all months, you seem to suggest). You
>> might also want to compute distance on a log scale, or some other
>> metric. With more detail on your problem, you may get a better answer.
>> Nevertheless, this is like what you asked for, I think:
>>
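>> * Toy data from the original post
>> clear
>> input str1 hospital time patients
>> A 1 456
>> A 2 759
>> A 3 236
>> B 1 214
>> B 2 854
>> B 3 325
>> C 1 250
>> C 2 321
>> C 3 852
>> end
>> * Number the hospitals 1..N and create one distance variable per hospital
>> egen g=group(hospital)
>> su g, mean
>> loc N=r(max)
>> forv i=1/`N' {
>>   g double d`i'=.
>> }
>> * Record the time periods, then balance the panel so that each period
>> * occupies N consecutive rows in the same hospital order
>> levelsof time, loc(ts)
>> fillin time g
>> sort time g
>> g long obs=_n
>> * Within each period, fill the nth distance variable with the squared
>> * difference between each row and the nth hospital in that period
>> qui foreach t of loc ts {
>>   su obs if time==`t', mean
>>   loc n0=r(min)
>>   loc n1=r(max)
>>   forv i=`n0'/`n1' {
>>     loc n=`i'-`n0'+1
>>     replace d`n'=(patients-patients[`i'])^2 if inrange(_n,`n0',`n1')
>>   }
>> }
>> l, sepby(time) noo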
>>
>> On Thu, Jun 10, 2010 at 5:08 AM, Anthony Laverty
>> <[email protected]> wrote:
>>> Dear Statalist
>>>
>>> I have data on patient numbers at various hospitals and am trying to
>>> calculate a new variable which is the Euclidean distance between one
>>> specific hospital (say A) and all of the others, so that I can select
>>> which hospitals had the most similar number of patients across all
>>> months. The data is more or less arranged like this (although it has
>>> a few more columns not of direct interest to this question):
>>>
>>> Hospital Time Patients
>>> A 1 456
>>> A 2 759
>>> A 3 236
>>> B 1 214
>>> B 2 854
>>> B 3 325
>>> C 1 250
>>> C 2 321
>>> C 3 852
>>>
>>> So, I want to cycle through each time period and calculate the
>>> difference squared between hospital A and all of the other hospitals
>>> individually as one new variable.
>>>
>>> Any suggestions greatly appreciated
>>>
>>> Anthony Laverty