Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Calculating Euclidean Distance
From: Anthony Laverty <[email protected]>
To: [email protected]
Subject: Re: st: Calculating Euclidean Distance
Date: Fri, 11 Jun 2010 09:43:29 +0100
Fair enough, I didn't really give much away. After the
matching I am planning to run a difference-in-differences analysis
to assess the effect of policy changes on patient numbers, using
the matches as a comparison group. Mahalanobis distance may in fact be
an improvement, so I will look that up.
Many thanks
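
A minimal sketch of the selection step asked about further down the thread (a group of hospitals closest to hospital A over all months), using the toy data from the original question. It assumes a balanced panel with exactly one row per hospital and month. The macro names ref and k and the variables refpat, sqdiff, sqdist, nmonths and dist are made up for illustration, and the distance is plain Euclidean, not the Mahalanobis variant suggested below.

```stata
clear
input str1 hospital time patients
A 1 456
A 2 759
A 3 236
B 1 214
B 2 854
B 3 325
C 1 250
C 2 321
C 3 852
end

* Hypothetical settings: reference hospital and number of matches to keep
local ref "A"
local k 2

* Copy the reference hospital's count onto every row of the same month
* (missing sorts last, so the first row within each month is the reference)
gen double refpat = patients if hospital == "`ref'"
bysort time (refpat): replace refpat = refpat[1]

* Squared difference per month, summed over months within hospital;
* nmonths counts the months that contributed, as a check on balance
gen double sqdiff = (patients - refpat)^2
collapse (sum) sqdist = sqdiff (count) nmonths = sqdiff, by(hospital)

* Euclidean distance is the square root of the summed squared differences
gen double dist = sqrt(sqdist)

* Drop the reference itself and keep the k nearest hospitals
drop if hospital == "`ref'"
sort dist hospital
keep in 1/`k'
list, noobs
```

With the toy data, B is nearest to A (summed squared difference 75510) and C is second (613736). For the real data, k would be set to 15; if some hospital-months are missing, nmonths will differ across hospitals and the distances are no longer comparable.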
On Thu, Jun 10, 2010 at 4:50 PM, Austin Nichols <[email protected]> wrote:
> Anthony Laverty <[email protected]> :
> You didn't give more detail on your problem--what are you going to use
> the matches for? Why use the sum of squared differences in each
> month, as opposed to, say, the Mahalanobis distance over all months
> (-reshape- to have T variables measuring # of patients in each month,
> and find the closest 15 obs in the standard deviation metric)? That
> would match not only on levels but on seasonal patterns, for example.
> Is there a regression you plan to run after matching? You may want to
> -findit nnmatch- in that case.
>
> On Thu, Jun 10, 2010 at 11:30 AM, Anthony Laverty
> <[email protected]> wrote:
>> Hi Austin
>>
>> That's helpful, thanks, and good points about my memory considerations
>> and perhaps using a log scale.
>>
>> Unfortunately, what I really want to be able to do is choose a group
>> of hospitals (say 15) which are closest in Euclidean distance terms to
>> hospital A over all months, rather than just the one closest hospital.
>> I was planning to aggregate these for the whole of the time period at
>> the end, if that makes things any easier.
>>
>> In terms of more detail, I'm not sure if it helps to say that this was
>> relatively easy to work out in Excel, using a table with a column for
>> each time period, a row for each hospital, and the number of patients
>> in each cell. Then it was quite easy to
>> work out the distances with an equation subtracting different
>> hospitals' numbers from each other, using IF statements to match on
>> time. The new data I have is too big for Excel to do this, which is
>> why I have turned to Stata (and Statalist).
>>
>> Thanks for your consideration
>>
>> Anthony
>>
>>
>> On Thu, Jun 10, 2010 at 2:59 PM, Austin Nichols <[email protected]> wrote:
>>> Anthony Laverty <[email protected]> :
>>> If you have N hospitals at T points in time, then you will have NTxN
>>> squared distances in your variables, and if they are doubles you may
>>> well run out of memory long before that, but if all you want is the
>>> nearest hospital, then you want one variable per hospital giving the
>>> identity of the nearest (over all months, you seem to suggest). You
>>> might also want to compute distance on a log scale, or some other
>>> metric. With more detail on your problem, you may get a better answer.
>>> Nevertheless, this is like what you asked for, I think:
>>>
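>>> clear
>>> input str1 hospital time patients
>>> A 1 456
>>> A 2 759
>>> A 3 236
>>> B 1 214
>>> B 2 854
>>> B 3 325
>>> C 1 250
>>> C 2 321
>>> C 3 852
>>> end
>>> * number the hospitals 1..N and create one distance variable per hospital
>>> egen g=group(hospital)
>>> su g, mean
>>> loc N=r(max)
>>> forv i=1/`N' {
>>>   g double d`i'=.
>>> }
>>> * balance the panel, then sort so each time period is a contiguous block
>>> levelsof time, loc(ts)
>>> fillin time g
>>> sort time g
>>> g long obs=_n
>>> qui foreach t of loc ts {
>>>   * n0 and n1 are the first and last observation of the current period
>>>   su obs if time==`t', mean
>>>   loc n0=r(min)
>>>   loc n1=r(max)
>>>   * d1, d2, ... hold the squared difference from hospital 1, 2, ... in the same period
>>>   forv i=`n0'/`n1' {
>>>     loc n=`i'-`n0'+1
>>>     replace d`n'=(patients-patients[`i'])^2 if inrange(_n,`n0',`n1')
>>>   }
>>> }
>>> l, sepby(time) noo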
>>>
>>> On Thu, Jun 10, 2010 at 5:08 AM, Anthony Laverty
>>> <[email protected]> wrote:
>>>> Dear Statalist
>>>>
>>>>
>>>>
>>>> I have data on patient numbers at various hospitals and am trying to
>>>> calculate a new variable which is the Euclidean distance between one
>>>> specific hospital (say A) and all of the others, so that I can select
>>>> which hospitals had the most similar number of patients across all
>>>> months. The data is more or less arranged like this (although it has
>>>> a few more columns not of direct interest to this question):
>>>>
>>>> Hospital Time Patients
>>>> A 1 456
>>>> A 2 759
>>>> A 3 236
>>>> B 1 214
>>>> B 2 854
>>>> B 3 325
>>>> C 1 250
>>>> C 2 321
>>>> C 3 852
>>>>
>>>>
>>>>
>>>> So, I want to cycle through each time period and calculate the
>>>> squared difference between hospital A and each of the other hospitals
>>>> individually as one new variable.
>>>>
>>>>
>>>>
>>>> Any suggestions greatly appreciated
>>>>
>>>>
>>>>
>>>> Anthony Laverty
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>