Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Identify 5 closest observations of a variable and then calculate average of another variable based on the observations identified
From
Austin Nichols <[email protected]>
To
[email protected]
Subject
Re: st: Identify 5 closest observations of a variable and then calculate average of another variable based on the observations identified
Date
Tue, 18 Sep 2012 10:47:50 -0400
Joseph Monte <[email protected]>:
The best way to approach this depends on the data size and structure.
If you have easy data like below, you can -cross- and compute
directly; for a large dataset, you may want to loop over observations
(cf. e.g. http://www.stata.com/statalist/archive/2007-10/msg00346.html).
To loop over observations and sort repeatedly by distance based on one
or more variables, it will behoove you to create a numeric id
corresponding to the obs number at the outset, so you can re-sort when
you are done with each iteration of the loop, which will make it easy
to refer to a specific observation. Something like:
clear all
input str1 reg v1 v2
A 3.29515 47
A 5.39742 38
A 7.94641 43
A 11.25495 235
A 22.35908 61
A 27.19206 76
A 41.03306 66
A 45.56846 89
A 53.63861 116
A 73.2925 76
A 104.3025 63
A 229.7772 74
A 634.0973 61
A 1053.78 80
A 1163.681 47
B 2.339128 55
B 2.378151 46
B 9.831361 47
B 15.83442 57
B 16.48956 42
B 28.70144 44
B 56.01777 29
B 113.9736 103
B 178.731 47
B 340.715 103
C 0.5892565 44
C 2.016974 37
C 3.041719 76
C 4.009228 80
C 5.856674 51
C 7.587287 188
C 8.827202 66
C 11.53763 48
C 11.67932 152
C 11.86612 51
C 12.95344 84
C 14.85097 63
C 17.12918 47
C 17.74263 67
C 17.97567 75
C 20.60005 84
C 22.13938 44
C 28.99966 44
C 31.23538 55
C 31.52542 36
end
g long id=_n
g double m=.
forv i=1/`=_N' {
sort id
g d=(v1-v1[`i'])^2
g noti=_n==`i'
loc mr=reg[`i']
bys noti reg (d): g f5=(_n<6) if reg=="`mr'"¬i==0
qui count if f5==1
if r(N)==5 {
su v2 if f5==1, mean
replace m=r(mean) if id==`i'
}
drop d noti f5
}
sort id
list, noo
On Mon, Sep 17, 2012 at 12:34 PM, Joseph Monte <[email protected]> wrote:
> Dear Statalisters,
>
> The data below shows three variables:- region, var1 and var2. For each
> observation in a given region, I want the 5 closest observations based
> on var1 (not counting the observation in question). I basically need
> the average value of var2 for the 5 observations that are identified.
> I don't have any missing values in my data for all three variables
> below. I can also confirm that I have a few regions with less than 6
> observations each; hence these regions will be ignored. I am using
> Stata 12.
>
> Thanks,
>
> Joe
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/