
RE: st: nearest neighbor distance


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: nearest neighbor distance
Date   Thu, 9 Jan 2003 13:42:07 -0000

Benoit Dulong
> >
> > x=(x1,x2), a point in R^2.
> >
> > My dataset (simulation), has at least 200 points.
> > For each point in the dataset, I want:
> > 1- identify the nearest neighbor (nnid)
> > 2- calculate the distance to that nearest neighbor (nnd)
> > How can I create nnid and nnd ?
> >
> >  list in 1/10, noobs
> >
> >        id         x1         x2       nnid        nnd
> >         1     0.6231     0.6594          .          .
> >         2     0.0770     0.8497          .          .
> >         3     0.8031     0.5251          .          .
> >         4     0.4283     0.2249          .          .
> >         5     0.2084     0.1750          .          .
> >         6     0.8936     0.9179          .          .
> >         7     0.6168     0.7379          .          .
> >         8     0.5663     0.2539          .          .
> >         9     0.6465     0.5444          .          .
> >        10     0.7783     0.0047          .          .
> >

Stephen Jenkins

> How about something like the following:
>
> 1. Put x1 and x2 in separate data sets A1, A2, each with n rows,
> 	including in each a row-specific unique identifier
> 2. create a long format data set of n x n rows which contains all
> 	possible (x1,x2) pairs,
> 3. using your favourite distance metric formula, d(.), calculate
> 	d(x1_i,x2_i) for i = 1,...,nxn, and from this you will also
> 	get your nearest neighbour = obs with min d(), by i.
> 	[E.g. use some -egen- function, with a -by- id option].
> 4. Tag the relevant nearest neighbour observations, and save them in a
> 	file together with row number id
> 5. Merge back on to original data set, using i as key.
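
The pairwise approach outlined above might be sketched roughly as follows. This is only one possible reading of those steps, not necessarily what was intended: it uses -cross- to form the n x n pairs and keeps the closest pair per point rather than using -egen- and a merge. It assumes the variables id, x1 and x2 from the listing; the names x1_nn, x2_nn and the tempfile `copy' are made up for illustration. Note that it keeps only id, x1 and x2 and leaves one row per point in memory.

```stata
* Assumes id, x1, x2 exist as in the listing; other variables are dropped.
keep id x1 x2

* Save a renamed copy of the points to act as candidate neighbours.
preserve
rename id nnid
rename x1 x1_nn
rename x2 x2_nn
tempfile copy
save `copy'
restore

* Form all n x n pairs, then discard each point paired with itself.
cross using `copy'
drop if id == nnid

* Euclidean distance for every remaining pair.
gen double nnd = sqrt((x1 - x1_nn)^2 + (x2 - x2_nn)^2)

* Keep the closest candidate for each point; ties go to the smaller nnid.
bysort id (nnd nnid): keep if _n == 1
drop x1_nn x2_nn
```

With 200 points this creates 40,000 rows temporarily, which should be manageable, but the row count grows with the square of the number of points.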

You can also do it in place without any need for fiddling around
with files. The double loop over observations would probably get
a D in any computer science course, but it should be practical
enough for the sample sizes implied.

Euclidean distance (Euclid, or perhaps Pythagoras) is wired in. There is
one line to change if you want some other definition of distance, plus
the final -sqrt()- if the new distance is not a squared one.

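program def nearest
*! NJC 1.0.0 9 January 2003
	version 7
	* two numeric coordinate variables; id() and dist() name new variables
	syntax varlist(min=2 max=2 numeric) [if] [in] , id(string) dist(string)
	confirm new var `id'
	confirm new var `dist'
	marksample touse
	tokenize `varlist'
	args x y

	qui {
		* missing (.) counts as larger than any number, so the first
		* distance compared always replaces it
		gen `id' = .
		gen `dist' = .
		tempname d
		local n = _N
		forval i = 1/`n' {
			forval j = 1/`n' {
				* use only pairs of distinct observations in the sample
				if `touse'[`i'] & `touse'[`j'] & (`i' != `j') {
					* squared Euclidean distance
					scalar `d' =	 /*
			*/ (`x'[`i'] - `x'[`j'])^2 + (`y'[`i'] - `y'[`j'])^2
					if `d' < `dist'[`i'] {
						replace `dist' = `d' in `i'
						replace `id' = `j' in `i'
					}
				}
			}
		}
		replace `dist' = sqrt(`dist')
	}
end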
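
A possible way to try the program, assuming it has been defined in the current session or saved as nearest.ado: the simulated data, the seed and the names nnid and nnd are illustrative only. Because the program insists on new variables, any existing nnid and nnd (such as the all-missing ones in the listing above) would have to be dropped first.

```stata
* Simulate 200 points in the unit square (illustrative data only).
clear
set obs 200
set seed 2003
gen id = _n
* uniform() is the version 7 function; later releases document runiform().
gen x1 = uniform()
gen x2 = uniform()

* Create nnid (nearest neighbour) and nnd (distance to it).
nearest x1 x2, id(nnid) dist(nnd)
list in 1/10, noobs
```

The id() variable holds observation numbers, so it matches an identifier variable only while that identifier equals _n and the data have not been re-sorted.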

Nick
[email protected]
