Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Nearest neighbor distance


From   "Lange, Sandra" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: Nearest neighbor distance
Date   Tue, 23 Aug 2011 17:27:03 +0000

I would like to modify the code of the Stata command 'nearest' to identify the closest neighbor (from a defined set of observations) for specific observations in a panel data set.
I work with an unbalanced sample of firms that spans about 20 years.
The dataset contains the portfolio of subsidiaries of each firm in each year and consists of over 100,000 observations (one observation = one subsidiary of a firm in one year). In addition, several country characteristics were merged into the dataset. Below is an excerpt to give an impression of what the data look like:
firm_id  unit_id  year  status  country  countryname   pdi  idv  mas  uai  subyears  nearest  nearest_id
100      15       1990  U       215      Japan         54   46   95   92   2
100      44       1990  I       235      Russia        93   39   36   95   0
100      4        1990  U       404      Belgium       65   75   54   94   3
100      46       1990  I       408      Germany       35   67   66   65   0
100      18       1990  U       408      Germany       35   67   66   65   4
100      2        1990  U       408      Germany       35   67   66   65   4
100      38       1990  I       434      Switzerland   34   68   70   58   0
100      15       1991  U       215      Japan         54   46   95   92   3
100      44       1991  U       235      Russia        93   39   36   95   1
100      4        1991  U       404      Belgium       65   75   54   94   4
100      46       1991  U       408      Germany       35   67   66   65   7
100      18       1991  U       408      Germany       35   67   66   65   7
100      2        1991  U       408      Germany       35   67   66   65   7
100      38       1991  U       434      Switzerland   34   68   70   58   1
100      54       1991  I       429      Poland        68   60   64   93   0
100      53       1991  I       429      Poland        68   60   64   93   0
100      51       1991  I       430      Portugal      63   27   31   104  0
...
101      181      1985  U       215      Japan         54   46   95   92   1
101      150      1985  U       236      Saudi-Arabia  80   38   52   68   1
101      146      1985  U       237      Singapur      74   20   48   8    1
101      140      1985  U       404      Belgium       65   75   54   94   2
101      155      1985  U       408      Germany       35   67   66   65   3
101      83       1985  U       408      Germany       35   67   66   65   3
101      84       1985  U       408      Germany       35   67   66   65   3
101      133      1985  U       411      France        68   71   43   86   2
101      147      1985  U       411      France        68   71   43   86   2
101      222      1985  I       438      Spain         34   51   42   86   0
...
		
More precisely, this is what I would like to do:

1. For each observation with status 'I' (Investment), I am looking for the closest country in terms of cultural dimensions (pdi, idv, mas, uai) in the firm's existing portfolio (observations with status 'U'). I suppose I could use the code for 'nearest', but I would probably have to change it slightly: the 'nearest' command finds the closest neighbor among all N observations, whereas I am looking for the closest neighbor within a subset of observations, namely the firm's existing portfolio (all subsidiary-year observations with status == U).
- Is it possible to modify the code of the command 'nearest' for that in the first place? Does someone have a suggestion?
- How should I handle multiple dimensions in the code of the command 'nearest'? I want to use the Kogut and Singh index to calculate the distance based on these four dimensions. At some point I would have to specify that, but I do not know where.
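
For reference, the Kogut and Singh index as it is usually written, applied here to a pair of countries a (the investment) and b (a portfolio country). The subscripts a and b and the symbol KS are notation chosen for this note; I_{ia} is the score of country a on dimension i (pdi, idv, mas, uai), and V_i is the variance of dimension i across countries:

```latex
\[
  KS_{ab} \;=\; \frac{1}{4} \sum_{i=1}^{4} \frac{\left(I_{ia} - I_{ib}\right)^{2}}{V_{i}}
\]
```

Which set of countries V_i is computed over (all countries with scores, or only those in the sample) is a choice the formula itself does not settle.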

2. A slight modification of 1.: for each observation with status 'I' (Investment), I am looking for the closest country in the firm's existing portfolio in terms of cultural dimensions (pdi, idv, mas, uai) AND subyears. If subyears < 5, the country should not qualify as the closest neighbor. In that case the second-closest neighbor should be checked for subyears >= 5; if it does not qualify either, the third-closest should be checked, and so on.
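
The matching logic in 1. and 2. can be sketched outside Stata to make the intended result concrete. The following is a minimal Python sketch, not a modification of 'nearest'. It assumes that the "existing portfolio" means status 'U' rows of the same firm in the same year, and that the variances are taken over the distinct countries in the data supplied; the function names (`ks_distance`, `sample_variances`, `nearest_in_portfolio`) are hypothetical. Dropping candidates with subyears below the threshold before taking the minimum gives the same answer as walking down the ranked neighbors until one qualifies.

```python
from statistics import pvariance

DIMS = ("pdi", "idv", "mas", "uai")
FIELDS = ("firm_id", "unit_id", "year", "status", "country") + DIMS + ("subyears",)

def ks_distance(a, b, variances):
    """Kogut and Singh index: mean of variance-scaled squared differences."""
    return sum((a[d] - b[d]) ** 2 / variances[d] for d in DIMS) / len(DIMS)

def sample_variances(rows):
    """Assumption: variance of each dimension over the distinct countries in rows."""
    by_country = {r["country"]: r for r in rows}
    return {d: pvariance([r[d] for r in by_country.values()]) for d in DIMS}

def nearest_in_portfolio(rows, variances, min_subyears=0):
    """Map each 'I' row to (unit_id, distance) of its closest qualifying 'U' row.

    Assumption: candidates are 'U' rows of the same firm and year.
    Ties keep the first candidate in row order; no candidate gives None.
    """
    result = {}
    for inv in rows:
        if inv["status"] != "I":
            continue
        candidates = [
            u for u in rows
            if u["status"] == "U"
            and u["firm_id"] == inv["firm_id"]
            and u["year"] == inv["year"]
            and u["subyears"] >= min_subyears
        ]
        key = (inv["firm_id"], inv["unit_id"], inv["year"])
        if not candidates:
            result[key] = None
            continue
        best = min(candidates, key=lambda u: ks_distance(inv, u, variances))
        result[key] = (best["unit_id"], ks_distance(inv, best, variances))
    return result

# Firm 100 in 1991, copied from the excerpt above.
raw = [
    (100, 15, 1991, "U", 215, 54, 46, 95, 92, 3),
    (100, 44, 1991, "U", 235, 93, 39, 36, 95, 1),
    (100, 4, 1991, "U", 404, 65, 75, 54, 94, 4),
    (100, 46, 1991, "U", 408, 35, 67, 66, 65, 7),
    (100, 18, 1991, "U", 408, 35, 67, 66, 65, 7),
    (100, 2, 1991, "U", 408, 35, 67, 66, 65, 7),
    (100, 38, 1991, "U", 434, 34, 68, 70, 58, 1),
    (100, 54, 1991, "I", 429, 68, 60, 64, 93, 0),
    (100, 53, 1991, "I", 429, 68, 60, 64, 93, 0),
    (100, 51, 1991, "I", 430, 63, 27, 31, 104, 0),
]
rows = [dict(zip(FIELDS, r)) for r in raw]

variances = sample_variances(rows)
for key, match in nearest_in_portfolio(rows, variances).items():
    print("case 1:", key, "->", match)
for key, match in nearest_in_portfolio(rows, variances, min_subyears=5).items():
    print("case 2:", key, "->", match)
```

In Stata itself, the same pairing might be approached by forming all I-U pairs within firm and year and keeping the minimum-distance pair, rather than by editing 'nearest'; that is a suggestion, not something tested here.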

I appreciate your input!
Thanks,
Sandra
