st: Nearest neighbor distance
From: "Lange, Sandra" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: Nearest neighbor distance
Date: Tue, 23 Aug 2011 17:27:03 +0000
I would like to modify the code of the Stata command 'nearest' to identify the closest neighbor (from a defined set of observations) for specific observations in a panel data set.
I work with an unbalanced sample of firms that spans about 20 years.
The dataset contains the portfolio of subsidiaries of each firm in each year and consists of over 100,000 observations (one observation = one subsidiary of a firm in one year). In addition, several country characteristics were merged into the dataset. Below is an excerpt to give an impression of what the data look like:
firm_id  unit_id  year  status  country  countryname   pdi  idv  mas  uai  subyears  nearest  nearest_id
100      15       1990  U       215      Japan         54   46   95   92   2
100      44       1990  I       235      Russia        93   39   36   95   0
100      4        1990  U       404      Belgium       65   75   54   94   3
100      46       1990  I       408      Germany       35   67   66   65   0
100      18       1990  U       408      Germany       35   67   66   65   4
100      2        1990  U       408      Germany       35   67   66   65   4
100      38       1990  I       434      Switzerland   34   68   70   58   0
100      15       1991  U       215      Japan         54   46   95   92   3
100      44       1991  U       235      Russia        93   39   36   95   1
100      4        1991  U       404      Belgium       65   75   54   94   4
100      46       1991  U       408      Germany       35   67   66   65   7
100      18       1991  U       408      Germany       35   67   66   65   7
100      2        1991  U       408      Germany       35   67   66   65   7
100      38       1991  U       434      Switzerland   34   68   70   58   1
100      54       1991  I       429      Poland        68   60   64   93   0
100      53       1991  I       429      Poland        68   60   64   93   0
100      51       1991  I       430      Portugal      63   27   31   104  0
...
101      181      1985  U       215      Japan         54   46   95   92   1
101      150      1985  U       236      Saudi-Arabia  80   38   52   68   1
101      146      1985  U       237      Singapur      74   20   48   8    1
101      140      1985  U       404      Belgium       65   75   54   94   2
101      155      1985  U       408      Germany       35   67   66   65   3
101      83       1985  U       408      Germany       35   67   66   65   3
101      84       1985  U       408      Germany       35   67   66   65   3
101      133      1985  U       411      France        68   71   43   86   2
101      147      1985  U       411      France        68   71   43   86   2
101      222      1985  I       438      Spain         34   51   42   86   0
...
More precisely, this is what I would like to do:
1. For each observation with status 'I' (Investment), I am looking for the closest country in terms of cultural dimensions (pdi, idv, mas, uai) in the firm's existing portfolio (observations with status 'U'). I suppose I could use the code for 'nearest', but I would probably have to change it slightly: 'nearest' looks for the closest neighbor among all N observations, whereas I need the search restricted to a subset of observations, namely the existing portfolio (all subsidiary-year observations with status == U).
- Is it possible to modify the code of 'nearest' for this purpose in the first place? Does anyone have a suggestion?
- How should I handle multiple dimensions in the code of 'nearest'? I want to use the Kogut and Singh index to calculate the distance based on these four dimensions. I would have to specify that at some point in the code, but I do not know where.
2. A slight modification of 1.: for each observation with status 'I' (Investment), I am looking for the closest country in the firm's existing portfolio in terms of cultural dimensions (pdi, idv, mas, uai) AND subyears. A country with subyears < 5 should not qualify as the closest neighbor. In that case the second closest neighbor should be checked for subyears >= 5, then the third closest, and so on until a qualifying neighbor is found.
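To make points 1 and 2 concrete, here is a minimal sketch of the matching logic. It is written in Python, not Stata, and is not based on the source of 'nearest'. It rests on several assumptions that are not stated above: the existing portfolio is taken to be the 'U' observations of the same firm in the same year; the Kogut and Singh index is computed as the mean over the four dimensions of the squared difference divided by that dimension's variance; and the variance is the population variance across the distinct countries present in the data. The function names are hypothetical. The rows are the 1990 observations of firm 100 from the excerpt.

```python
from statistics import pvariance

DIMS = ("pdi", "idv", "mas", "uai")
FIELDS = ("firm_id", "unit_id", "year", "status", "country") + DIMS + ("subyears",)

# Firm 100, year 1990, copied from the excerpt above.
rows = [dict(zip(FIELDS, r)) for r in [
    (100, 15, 1990, "U", 215, 54, 46, 95, 92, 2),  # Japan
    (100, 44, 1990, "I", 235, 93, 39, 36, 95, 0),  # Russia
    (100, 4,  1990, "U", 404, 65, 75, 54, 94, 3),  # Belgium
    (100, 46, 1990, "I", 408, 35, 67, 66, 65, 0),  # Germany
    (100, 18, 1990, "U", 408, 35, 67, 66, 65, 4),  # Germany
    (100, 2,  1990, "U", 408, 35, 67, 66, 65, 4),  # Germany
    (100, 38, 1990, "I", 434, 34, 68, 70, 58, 0),  # Switzerland
]]

def dimension_variances(rows):
    """Variance of each dimension across distinct countries (assumption)."""
    # Keep one row per country so repeated subsidiaries do not weight the variance.
    by_country = {r["country"]: r for r in rows}
    return {d: pvariance([c[d] for c in by_country.values()]) for d in DIMS}

def ks_distance(a, b, var):
    """Kogut and Singh index: mean of squared differences scaled by variance."""
    return sum((a[d] - b[d]) ** 2 / var[d] for d in DIMS) / len(DIMS)

def nearest_in_portfolio(rows, min_subyears=0):
    """Map each 'I' observation to (unit_id, distance) of its closest 'U' row.

    Candidates are 'U' rows of the same firm and year (assumption) with
    subyears >= min_subyears. The value is None when no candidate qualifies.
    """
    var = dimension_variances(rows)
    result = {}
    for inv in rows:
        if inv["status"] != "I":
            continue
        candidates = [
            u for u in rows
            if u["status"] == "U"
            and u["firm_id"] == inv["firm_id"]
            and u["year"] == inv["year"]
            and u["subyears"] >= min_subyears
        ]
        key = (inv["firm_id"], inv["unit_id"], inv["year"])
        if not candidates:
            result[key] = None
            continue
        # Ties are resolved in favor of the first candidate in row order.
        best = min(candidates, key=lambda u: ks_distance(inv, u, var))
        result[key] = (best["unit_id"], ks_distance(inv, best, var))
    return result

matches = nearest_in_portfolio(rows)                       # point 1
matches_5 = nearest_in_portfolio(rows, min_subyears=5)     # point 2
```

Dropping candidates with subyears below the threshold before taking the minimum gives the same result as walking from the closest neighbor to the second closest, third closest, and so on until one has subyears >= 5. In this 1990 excerpt no 'U' observation reaches five subyears, so every entry of matches_5 is None.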
I appreciate your input!
Thanks,
Sandra