Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Nearest neighbor distance
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Nearest neighbor distance
Date: Wed, 24 Aug 2011 08:19:55 +0100
-nearest- is a user-written program from SSC. You are asked to
identify where user-written programs you refer to come from.
Your problem is both similar to and very different from that of
-nearest-, and you would need to rewrite -nearest-. I wouldn't call
that a slight modification.
-nearest- is indifferent to ties and to whether nearest neighbours are
reciprocal, i.e. A is the nearest neighbour of B, and also vice versa.
These could be bigger issues with your kind of data.
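To make the two issues concrete, here is a minimal sketch in Python rather than Stata, on one-dimensional toy values. It keeps every tied nearest neighbour instead of picking one arbitrarily, then lists the pairs whose nearest-neighbour relation is mutual (A nearest to B and B nearest to A). The function names are hypothetical, and this is not how -nearest- itself is implemented.

```python
def nearest_with_ties(values):
    """For each index i, list every j != i at the minimum |values[i] - values[j]|.

    Assumes at least two values; ties are kept rather than broken arbitrarily.
    """
    out = {}
    for i, x in enumerate(values):
        dists = {j: abs(x - y) for j, y in enumerate(values) if j != i}
        best = min(dists.values())
        out[i] = sorted(j for j, d in dists.items() if d == best)
    return out


def reciprocal_pairs(nn):
    """Pairs (i, j), i < j, where each is among the other's nearest neighbours."""
    return sorted({tuple(sorted((i, j))) for i, js in nn.items() for j in js if i in nn[j]})


nn = nearest_with_ties([0, 2, 4, 10])
print(nn)                    # {0: [1], 1: [0, 2], 2: [1], 3: [2]}
print(reciprocal_pairs(nn))  # [(0, 1), (1, 2)]
```

Index 1 (value 2) has two tied neighbours, and index 3 (value 10) has index 2 as its nearest neighbour without that being returned, so any program that reports a single, one-way neighbour hides both facts.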
I have no idea what the Kogut and Singh index is.
Nick
On Tue, Aug 23, 2011 at 6:27 PM, Lange, Sandra <[email protected]> wrote:
> I would like to modify the code of the Stata command 'nearest' to identify the closest neighbor (from a defined set of observations) for specific observations in a panel dataset.
> I work with an unbalanced sample of firms which ranges over a time period of about 20 years.
> The dataset contains each firm's portfolio of subsidiaries in each year and consists of over 100,000 observations (one observation = one subsidiary of a firm in one year). In addition, several country characteristics were merged into the dataset. Below is an excerpt to give an impression of what the data look like:
> firm_id  unit_id  year  status  country  countryname   pdi  idv  mas  uai  subyears  nearest  nearest_id
>     100       15  1990  U           215  Japan          54   46   95   92         2
>     100       44  1990  I           235  Russia         93   39   36   95         0
>     100        4  1990  U           404  Belgium        65   75   54   94         3
>     100       46  1990  I           408  Germany        35   67   66   65         0
>     100       18  1990  U           408  Germany        35   67   66   65         4
>     100        2  1990  U           408  Germany        35   67   66   65         4
>     100       38  1990  I           434  Switzerland    34   68   70   58         0
>     100       15  1991  U           215  Japan          54   46   95   92         3
>     100       44  1991  U           235  Russia         93   39   36   95         1
>     100        4  1991  U           404  Belgium        65   75   54   94         4
>     100       46  1991  U           408  Germany        35   67   66   65         7
>     100       18  1991  U           408  Germany        35   67   66   65         7
>     100        2  1991  U           408  Germany        35   67   66   65         7
>     100       38  1991  U           434  Switzerland    34   68   70   58         1
>     100       54  1991  I           429  Poland         68   60   64   93         0
>     100       53  1991  I           429  Poland         68   60   64   93         0
>     100       51  1991  I           430  Portugal       63   27   31  104         0
> ...
>     101      181  1985  U           215  Japan          54   46   95   92         1
>     101      150  1985  U           236  Saudi-Arabia   80   38   52   68         1
>     101      146  1985  U           237  Singapur       74   20   48    8         1
>     101      140  1985  U           404  Belgium        65   75   54   94         2
>     101      155  1985  U           408  Germany        35   67   66   65         3
>     101       83  1985  U           408  Germany        35   67   66   65         3
>     101       84  1985  U           408  Germany        35   67   66   65         3
>     101      133  1985  U           411  France         68   71   43   86         2
>     101      147  1985  U           411  France         68   71   43   86         2
>     101      222  1985  I           438  Spain          34   51   42   86         0
> ...
>
> More precisely, this is what I would like to do:
>
> 1. For each observation with status 'I' (Investment), I am looking for the closest country in terms of cultural dimensions (pdi, idv, mas, uai) in the firm's existing portfolio (observations with status 'U'). I suppose I could use the code for 'nearest'; however, I would probably have to change it slightly, because 'nearest' finds the closest neighbor among all N observations, whereas I am looking for the closest neighbor within a subset of observations, namely the existing portfolio (all subsidiary-year observations with status == "U").
> - Is it possible to modify the code of the command 'nearest' for that purpose in the first place? Does anyone have a suggestion?
> - How should I handle the fact that I have multiple dimensions in the code of the command 'nearest'? I want to use the Kogut & Singh index to calculate the distance based on these four dimensions. At some point I would have to specify that, but I do not know where.
>
> 2. A slight modification of 1.: for each observation with status 'I' (Investment), I am looking for the closest country (in the firm's existing portfolio) in terms of cultural dimensions (pdi, idv, mas, uai) AND subyears. If subyears < 5, the country should not qualify as the closest neighbor. In that case the second-closest neighbor should be chosen and checked for subyears >= 5; if it fails too, the third-closest neighbor should be examined, and so on.
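As a language-neutral illustration of points 1 and 2, here is a minimal sketch in Python rather than Stata; it is not a modification of -nearest-. The Kogut and Singh index, as usually stated, averages over the four dimensions the squared score difference divided by that dimension's variance: KS = (1/4) * sum over i of (I_i,a - I_i,b)^2 / V_i. Assumptions, all labelled in the code: the "existing portfolio" means the 'U' rows of the same firm in the same year; V_i is the sample variance over the distinct countries in the data (a fixed reference set of countries may be preferable); tied units are all reported. The names Obs, ks_distance and nearest_in_portfolio are hypothetical.

```python
from statistics import variance
from typing import NamedTuple

DIMS = ("pdi", "idv", "mas", "uai")  # cultural dimensions used in the question


class Obs(NamedTuple):
    """One subsidiary-year row, mirroring the columns in the excerpt."""
    firm_id: int
    unit_id: int
    year: int
    status: str  # "U" = existing unit, "I" = new investment
    country: int
    pdi: float
    idv: float
    mas: float
    uai: float
    subyears: int


def dimension_variances(rows):
    """Assumption: sample variance of each dimension over distinct countries in rows."""
    by_country = {r.country: r for r in rows}
    return {d: variance([getattr(r, d) for r in by_country.values()]) for d in DIMS}


def ks_distance(a, b, var):
    """Kogut-Singh style index: mean over dimensions of squared difference / variance."""
    return sum((getattr(a, d) - getattr(b, d)) ** 2 / var[d] for d in DIMS) / len(DIMS)


def nearest_in_portfolio(rows, min_subyears=0, var=None):
    """For every 'I' row, find the closest 'U' rows of the same firm and year.

    Assumption: the portfolio is the same firm's 'U' rows in the same year.
    Units with subyears < min_subyears are dropped before ranking.
    Returns {(firm_id, year, unit_id): (distance, [tied unit_ids]) or None}.
    """
    if var is None:
        var = dimension_variances(rows)
    result = {}
    for inv in rows:
        if inv.status != "I":
            continue
        key = (inv.firm_id, inv.year, inv.unit_id)
        candidates = [
            u for u in rows
            if u.status == "U" and u.firm_id == inv.firm_id
            and u.year == inv.year and u.subyears >= min_subyears
        ]
        if not candidates:
            result[key] = None  # no qualifying unit in the portfolio
            continue
        scored = [(ks_distance(inv, u, var), u.unit_id) for u in candidates]
        best = min(dist for dist, _ in scored)
        # Keep every unit at the minimum distance so ties stay visible.
        result[key] = (best, sorted(uid for dist, uid in scored if dist == best))
    return result


if __name__ == "__main__":
    # Firm 100 in 1991, taken from the excerpt in the question.
    rows = [
        Obs(100, 15, 1991, "U", 215, 54, 46, 95, 92, 3),   # Japan
        Obs(100, 44, 1991, "U", 235, 93, 39, 36, 95, 1),   # Russia
        Obs(100, 4, 1991, "U", 404, 65, 75, 54, 94, 4),    # Belgium
        Obs(100, 46, 1991, "U", 408, 35, 67, 66, 65, 7),   # Germany
        Obs(100, 18, 1991, "U", 408, 35, 67, 66, 65, 7),   # Germany
        Obs(100, 2, 1991, "U", 408, 35, 67, 66, 65, 7),    # Germany
        Obs(100, 38, 1991, "U", 434, 34, 68, 70, 58, 1),   # Switzerland
        Obs(100, 54, 1991, "I", 429, 68, 60, 64, 93, 0),   # Poland
        Obs(100, 53, 1991, "I", 429, 68, 60, 64, 93, 0),   # Poland
        Obs(100, 51, 1991, "I", 430, 63, 27, 31, 104, 0),  # Portugal
    ]
    print(nearest_in_portfolio(rows))                  # point 1
    print(nearest_in_portfolio(rows, min_subyears=5))  # point 2: only German units (subyears 7) qualify
```

Point 2 needs no explicit walk down the ranking: dropping units with subyears < 5 before taking the minimum gives the same answer as checking the closest neighbour, then the second closest, and so on. When no unit qualifies, the sketch returns None rather than guessing.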