st: Two datasets: Look for similar observations in the second dataset

From	Torsten Häberle <[email protected]>
To	[email protected]
Subject	st: Two datasets: Look for similar observations in the second dataset
Date	Sat, 25 Jan 2014 18:26:55 +0100

Hey guys,

I have quite a difficult "matching" problem to solve and I am not sure
how to approach it. This is the situation:

I have two datasets:
1) The first one is my sample dataset
2) The second one is basically the entire population, but excluding my
sample dataset

Both datasets include data about firms. In general, what I want to do:
Find for each firm in dataset (1) another "matching" firm in dataset
(2) that is as similar as possible to the sample firm in dataset (1)
(based on two characteristics).

Dataset 1 looks like:

Company   Year   CompanySize   Ratio
A         2012   140           0.2
B         2011   200           0.4
C         2010   300           0.2

It includes many firms over a period of 20 years including their
characteristics. There are two matching characteristics: the company
size and a (company) ratio that I calculated.
For example, company A has a size of 140 and a ratio of 0.2 in 2012.
Now, I want to find a firm in dataset (2), which is similar to firm A
in dataset (1) in the same year 2012.

Dataset 2 looks very similar:

Company   Year   CompanySize   Ratio
X         2012   150           0.19
Y         2012   280           0.9
Z         2012   50            0.01
...

Dataset (2) includes many other firms. As mentioned, I want to
find a matching firm for each sample firm. I think this has to be
done with some kind of loop or macro, but I am not sure.

The match should be conducted in the following way. Let's assume in
our example that we want to find a matching firm for sample firm A in
dataset (1).
1) Characteristic: CompanySize >> First matching characteristic
Stata shall pick all firms from dataset (2) that have a company size
between 80% and 120%  of firm A's size. All other firms in dataset (2)
shall be immediately dismissed. This is basically the first step in
the matching procedure.
In our case, firm A's size is 140, so the range is 112 to 168. All
firms in dataset (2) with a CompanySize above 168 or below 112 shall
be dismissed, here companies Y and Z.
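
My rough idea for this first step, as a sketch only: pair every sample
firm with every firm in dataset (2) from the same year, then drop the
pairs outside the size band. The variable names (company, year, size,
ratio, and the m_ prefixed versions for dataset (2)) are my own
assumptions, and the data are just the toy rows from above typed in by
hand.

```stata
* Sketch only: assumed variable names, toy data from the example above.

* Dataset (2): the population, with an m_ prefix on every variable but year
clear
input str1 m_company int year double m_size double m_ratio
"X" 2012 150 0.19
"Y" 2012 280 0.90
"Z" 2012  50 0.01
end
tempfile pop
save `pop'

* Dataset (1): the sample firms
clear
input str1 company int year double size double ratio
"A" 2012 140 0.2
"B" 2011 200 0.4
"C" 2010 300 0.2
end

* Form all sample/population pairs within the same year
joinby year using `pop'

* Keep only candidates whose size is within 80% to 120% of the sample firm
keep if inrange(m_size, 0.8*size, 1.2*size)

list company year m_company m_size
```

With the toy rows, only the pair A/X should survive. B and C drop out
here only because the toy version of dataset (2) has no 2011 or 2010
firms. This would avoid an explicit loop, although the joinby step can
get large when there are many firms per year.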

2) Characteristic: Ratio >> Second matching characteristic
Now, Stata shall pick from the remaining firms in dataset (2) the
single firm whose ratio is closest to the ratio of firm A in
dataset (1). In our example, this would be Company X. This should
be done somehow like:
Ratio of firm A in dataset (1) - Ratio of firm X in dataset (2) = 0.2 - 0.19 = 0.01
                               - Ratio of firm Y = 0.2 - 0.9  = -0.7
                               - Ratio of firm Z = 0.2 - 0.01 = 0.19
>>> Pick firm X since the difference is the smallest in absolute terms.
Be careful here: Y and Z are actually already excluded due to their
CompanySize (first matching characteristic). This is just an example.
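
To make the arithmetic concrete, here is a small sketch of how I
imagine the ratio comparison, assuming the pairs have already been
formed and using made-up variable names (ratio for the sample firm,
m_ratio for the candidate). Like the example above, it ignores the size
filter on purpose.

```stata
* Sketch only: firm A paired with X, Y and Z, assumed variable names.
clear
input str1 company double ratio str1 m_company double m_ratio
"A" 0.2 "X" 0.19
"A" 0.2 "Y" 0.90
"A" 0.2 "Z" 0.01
end

* Distance between the two ratios; abs() is my assumption about
* what "smallest difference" should mean
gen double absdiff = abs(ratio - m_ratio)

* Order candidates from closest to farthest within each sample firm;
* m_company is only an arbitrary tie-breaker
bysort company (absdiff m_company): gen byte rank = _n

list company m_company absdiff rank
```

Here X gets rank 1, followed by Z and then Y.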

Finally, to make it even more complicated: I am not only looking for
the "best" (closest) match, but also for the second and third closest
match.

In the end, I want to get one dataset that looks like this:

Company   Matching Firm 1   Matching Firm 2   Matching Firm 3
A         X                 2nd rank          3rd rank
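
For the final layout, something along these lines might work, again
only as a sketch under my own assumptions: the pairs are already
filtered by size, absdiff holds the absolute ratio difference, and the
candidates V, W and U are purely hypothetical firms added so that there
is something to rank.

```stata
* Sketch only: X is from the example, V, W and U are hypothetical.
clear
input str1 company int year str1 m_company double absdiff
"A" 2012 "X" 0.01
"A" 2012 "V" 0.03
"A" 2012 "W" 0.05
"A" 2012 "U" 0.08
end

* Rank candidates within each sample firm and year, closest first
bysort company year (absdiff m_company): gen byte rank = _n

* Keep the three closest matches
keep if rank <= 3
drop absdiff

* One row per sample firm, one column per rank:
* m_company1, m_company2, m_company3
reshape wide m_company, i(company year) j(rank)

list
```

This should leave one row for firm A with X, V and W as the first,
second and third match. A sample firm with fewer than three candidates
would simply have missing values in the later columns.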

Hopefully, I made my problem clear. I would appreciate some help.
Since this matching has to be done for every sample firm, it has to
be some kind of loop/macro that repeats the matching for each of them.

Thanks!