Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Two datasets: Look for similar observations in the second dataset
From
Torsten Häberle <[email protected]>
To
[email protected]
Subject
Re: st: Two datasets: Look for similar observations in the second dataset
Date
Sun, 26 Jan 2014 20:33:05 +0100
Could anybody help with this? This problem is killing me. Maybe Nick,
please? Thanks...
2014-01-25 Torsten Häberle <[email protected]>:
> Hey guys,
>
> I have quite a difficult "matching" problem to solve and I am not sure
> how to approach it. This is the situation:
>
> I have two datasets:
> 1) The first one is my sample dataset
> 2) The second one is basically the entire population, but excluding my
> sample dataset
>
> Both datasets include data about firms. In general, what I want to do
> is this: for each firm in dataset (1), find a "matching" firm in
> dataset (2) that is as similar as possible to the sample firm in
> dataset (1), based on two characteristics.
>
> Dataset 1 looks like:
>
> Company  Year  CompanySize  Ratio
> A        2012          140    0.2
> B        2011          200    0.4
> C        2010          300    0.2
>
> It includes many firms over a period of 20 years including their
> characteristics. There are two matching characteristics: the company
> size and a (company) ratio that I calculated.
> For example, company A has a size of 140 and a ratio of 0.2 in 2012.
> Now, I want to find a firm in dataset (2), which is similar to firm A
> in dataset (1) in the same year 2012.
>
> Dataset 2 looks very similar:
>
> Company  Year  CompanySize  Ratio
> X        2012          150   0.19
> Y        2012          280   0.90
> Z        2012           50   0.01
> ...
>
> Dataset (2) includes many other firms. As mentioned, I want to find a
> matching firm for each sample firm. I think this has to be done with
> some kind of loop or macro, but I am not sure how.
>
> The match should be conducted in the following way. Let's assume in
> our example that we want to find a matching firm for sample firm A in
> dataset (1).
> Step 1: CompanySize (first matching characteristic)
> Stata should keep all firms from dataset (2) in the same year whose
> company size lies between 80% and 120% of firm A's size, and dismiss
> all other firms immediately. This is the first step of the matching
> procedure.
> In our case, firm A's size is 140, so the range is 112 to 168. All
> firms in dataset (2) with a CompanySize above 168 or below 112 are
> dismissed, which removes Company Y (280) and Company Z (50).
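A minimal sketch of step 1 in plain Python (a language-neutral stand-in, not Stata code), using only the example rows from the post. It assumes the 80% and 120% bounds are inclusive, which matches "above 168 or below 112 shall be dismissed". The function name `size_window_candidates` is hypothetical.

```python
# Candidate firms from dataset (2), as in the post: (company, year, size, ratio)
population = [
    ("X", 2012, 150, 0.19),
    ("Y", 2012, 280, 0.90),
    ("Z", 2012, 50, 0.01),
]

def size_window_candidates(sample_size, year, candidates, lower=0.8, upper=1.2):
    """Keep same-year candidates whose size is within [lower*size, upper*size].

    Bounds are inclusive (an assumption consistent with the post's example).
    """
    low, high = lower * sample_size, upper * sample_size
    return [
        firm for firm in candidates
        if firm[1] == year and low <= firm[2] <= high
    ]

# Firm A: size 140 in 2012 -> window 112..168
kept = size_window_candidates(140, 2012, population)
print([firm[0] for firm in kept])  # ['X']
```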
>
> Step 2: Ratio (second matching characteristic)
> From the remaining firms in dataset (2), Stata should then pick the
> single firm whose ratio is closest to firm A's ratio, i.e. the one with
> the smallest absolute difference:
> - Firm X: |0.2 - 0.19| = 0.01
> - Firm Y: |0.2 - 0.90| = 0.70
> - Firm Z: |0.2 - 0.01| = 0.19
> Firm X is picked, since its difference is the smallest. Note that Y
> and Z would already have been excluded by their CompanySize (first
> matching characteristic); they are listed here only to illustrate the
> calculation.
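A small Python sketch of step 2, assuming "most similar" means the smallest absolute difference between ratios. For illustration it runs on all three example firms, as the post does, and also shows why the absolute value matters: with signed differences, Y (0.2 - 0.9 = -0.7) would wrongly look "smallest". The name `closest_by_ratio` is hypothetical.

```python
def closest_by_ratio(target, candidates):
    """Return the (company, ratio) pair with the smallest |target - ratio|, or None."""
    if not candidates:
        return None
    return min(candidates, key=lambda firm: abs(target - firm[1]))

# Example ratios from the post: (company, ratio)
firms = [("X", 0.19), ("Y", 0.90), ("Z", 0.01)]

best = closest_by_ratio(0.2, firms)
print(best[0])  # X

# Without abs(), the most negative difference wins, which picks Y by mistake
signed_pick = min(firms, key=lambda firm: 0.2 - firm[1])
print(signed_pick[0])  # Y
```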
>
> Finally, to make it even more complicated: I am looking not only for
> the "best" (closest) match, but also for the second and third closest
> matches.
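Getting the best, second and third matches amounts to sorting the remaining candidates by absolute ratio difference and keeping the first three. In this Python sketch only firm X comes from the post; V, W and U are made-up illustrative firms assumed to have passed the size filter. The name `top_k_matches` is hypothetical.

```python
# Hypothetical candidates that already passed the size filter: (company, ratio).
# Only X is from the post; V, W and U are invented for illustration.
candidates = [("X", 0.19), ("V", 0.23), ("W", 0.12), ("U", 0.35)]

def top_k_matches(target_ratio, candidates, k=3):
    """Return up to k company names ordered by |target_ratio - ratio|, closest first.

    sorted() is stable, so tied candidates keep their original order.
    """
    ranked = sorted(candidates, key=lambda firm: abs(target_ratio - firm[1]))
    return [name for name, _ in ranked[:k]]

print(top_k_matches(0.2, candidates))  # ['X', 'V', 'W']
```

If fewer than three firms survive the size filter, the list is simply shorter; how such ties or gaps should be handled is not specified in the post.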
>
> In the end, I want to get one dataset that looks like this:
>
> Company  MatchingFirm1  MatchingFirm2   MatchingFirm3
> A        X              (2nd closest)   (3rd closest)
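The requested layout is a "wide" table with one row per sample firm and one column per match rank. A Python sketch of that reshaping step, assuming missing ranks (fewer than three eligible firms) are left blank. The ranked lists are hypothetical except for X as firm A's best match, and `to_wide_row` is a hypothetical helper.

```python
def to_wide_row(company, ranked_matches, k=3):
    """One output row: Company plus MatchingFirm1..k; missing ranks become None."""
    row = {"Company": company}
    for rank in range(1, k + 1):
        row[f"MatchingFirm{rank}"] = (
            ranked_matches[rank - 1] if rank <= len(ranked_matches) else None
        )
    return row

# Hypothetical ranked results; only "X" for firm A comes from the post
rows = [
    to_wide_row("A", ["X", "V", "W"]),
    to_wide_row("B", ["Q"]),  # fewer than three eligible firms
]

header = ["Company", "MatchingFirm1", "MatchingFirm2", "MatchingFirm3"]
print("\t".join(header))
for row in rows:
    # Missing matches are printed as empty cells
    print("\t".join("" if row[col] is None else row[col] for col in header))
```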
>
> Hopefully I have made my problem clear; I would appreciate some help.
> Since this matching has to be done for every sample firm, it has to be
> some kind of loop/macro that repeats the matching for each sample firm.
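Putting the steps together, the loop runs once per sample firm: restrict to the same year, apply the 80%-120% size window, sort by absolute ratio difference, and keep up to three firms. The Python sketch below expresses that logic; it uses only the example rows from the post, so B and C find no candidates. In Stata itself, one possible route (not the only one) is to pair each sample firm with every same-year firm from dataset (2), for instance with `joinby Year using` the population file after renaming its variables so they do not collide, and then filter and sort within each sample firm. The name `match_all` is hypothetical.

```python
from collections import defaultdict

def match_all(sample, population, k=3, lower=0.8, upper=1.2):
    """For each sample firm, return up to k population firms ranked by ratio distance.

    Rows are (company, year, size, ratio). Size bounds are inclusive (assumption).
    """
    # Group dataset (2) by year so each sample firm only scans its own year
    by_year = defaultdict(list)
    for company, year, size, ratio in population:
        by_year[year].append((company, size, ratio))

    results = {}
    for company, year, size, ratio in sample:
        low, high = lower * size, upper * size
        # Step 1: size window within the same year
        eligible = [c for c in by_year.get(year, []) if low <= c[1] <= high]
        # Step 2: rank by absolute ratio difference, keep the closest k
        eligible.sort(key=lambda c: abs(ratio - c[2]))
        results[company] = [c[0] for c in eligible[:k]]
    return results

sample = [("A", 2012, 140, 0.2), ("B", 2011, 200, 0.4), ("C", 2010, 300, 0.2)]
population = [("X", 2012, 150, 0.19), ("Y", 2012, 280, 0.90), ("Z", 2012, 50, 0.01)]

matches = match_all(sample, population)
print(matches)  # {'A': ['X'], 'B': [], 'C': []}
```

This sketch matches with replacement: the same population firm can serve as a match for several sample firms. The post does not say whether that is acceptable, so treat it as an assumption.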
>
> Thanks!
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/