Malcolm Wardlaw <[email protected]>
Do you want to keep multiple observations per "event" company, one for
each "matching" company (with two ID variables identifying companies and
groups of companies)? Or do you want some aggregate measure of the
matching companies, such as mean assets? For the latter problem, I
prefer to merge without a matching variable, then loop over
observations as shown at
http://www.stata.com/statalist/archive/2007-01/msg00079.html
"each of these group of sample firms" sounds like the former problem:
a many-to-one almost-nearest-neighbor problem.
You may want -findit nearmrg- or -findit nnmatch-.
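
For the former problem, one possible alternative to looping is to form all
event/candidate pairs within the same SIC code and quarter with -joinby-,
then keep only the pairs inside the sales and assets bands. This is a rough
sketch, not a tested solution. It assumes events.dta holds code,
eventquarter, sic3, sales and qassets, and that compustat.dta holds obsqtr,
sic3, qsales and qassets, as in the loop quoted below. The names ev_sales
and ev_assets are made up for illustration.

```stata
* Sketch only: variable and file names are assumed from the quoted loop.
use events, clear

* Rename so the join keys line up and event values do not collide
* with the Compustat variables of the same name.
rename eventquarter obsqtr
rename sales ev_sales
rename qassets ev_assets
keep code obsqtr sic3 ev_sales ev_assets

* Form every event/candidate pair sharing SIC code and quarter.
* By default -joinby- drops observations with no partner.
joinby sic3 obsqtr using compustat

* Keep candidates within 15% of event sales and 20% of event assets.
keep if inrange(qsales, .85*ev_sales, 1.15*ev_sales) ///
      & inrange(qassets, .8*ev_assets, 1.2*ev_assets)

* code marks the event firm each candidate belongs to.
sort code
save comparables, replace

* If an aggregate measure is wanted instead, something like this
* would reduce the result to one row per event:
* collapse (mean) mean_assets = qassets (count) n_match = qassets, by(code)
```

The pairwise dataset can get large before the -keep if- step when an SIC
code has many firms in a quarter, so whether this beats the loop depends on
the data and on available memory.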
On Wed, Mar 5, 2008 at 3:13 PM, Malcolm Wardlaw <[email protected]> wrote:
> I wanted to pose this question to Statalist regarding matching data to a
> range of values instead of exact values. I kind of asked this question
> before, but I realized from the response that my question was somewhat
> ill-formed, so I'll try to be as explicit as possible. I will use an
> example to illustrate the question.
>
> Let's say I want to do a long-run event study on the changes in real
> growth of companies. In order to do this, I need to appropriately match
> the company I am running the event study on to a group of comparable
> companies. For this, I need a matched dataset of all companies that
> match in a range of accounting variables.
>
> The match occurs as follows. I have a data set (1) containing all of
> the companies I wish to perform the event study on. I need to then
> create a dataset (2) that contains matching companies from a dataset of
> the larger Compustat universe of all firms (3). To do this, I need to
> gather all firms that have the same SIC code, sales within 15% of the
> event company's sales, and assets within 20% of the event company's
> assets in the quarter of the event. The new dataset must
> also have a marker for each of these group of sample firms that
> corresponds to the event firm.
>
> Here is how I originally dealt with the problem. In the program, Stata
> is continually cycling through the data, loading part of another dataset
> into memory, appending it to another dataset from disk, saving that
> dataset to disk, and then reloading the original dataset from disk each
> time. It works, but it seems very inefficient.
>
> Is there a best practice on how to do this, or is this basically as good
> as it's going to get?
>
> ---------------------------------------
> local num = _N
> forval i = 1/`num' {
>     /* Sales of event company i */
>     local sales = sales[`i']
>     /* Quarter of the event */
>     local qtr = eventquarter[`i']
>     /* SIC code */
>     local sic = sic3[`i']
>     /* Assets of event company i */
>     local assets = qassets[`i']
>     /* Code that uniquely tags the event */
>     local code = code[`i']
>     /* Load only Compustat firms in the same SIC code and quarter,
>        within 15% of event sales and 20% of event assets */
>     quietly use if obsqtr == `qtr' & sic3 == `sic' ///
>         & qsales <= 1.15*`sales' & qsales >= .85*`sales' ///
>         & qassets <= 1.2*`assets' & qassets >= .8*`assets' ///
>         using compustat, clear
>     gen code = `code'
>     /* comparables.dta must already exist before the first pass */
>     append using comparables
>     quietly save comparables, replace
>     use events
> }
> ---------------------------------------
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>