Re: st: Range Merging
Malcolm,
As you are aware, the inefficiency comes from repeatedly loading and
saving datasets. You may be able to avoid this by putting the "events" data
into a matrix, then doing the matching with the "compustat" data
in memory throughout, something like this (Stata 9 approach, better
endowed people would probably use Mata):
use events, clear // or whatever it's called
loc num=_N
set matsize `num' // if you have lots of companies
mkmat sales eventquarter sic3 qassets code, mat(E)
drop _all // clears data but not macros or matrices
use compustat
gen code=.
forval i = 1/`num' {
    local sales=E[`i',1]
    local qtr=E[`i',2]
    local sic=E[`i',3]
    local assets=E[`i',4]
    local code=E[`i',5]
    replace code=`code' if ... // your match criteria
}
drop if code==.
save comparables
This does not yet achieve quite what your code does, because it
allows each compustat entry to match only one of your companies (the
last found) whereas your code allows multiple matches (you can get
around that using -expand-, may need two passes). Also it requires
all the matching variables to be numeric: if they aren't, you may
need to -encode- them. So there's work to do, but I think the basic
idea is ok.
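For the multiple-match case, one alternative to -expand- is -joinby-. This is a different technique from the matrix loop, not a completion of it: it forms every event/firm pair that shares industry and quarter, then keeps the pairs inside the sales and asset ranges. The sketch below runs on made-up data; the names firm, ev_sales and ev_assets and all values are assumptions for illustration, and the 15% and 20% bands are taken from the question quoted below. It is meant to be run from a do-file.

```stata
* Toy events data: two event companies (all values are invented)
clear
input code eventquarter sic3 sales qassets
1 190 281 100 500
2 190 281 110 520
end
* Rename so event values do not clash with the compustat variable names
rename sales ev_sales
rename qassets ev_assets
rename eventquarter obsqtr
tempfile events
save `events'

* Toy compustat data: five candidate firms (all values are invented)
clear
input firm obsqtr sic3 qsales qassets
101 190 281 105 510
102 190 281  90 450
103 190 281 130 600
104 191 281 100 500
105 190 357 100 500
end

* Pair each firm with every event in the same industry and quarter
joinby sic3 obsqtr using `events'

* Keep pairs within 15% on sales and 20% on assets of the event company
keep if inrange(qsales, 0.85*ev_sales, 1.15*ev_sales) ///
    & inrange(qassets, 0.80*ev_assets, 1.20*ev_assets)

sort code firm
list code firm qsales qassets
```

Because -joinby- builds all pairs within each industry-quarter before filtering, the intermediate dataset can get large when many events share a cell; whether that is acceptable depends on the size of the real data.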
Keith
At 07:13 AM 6/03/2008, you wrote:
I wanted to pose this question to Statalist regarding matching data
to a range of values instead of exact values. I asked a version of this
question before, but I realized from the response that my question
was somewhat ill-formed, so I'll try to be as explicit as
possible. I will use an example to illustrate the question.
Let's say I want to do a long-run event study on the changes in real
growth of companies. In order to do this, I need to appropriately
match the company I am running the event study on to a group of
comparable companies. For this, I need a matched dataset of all
companies that match in a range of accounting variables.
The match occurs as follows. I have a data set (1) containing all
of the companies I wish to perform the event study on. I need to
then create a dataset (2) that contains matching companies from a
dataset of the larger Compustat universe of all firms (3). To do
this, I need to gather all firms that, in the quarter of the event,
have the same SIC code, sales within 15% of the event company's sales
(between 85% and 115%), and assets within 20% of the event company's
assets (between 80% and 120%). The new dataset must also have a marker
on each group of matched firms identifying the event firm it corresponds to.
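As a sketch of these criteria for a single event company, the whole condition can be written as one expression with inrange(). The data, the variable name firm, and the event values held in the local macros are all made up for illustration.

```stata
* Toy compustat data (all values are invented)
clear
input firm obsqtr sic3 qsales qassets
101 190 281 105 510
102 190 281  90 450
103 190 281 130 600
104 191 281 100 500
105 190 357 100 500
end

* Hypothetical event company: quarter 190, SIC 281, sales 100, assets 500
local qtr    190
local sic    281
local sales  100
local assets 500

* 1 if the firm matches on quarter and SIC code and falls inside both ranges
gen byte match = obsqtr==`qtr' & sic3==`sic' ///
    & inrange(qsales, 0.85*`sales', 1.15*`sales') ///
    & inrange(qassets, 0.80*`assets', 1.20*`assets')

list firm qsales qassets match
```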
Here is how I originally dealt with the problem. In the program,
Stata is continually cycling through the data, loading part of
another dataset into memory, appending it to another dataset from
disk, saving that dataset to disk, and then reloading the original
dataset from disk each time. It works, but it seems very inefficient.
Is there a best practice on how to do this, or is this basically as
good as it's going to get?
---------------------------------------
local num = _N
forval i = 1/`num' {
    /*The sales of Event Company i*/
    local sales=sales[`i']
    /*The quarter of the observation*/
    local qtr=eventquarter[`i']
    /*SIC code*/
    local sic=sic3[`i']
    /*Assets of the event company*/
    local assets=qassets[`i']
    /*A code that uniquely tags the event*/
    local code=code[`i']
    quietly: use compustat if `qtr'==obsqtr & `sic'==sic3 & qsales<=1.15*`sales' /*
    */ & qsales>=.85*`sales' & qassets<=1.2*`assets' & qassets>=.8*`assets', clear
    gen code=`code'
    append using comparables
    quietly: save comparables, replace
    use events
}
---------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
Dr Keith B.G. Dear
Senior Fellow in Biostatistics
National Centre for Epidemiology and Population Health
Australian National University
Canberra, ACT 0200, Australia
Tel: 02 612 54865, Fax: 02 612 50740
http://nceph.anu.edu.au/Staff_Students/staff_pages/dear.php
CRICOS provider #00120C
http://canberragliding.org/