st: Re: RE: Re: RE: RE: IQR
It seems to be a 'common' practice when COMPUSTAT
data are used. The dataset is composed of the balance sheet
reports of US firms. It would be difficult to identify in the
data mergers, splits, or any other change in ownership that
implies a huge change in the composition of a firm (in terms
of assets, fixed capital, etc.), so dropping extreme values
of the change in assets allows you to 'delete' the unexplained
firms. A similar problem affects the price, where
sometimes a change in dividend policy can produce a
jump that makes sense only when the researcher knows
about the change in policy. Usually, researchers do not know
about these policies, or it is a titanic (and maybe useless)
job trying to include them in the analysis.
Rodrigo.
----- Original Message -----
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Sent: Thursday, June 07, 2007 6:44 AM
Subject: st: RE: Re: RE: RE: IQR
I am shocked to find my good friend Kit Baum throwing
away 20% of his data. No doubt this profligacy
matches his research problem. In environmental science,
which I know more about,
throwing out the tails would lose all the bangs and leave
mostly whimpers, but he is doing economics, where some
of the extreme values may represent accountancy artefacts.
On -iqr-, since half the work is done, perhaps there is
a case for a formal update. I will contact the author,
Larry Hamilton, whose book's various editions have
served so many Stata users so well. (It got me started.)
But -iqr-'s main function I see as reporting. Rodrigo's
example of a -foreach- loop cycling over variables
and -summarize- results is the way to go for selection of subsets
of data.
Nick
[email protected]
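A minimal sketch of the -foreach- plus -summarize- approach mentioned above, assuming the conventional 1.5 x IQR fences as the cut-off. It runs on Stata's bundled auto data; the variable list and the _out suffix are illustrative choices, not anything taken from -iqr- itself.

```stata
* Sketch only: flag values outside the 1.5*IQR fences, one indicator per variable.
sysuse auto, clear
foreach x of varlist price mpg weight {
    quietly summarize `x', detail
    * interquartile range and the two fences, from the returned results
    local iqr = r(p75) - r(p25)
    local lo  = r(p25) - 1.5 * `iqr'
    local hi  = r(p75) + 1.5 * `iqr'
    * 1 = outside the fences, 0 = inside, missing stays missing
    gen byte `x'_out = (`x' < `lo' | `x' > `hi') if !missing(`x')
}
```

The indicators can then be used for subset selection, for example: summarize price if price_out == 0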
Rodrigo A. Alfaro
///
Wow Nick, your translation from 'demotic' can only be compared with the work
of Thomas Young. Just kidding, very good job indeed!! Returning to the
problem, it would be nice to get a list of returned scalars in your new
version. For the problem, the limits beyond which the observations are
supposed to be outliers can be used afterwards for sample selection or to
create new variables.
As an alternative to the procedure discussed so far, there is another way to
'deal' with the outliers (if you want to), which is cutting the tails: "we
trimmed firms whose total assets growth rate exceed the 90th percentile or
fall short of the 10th percentile of the annual distribution" (page 6 of
Baum, Caglayan, Ozkan (2003), Working Paper 566, Boston College). For
example, tdavis could use the following code to 'drop' the values that are
below the 5th or above the 95th percentile of each variable:
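foreach x of varlist price total_assets inventories {
    * work on a copy so the original variable is left untouched
    gen double `x'_wo = `x'
    summarize `x', detail
    local u = r(p95)
    local l = r(p5)
    * set the copy to missing above the 95th and below the 5th percentile
    replace `x'_wo = . if `x' > `u'
    replace `x'_wo = . if `x' < `l'
}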
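The loop above uses pooled percentiles, whereas the quoted passage trims on the annual distribution of asset growth. A hedged sketch of that by-year variant follows. The data are simulated, and the names firm, year, growth, p10, p90 and growth_wo are assumptions for illustration only, not the authors' code.

```stata
* Sketch only: trim asset growth at the 10th/90th percentiles of each year.
clear
set seed 12345
set obs 200
gen firm = ceil(_n/10)                      // 20 hypothetical firms
bysort firm: gen year = 1990 + _n           // 10 years per firm
gen double total_assets = exp(rnormal(5, 1))
tsset firm year
* growth rate of total assets within firm (missing in each firm's first year)
gen double growth = D.total_assets / L.total_assets
* percentiles of the annual distribution
bysort year: egen double p10 = pctile(growth), p(10)
bysort year: egen double p90 = pctile(growth), p(90)
* keep growth only when it lies within that year's limits
gen double growth_wo = growth if inrange(growth, p10, p90)
```

Keeping p10 and p90 as variables means the limits stay available afterwards for sample selection, for example: regress y x if !missing(growth_wo), with y and x standing in for whatever model is being fitted.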
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Sent: Wednesday, June 06, 2007 6:53 PM
Subject: st: RE: RE: IQR
>I spent a while updating -iqr- to -iqr8-.
>
> This was unnecessary, because -iqr- works
> fine under version control. (How many programs
> would run without change in other software after 16 years?)
> Nevertheless, few Stata users will now be accustomed to reading
> or writing Stata like this:
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/