Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Re: distribution test


From   Maarten Buis <[email protected]>
To   [email protected]
Subject   st: Re: distribution test
Date   Thu, 1 Sep 2011 10:32:55 +0200

On Wed, Aug 31, 2011 at 6:46 PM, Smets Lodewijk wrote:
> I'm not planning to use the variable - the total number of World
> Bank policy loans - as a dependent variable in a survival analysis,
> but rather as a regressor in a fixed effects panel regression. I'd
> like to test for its univariate distribution as to get an idea whether
> extreme observations are really "extreme". That is, if a ksmirnov
> test indicates that observations for that variable seem to be
> drawn from an exponential distribution, this could provide
> justification of not dropping "extreme" observations from the
> sample.

I do not think that a Kolmogorov-Smirnov test is appropriate for this
purpose, as it tends to be most sensitive to deviations at the
center of the distribution, while you care about what is happening in
the tails. You may have noticed that this discussion continued on
Statalist. The most appropriate suggestion in that thread is to look
at the QQ-plot together with a portfolio of scenarios that assume the
data come from an exponential distribution with the estimated
parameter, as discussed here:
<http://www.stata.com/statalist/archive/2011-08/msg01412.html>. The
QQ-plot focuses on the tails of the distribution, it shows individual
observations, and the comparison with the scenarios helps you
determine whether or not the deviations are too extreme.
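
To make the idea concrete, here is a rough sketch of a QQ-plot with simulated scenarios. It is not the code from the linked post, only one way the comparison might be set up. The variable y and its simulated values are hypothetical stand-ins for your data, and the choice of 20 scenarios and the plotting positions (i - 0.5)/N are assumptions made for illustration.

```stata
* Simulated stand-in for the observed variable (hypothetical data)
clear
set seed 12345
set obs 200
generate y = -3*ln(1 - runiform())

* Estimated mean of the observed variable
summarize y, meanonly
scalar m = r(mean)

* 20 scenarios: samples of the same size drawn from an exponential
* distribution with the estimated mean, each paired with the
* theoretical quantile that belongs to its rank
local sims
forvalues j = 1/20 {
    generate s`j' = -scalar(m)*ln(1 - runiform())
    sort s`j'
    generate t`j' = -scalar(m)*ln(1 - (_n - 0.5)/_N)
    local sims `sims' (line s`j' t`j', sort lcolor(gs12))
}

* Observed values against the same theoretical quantiles
sort y
generate ty = -scalar(m)*ln(1 - (_n - 0.5)/_N)

* Scenarios in grey, observed data as markers, 45-degree reference line
twoway `sims'                                        ///
    (scatter y ty, msymbol(oh))                      ///
    (function y = x, range(ty) lcolor(black)),       ///
    legend(off)                                      ///
    ytitle("observed / simulated quantiles")         ///
    xtitle("exponential quantiles")
```

If the observed points in the upper tail stay within the band traced by the grey scenarios, the "extreme" observations are about as extreme as you would expect under the exponential distribution.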

> In the meantime I found another way of getting a value for
> lambda. I've used the -dpplot- command, which reports the
> mean (and consequently the lambda).

I have nothing against -dpplot-, but that is as much overkill as using
-hangroot- to get the mean. The way to obtain the mean of a variable
in Stata is to use -sum- and look at r(mean). You can see which
results are left behind after a command by typing -return list- (and
-ereturn list- after estimation commands).
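
A minimal sketch of that workflow follows. The variable nloans is a hypothetical name, simulated here only so the example runs, and taking lambda as 1/mean assumes the rate parameterization of the exponential distribution.

```stata
* Simulated stand-in for the variable of interest (hypothetical data)
clear
set seed 12345
set obs 500
generate nloans = -2*ln(1 - runiform())

* -summarize- leaves the mean behind in r(mean)
summarize nloans, meanonly

* Show everything the command left behind
return list

* Under the rate parameterization, lambda is the reciprocal of the mean
scalar lambda = 1/r(mean)
display "mean   = " r(mean)
display "lambda = " scalar(lambda)
```

With the meanonly option -summarize- prints nothing and skips the variance computation, so -return list- is the quickest way to see what is available.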

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany


http://www.maartenbuis.nl
--------------------------
