Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: distribution test
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: distribution test
Date: Tue, 30 Aug 2011 10:13:14 +0100
In addition to Maarten's good advice, and setting aside the question
of covariates, not mentioned in the original, at least as quoted here:
There is a direct method to check for fit to an exponential
distribution: a quantile-quantile plot. See -qexp- (SSC) and/or

SJ-7-2  gr0027  . . Stata tip 47: Quantile-quantile plots without programming
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q2/07   SJ 7(2):275--279                                 (no commands)
        tip on producing various quantile-quantile (Q-Q) plots

on how to do it for yourself. The latter is directly accessible at
http://www.stata-journal.com/sjpdf.html?articlenum=gr0027
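A minimal sketch of the do-it-yourself route, assuming simulated data and
variable names (y, p, q) chosen purely for illustration; it is not the code
from the tip or from -qexp-. Sorted values are plotted against exponential
quantiles evaluated at plotting positions (i - 0.5)/n, with the sample mean
as the scale estimate. That choice of plotting position is itself an
assumption; others are in common use.

```stata
// illustrative data only: exponential with rate 2
clear
set seed 2011
set obs 500
generate double y = -ln(1 - runiform())/2

// plotting positions for the sorted values
sort y
generate double p = (_n - 0.5)/_N

// exponential quantiles, with the sample mean as the scale estimate
summarize y, meanonly
generate double q = -r(mean)*ln(1 - p)

// observed versus theoretical quantiles, plus the line of equality
twoway (scatter y q) (function y = x, range(q)), ///
    ytitle("observed quantiles") xtitle("exponential quantiles") ///
    legend(off) aspect(1)
```

Points close to the line of equality are consistent with an exponential;
systematic curvature, especially in the upper tail, shows where and how the
fit fails.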
In addition to the usual objections to a test which separates you from
the data and reduces it to one bit of information, a test like
Kolmogorov-Smirnov based on the difference between cumulatives is
necessarily most sensitive to differences between middles of
distributions. Whether that is the right way round for data analysis
is an open question.
By the way, it is not usually confusing, but some people parameterise
exponential distributions using the mean and some the reciprocal of
the mean. Which prevails is partly down to tribal habits and partly
down to how you want to think of the parameter, literally as a mean or
as some kind of rate with reciprocal units. Maarten's assuming the
latter.
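To make the two conventions concrete, here is a small sketch with
illustrative names (mu for the mean, lambda for the rate) and simulated
data. The two forms describe the same distribution, with lambda = 1/mu, so
the fitted distribution functions agree.

```stata
// illustrative data only: exponential with mean 0.5, i.e. rate 2
clear
set seed 2011
set obs 500
generate double y = -0.5*ln(1 - runiform())

summarize y, meanonly
local mu     = r(mean)      // mean form: F(y) = 1 - exp(-y/mu)
local lambda = 1/r(mean)    // rate form: F(y) = 1 - exp(-lambda*y)

generate double F_mean = 1 - exp(-y/`mu')
generate double F_rate = 1 - exp(-`lambda'*y)

// the two fitted distribution functions coincide up to rounding
assert reldif(F_mean, F_rate) < 1e-8
```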
It is sometimes more revealing to consider exponentials in comparison
with some wider family (e.g. gamma, Weibull), but there are so many to
choose from!
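One possible sketch of such a comparison, and only one of many: embed the
exponential in the Weibull family through -streg-, treating the data as
uncensored survival times. The simulated data and the choice of Weibull are
assumptions for illustration.

```stata
// illustrative data only: exponential with rate 2
clear
set seed 2011
set obs 500
generate double y = -ln(1 - runiform())/2

// declare y as a survival time with no censoring, then fit a Weibull
stset y
streg, distribution(weibull) nolog

// the Weibull shape parameter p equals 1 for an exponential; the output
// reports ln(p) and p with confidence limits for inspection
```

How far the estimated shape parameter is from 1, and how wide its interval
is, says more about the kind of departure than a single test statistic does.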
Nick
On Tue, Aug 30, 2011 at 9:43 AM, Maarten Buis <[email protected]> wrote:
> -- Lodewijk Smets wrote me privately:
>> In Stata, I'd like to test the fit of my data with an exponential
>> distribution. I'm thinking of using a Kolmogorov-Smirnov test
>> (are there better alternatives?). Yet, the ks-test requires me
>> to define lambda. I've noticed that you're the author of the
>> -hangroot- command, where parameters are estimated (in
>> order to compare an empirical distribution with a theoretical
>> one). So I was wondering if there's a way to retrieve that
>> estimation of the parameter, i.e. is it stored somewhere?
>
> -hangroot- is for the rather special situation that you want to
> compare the distribution of one variable with one univariate
> distribution, i.e. there are no explanatory/x/right hand
> side/independent variables. I am working on a generalization, but it
> is not yet finished. There is however a deadline, as I will be
> presenting it at the 2011 Nordic and Baltic Stata Users Group meeting
> on Friday, November 11, 2011
> (<http://www.stata.com/meeting/sweden11/>). Moreover, the exponential
> distribution often occurs with survival data, and -hangroot- is not
> (and will not be) designed for survival data, in particular it will
> not handle right censoring.
>
> Having stated those limitations, the estimate of lambda in the
> univariate non-survival case is pretty easy as the maximum likelihood
> estimate has a closed form solution: 1/mean. You could use -hangroot-
> to recover this estimate: it is returned in r(lambda), but that is a
> bit of overkill. It is probably easier to use -sum- to compute the mean
> and transform that to the estimate of lambda. Below I have added
> an example:
>
> *------------- begin example ---------------
> // create some exponential data
> local lambda = 2
> drop _all
> set obs 500
> gen y = -1/`lambda'*ln(1-runiform())
>
> // estimate parameter
> sum y, meanonly
> local lambdahat = 1/r(mean)
> di as txt "ML estimate of lambda is: " ///
> as result `lambdahat'
>
> // ksmirnov test
> ksmirnov y = 1-exp(-`lambdahat'*y)
>
> // hanging rootogram
> hangroot y, dist(exponential) ci
> return list
> *--------------- end example ---------------
> (For more on examples I sent to the Statalist see:
> http://www.maartenbuis.nl/example_faq )
>
> If you have explanatory variables, the problem then is that there is
> no longer one lambda but each observation has its own lambda. So the
> marginal distribution of your explained/y/left hand side/dependent
> variable no longer follows an exponential distribution but a mixture
> of exponential distributions with different lambdas. To the best of my
> knowledge no test has been implemented that will test the marginal
> distribution of your explained variable against this mixture
> distribution. As I said above I am working on a graphical comparison
> of these two distributions, and I might add such a test for some
> models. However, if I do so I will probably include a warning in the
> helpfile not to use it for the following three reasons: 1) The
> preferred outcome is that we cannot find a significant deviation from
> the theoretical distribution. However, such non-significance only
> indicates "absence of evidence", which should not be confused with
> "evidence of absence", especially since such tests of distributions
> tend to have little power, i.e. they are not very likely to detect
> deviations when they should. 2) Even if we find significant deviations
> from the theoretical distribution, that does not tell us what those
> deviations are and what to do about them. 3) We are testing whether a
> model is true, but a good model is a simplification of reality, i.e.
> the model is not supposed to be true. A good model involves an
> informed tradeoff between how well the model simplifies reality and
> how large the deviations are between model and reality (approximated
> by the observations). To make this more difficult, not all deviations
> are equally relevant, and which ones are most relevant depends on the
> purpose of the model. So no generic/automatic/computerized method can
> exist to do this tradeoff for us, which is good, as that means that
> our jobs won't be replaced by computers any time soon. This also means
> that the tradeoff implicit in a statistical test is typically not the
> right tradeoff for determining whether a model is appropriate or not.
>
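The point in the quoted text that covariates turn the marginal distribution
into a mixture of exponentials can be illustrated by simulation. This is a
sketch under stated assumptions: the covariate x, its standard normal
distribution, and the rate exp(x) are all invented for illustration.

```stata
// illustrative simulation: each observation has its own rate exp(x)
clear
set seed 2011
set obs 5000
generate double x = rnormal()
generate double y = -ln(1 - runiform())/exp(x)

// compare the marginal distribution of y with a single exponential
summarize y, meanonly
local lambdahat = 1/r(mean)
ksmirnov y = 1 - exp(-`lambdahat'*y)

// an exponential has coefficient of variation 1; a mixture of
// exponentials with differing rates has a larger one
quietly summarize y
display as txt "coefficient of variation: " as result r(sd)/r(mean)
```

Here y is exponential conditional on x, yet its marginal distribution is
not, which is why a test against a single exponential says little about
the conditional model.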