Here is something that tickled me. The tickling
arises mostly because it is so easy to do in Stata.
In a book on extreme values by Enrique Castillo and friends --
Enrique Castillo, Ali S. Hadi, N. Balakrishnan, and Jose M. Sarabia. 2005.
Extreme Value and Related Models with Applications in Engineering and Science.
Wiley --
I read about "quantile least squares": fit a distribution
(i.e. estimate its parameters)
by minimising the sum of squared differences between observed
and theoretical quantiles.
For that you need a plotting position -- there are several
recipes, but I will use (i - 0.5) / n for illustration --
and a specification of the quantile function, a.k.a. inverse
distribution function. Then you can use -nl- as your
optimisation engine. In my brief experience, you do often
need to specify initial values. Here the example is a two-parameter
gamma in a common parameterisation.
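To spell that out for this case: invgammap(alpha, p) is Stata's inverse of the
cumulative gamma distribution with shape alpha and scale 1, so a gamma with
shape alpha and scale beta has quantile function Q(p) = beta * invgammap(alpha, p).
The criterion minimised is then the sum over observations of
[mpg_(i) - beta * invgammap(alpha, p_i)]^2, where mpg_(i) is the i-th smallest
value of mpg and p_i its plotting position. That is exactly what the -nl- call
below asks for.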
. sysuse auto
. sort mpg
. gen p = (_n - 0.5) / _N
. nl (mpg = {beta} * invgammap({alpha}, p)) , nolog initial(alpha 1 beta 1)
(obs = 74)
Source | SS df MS
-------------+------------------------------ Number of obs = 74
Model | 35959.8251 2 17979.9126 R-squared = 0.9987
Residual | 48.1748656 72 .669095355 Adj R-squared = 0.9986
-------------+------------------------------ Root MSE = .8179825
Total | 36008 74 486.594595 Res. dev. = 178.2401
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/beta | 1.59419 .0527887 30.20 0.000 1.488957 1.699422
/alpha | 13.35188 .453658 29.43 0.000 12.44753 14.25623
------------------------------------------------------------------------------
* (SEs, P values, CIs, and correlations are asymptotic approximations)
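As an aside on starting values: -nl- also accepts them embedded in the
parameter names, so an equivalent way to write the call above -- just a
sketch, output not shown -- is

. nl (mpg = {beta=1} * invgammap({alpha=1}, p)), nolog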
At this point, many of you are thinking, on a variety of quasi-theological
grounds, that this is not the proper way to do it. Why not use maximum
likelihood? And in this case, -gammafit- from SSC fits the same distribution
by ML:
. gammafit mpg , nolog
ML fit of two-parameter gamma distribution Number of obs = 74
Wald chi2(0) = .
Log likelihood = -229.81792 Prob > chi2 = .
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
alpha |
_cons | 14.85135 2.414625 6.15 0.000 10.11877 19.58393
-------------+----------------------------------------------------------------
beta |
_cons | 1.434031 .2371325 6.05 0.000 .9692599 1.898802
------------------------------------------------------------------------------
On this evidence, the quantile LS method must be declared "quick and dirty", but
"quick" is often good, and "dirty" may be tolerable if no immediate alternative is
available. The R^2 you get with -nl- is fabulous, but unsurprising given the
nature of the exercise. Also, the t statistics are wildly exaggerated, I surmise
because of the dependence structure in the complete set of order statistics.
Thus -nl- is here a blind optimiser rather than a statistical command. But
other tweaks are possible, and the predicted values can be obtained readily
for a quantile-quantile plot.
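For example -- a minimal sketch, to be run directly after the -nl- call, with
qhat just an illustrative variable name -- the fitted quantiles come from
-predict-, and because the data are already sorted on mpg a plot of observed
against fitted values is the quantile-quantile plot, with a line of equality
overlaid for reference:

. predict double qhat
. scatter mpg qhat || line qhat qhat, sort legend(off)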
Nick
[email protected]