Here is something that tickled me. The tickling
arises mostly because it is so easy to do in Stata.
In a book on extreme values by Enrique Castillo and friends --
Enrique Castillo, Ali S. Hadi, N. Balakrishnan, and Jose M. Sarabia. 2005.
Extreme Value and Related Models with Applications in Engineering and Science.
Wiley --
I read about "quantile least squares": fit a distribution
(i.e. estimate its parameters)
by minimising the sum of squared differences between observed
and theoretical quantiles.
For that you need a plotting position -- there are several
recipes, but I will use (i - 0.5) / n for illustration --
and a specification of the quantile function, a.k.a. inverse
distribution function. Then you can use -nl- as your
optimisation engine. In my brief experience, you do often
need to specify initial values. Here the example is a two-parameter
gamma in a common parameterisation.
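To spell that out for this case: invgammap(alpha, p) is Stata's inverse of the
cumulative gamma distribution with shape alpha and scale 1, so a gamma with
shape alpha and scale beta has quantile function Q(p) = beta * invgammap(alpha, p).
The criterion minimised is then the sum over observations of
[mpg_(i) - beta * invgammap(alpha, p_i)]^2, where mpg_(i) is the i-th smallest
value of mpg and p_i its plotting position. That is exactly what the -nl- call
below asks for.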
. sysuse auto
. sort mpg
. gen p = (_n - 0.5) / _N
. nl (mpg = {beta} * invgammap({alpha}, p)) , nolog initial(alpha 1 beta 1)
(obs = 74)
Source | SS df MS
-------------+------------------------------ Number of obs = 74
Model | 35959.8251 2 17979.9126 R-squared = 0.9987
Residual | 48.1748656 72 .669095355 Adj R-squared = 0.9986
-------------+------------------------------ Root MSE = .8179825
Total | 36008 74 486.594595 Res. dev. = 178.2401
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
/beta | 1.59419 .0527887 30.20 0.000 1.488957 1.699422
/alpha | 13.35188 .453658 29.43 0.000 12.44753 14.25623
------------------------------------------------------------------------------
* (SEs, P values, CIs, and correlations are asymptotic approximations)
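As an aside on starting values: -nl- also accepts them embedded in the
parameter names, so an equivalent way to write the call above -- just a
sketch, output not shown -- is

. nl (mpg = {beta=1} * invgammap({alpha=1}, p)), nolog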
At this point, many of you are thinking, on a variety of quasi-theological
grounds, that this is not the proper way to do it. Why not use maximum
likelihood? And in this case, -gammafit- from SSC fits the same distribution
by ML:
. gammafit mpg , nolog
ML fit of two-parameter gamma distribution Number of obs = 74
Wald chi2(0) = .
Log likelihood = -229.81792 Prob > chi2 = .
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
alpha |
_cons | 14.85135 2.414625 6.15 0.000 10.11877 19.58393
-------------+----------------------------------------------------------------
beta |
_cons | 1.434031 .2371325 6.05 0.000 .9692599 1.898802
------------------------------------------------------------------------------
On this evidence, the quantile LS method must be declared "quick and dirty", but
"quick" is often good, and "dirty" may be tolerable if no immediate alternative is
available. The R^2 you get with -nl- is fabulous, but unsurprising given the
nature of the exercise. Also, the t statistics are wildly exaggerated, I surmise
because of the dependence structure in the complete set of order statistics.
Thus -nl- is here a blind optimiser rather than a statistical command. But
other tweaks are possible, and the predicted values can be obtained readily
for a quantile-quantile plot.
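For example -- a minimal sketch, to be run directly after the -nl- call, with
qhat just an illustrative variable name -- the fitted quantiles come from
-predict-, and because the data are already sorted on mpg a plot of observed
against fitted values is the quantile-quantile plot, with a line of equality
overlaid for reference:

. predict double qhat
. scatter mpg qhat || line qhat qhat, sort legend(off)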
Nick
[email protected]