The Pareto distribution is typically defined by the cdf $F(x;a) = 1 - x^{-a}$, where $a > 0$, for $x \ge 1$ and zero elsewhere, and the pdf $f(x;a) = a x^{-a-1}$ for $x \ge 1$ and zero elsewhere. A version with two parameters, a shape $a > 0$ and a scale $k > 0$ with support $x \ge k$, is given by $F(x;a,k) = 1 - (x/k)^{-a}$ and $f(x;a,k) = (a/k)(x/k)^{-a-1} = a k^{a} x^{-a-1}$.
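To make the definitions concrete, here is a minimal Python sketch of the two-parameter cdf and pdf, plus a sampler that inverts the cdf (solving u = F(x) gives x = k(1 - u)^(-1/a)). Python is used only for illustration, and the function names are hypothetical, not taken from any Stata or Python package.

```python
import math
import random

def pareto_cdf(x, a, k=1.0):
    """F(x; a, k) = 1 - (x/k)^(-a) for x >= k, and 0 for x < k."""
    if x < k:
        return 0.0
    return 1.0 - (x / k) ** (-a)

def pareto_pdf(x, a, k=1.0):
    """f(x; a, k) = (a/k)(x/k)^(-a-1) = a k^a x^(-a-1) for x >= k, and 0 for x < k."""
    if x < k:
        return 0.0
    return (a / k) * (x / k) ** (-a - 1)

def pareto_sample(n, a, k=1.0, rng=None):
    """Draw n values by inverting the cdf: u = F(x)  =>  x = k (1 - u)^(-1/a)."""
    rng = rng or random.Random()
    # random() returns u in [0, 1), so 1 - u is in (0, 1] and every draw is finite and >= k.
    return [k * (1.0 - rng.random()) ** (-1.0 / a) for _ in range(n)]
```

For example, `pareto_sample(1000, a=2.0, k=3.0, rng=random.Random(1))` gives 1000 reproducible draws, all at least 3.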
On a log-log plot, the density function for the Pareto distribution is
a straight line:
$$\ln f(x) = (-a - 1)\ln x + a \ln k + \ln a.$$
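A quick numerical check of this linearity, as a Python sketch with hypothetical helper names: every point of ln f(x) should fall exactly (up to rounding) on a line in ln x with slope -(a+1) and intercept a ln k + ln a.

```python
import math

def pareto_pdf(x, a, k):
    """Two-parameter Pareto density f(x; a, k) = a k^a x^(-a-1) for x >= k."""
    return a * k ** a * x ** (-a - 1) if x >= k else 0.0

def loglog_line(a, k):
    """Slope and intercept of ln f(x) plotted against ln x."""
    return -a - 1.0, a * math.log(k) + math.log(a)

def max_line_residual(a, k, xs):
    """Largest gap between ln f(x) and the straight line, over points xs >= k."""
    slope, intercept = loglog_line(a, k)
    return max(abs(math.log(pareto_pdf(x, a, k)) - (intercept + slope * math.log(x)))
               for x in xs)

print(loglog_line(2.0, 1.0))  # → (-3.0, 0.6931471805599453), i.e. slope -3, intercept ln 2
```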
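This suggests a way to estimate the parameters $a$ and $k$: construct a kernel density estimate $\hat{f}(x)$ of $f(x)$ and regress $\ln \hat{f}(x)$ on $\ln x$. The slope estimates $-(a+1)$ and the intercept estimates $a \ln k + \ln a$, so with fitted intercept $\hat{b}_0$ and slope $\hat{b}_1$, $\hat{a} = -\hat{b}_1 - 1$ and $\hat{k} = \exp[(\hat{b}_0 - \ln \hat{a})/\hat{a}]$. Standard errors could presumably be obtained via the bootstrap.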
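A minimal Python sketch of this estimator follows. It rests on several assumptions of my own that the text does not specify: a Gaussian kernel applied to ln x rather than x (the back-transformed estimate g_hat(ln x)/x is still a kernel estimate of f(x), and it behaves better in the long right tail), Silverman's rule-of-thumb bandwidth, an evaluation grid between the 50th and 95th sample percentiles of ln x to stay clear of the boundary at x = k, and 30 bootstrap replications. All function names are hypothetical.

```python
import math
import random
import statistics

def log_scale_kde(logs, h, y):
    """Gaussian kernel estimate, with bandwidth h, of the density of ln x at the point y."""
    c = 1.0 / (len(logs) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((y - v) / h) ** 2) for v in logs)

def ols_line(us, vs):
    """Least-squares fit vs = b0 + b1 * us; returns (b0, b1)."""
    mu, mv = statistics.fmean(us), statistics.fmean(vs)
    b1 = (sum((u - mu) * (v - mv) for u, v in zip(us, vs))
          / sum((u - mu) ** 2 for u in us))
    return mv - b1 * mu, b1

def fit_pareto_loglog(x, lo=0.5, hi=0.95, grid_size=25):
    """Estimate (a, k) by regressing ln f_hat(x) on ln x over a grid of points."""
    logs = sorted(math.log(v) for v in x)
    n = len(logs)
    h = 1.06 * statistics.stdev(logs) * n ** -0.2  # Silverman's rule-of-thumb bandwidth
    # Grid between two sample quantiles of ln x, away from the lower boundary ln k
    # (where the kernel estimate is biased downward) and the sparse far tail.
    y_lo, y_hi = logs[int(lo * n)], logs[int(hi * n)]
    grid = [y_lo + (y_hi - y_lo) * i / (grid_size - 1) for i in range(grid_size)]
    # Back-transform: f(x) = g(ln x) / x, so ln f_hat(x) = ln g_hat(ln x) - ln x.
    ln_f = [math.log(log_scale_kde(logs, h, y)) - y for y in grid]
    b0, b1 = ols_line(grid, ln_f)
    a_hat = -b1 - 1.0  # slope = -(a + 1)
    if a_hat <= 0:
        raise ValueError("estimated slope is not consistent with a > 0")
    k_hat = math.exp((b0 - math.log(a_hat)) / a_hat)  # intercept = a ln k + ln a
    return a_hat, k_hat

def bootstrap_se(x, reps=30, seed=1):
    """Nonparametric bootstrap standard errors of (a_hat, k_hat)."""
    rng = random.Random(seed)
    fits = [fit_pareto_loglog(rng.choices(x, k=len(x))) for _ in range(reps)]
    return (statistics.stdev([f[0] for f in fits]),
            statistics.stdev([f[1] for f in fits]))

# Simulated Pareto data with a = 2, k = 2 (inverse-cdf draws).
rng = random.Random(42)
x = [2.0 * (1.0 - rng.random()) ** -0.5 for _ in range(2000)]
a_hat, k_hat = fit_pareto_loglog(x)
se_a, se_k = bootstrap_se(x)
```

Since each replication calls `rng.choices(x, k=len(x))`, the bandwidth and the quantile grid are recomputed from every resample. Thirty replications give only a rough standard error.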
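Since, for a lognormal distribution with parameters $\mu$ and $\sigma$,
$$\ln f(x) = -\ln x - \ln(\sqrt{2\pi}\,\sigma) - \frac{(\ln x - \mu)^2}{2\sigma^2},$$
or, expanding the square,
$$\ln f(x) = -\frac{(\ln x)^2}{2\sigma^2} + \left(\frac{\mu}{\sigma^2} - 1\right)\ln x - \ln(\sqrt{2\pi}\,\sigma) - \frac{\mu^2}{2\sigma^2},$$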
a regression of $\ln \hat{f}(x)$ on $\ln x$ and $(\ln x)^2$ should have a zero coefficient on $(\ln x)^2$ if $x$ is Pareto distributed, and a negative coefficient, $-1/(2\sigma^2)$, if it is lognormally distributed. The trouble, of course, is that a lognormal with a fairly large $\sigma^2$ will be very hard to distinguish from a Pareto, since its negative coefficient will be quite close to zero.
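To see how quickly the (ln x)^2 coefficient fades as sigma grows, the lognormal expansion can be checked numerically. This is a short Python sketch; the function names are illustrative and not from any package.

```python
import math

def lognormal_logpdf(x, mu, sigma):
    """ln f(x) for the lognormal, computed directly from the density."""
    z = math.log(x)
    return -z - math.log(math.sqrt(2 * math.pi) * sigma) - (z - mu) ** 2 / (2 * sigma ** 2)

def quadratic_coefs(mu, sigma):
    """(c0, c1, c2) with ln f(x) = c0 + c1 ln x + c2 (ln x)^2, from expanding the square."""
    c0 = -math.log(math.sqrt(2 * math.pi) * sigma) - mu ** 2 / (2 * sigma ** 2)
    c1 = mu / sigma ** 2 - 1.0
    c2 = -1.0 / (2 * sigma ** 2)
    return c0, c1, c2

# The (ln x)^2 coefficient shrinks toward zero as sigma grows.
for sigma in (0.5, 1.0, 2.0, 5.0):
    print(sigma, quadratic_coefs(0.0, sigma)[2])
```

With mu = 0 the loop prints coefficients of -2.0, -0.5, -0.125 and -0.02 for sigma = 0.5, 1, 2 and 5, which is the difficulty described above: by sigma = 5 the curvature is small enough to be swamped by noise in a kernel estimate.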
Does this test of Pareto versus lognormal distributions make sense?
Is anyone aware of an implementation of this? I would be happy to
write it up as a Stata package if not.
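As a possible starting point for an implementation, here is a minimal Python sketch of the curvature statistic (not Stata, and not an existing package; all names are hypothetical). It assumes the same choices as one could make for the slope estimator: a Gaussian kernel on ln x with Silverman's bandwidth, back-transformed to ln f_hat(x), evaluated on a grid between the 50th and 95th sample percentiles. It then reports the OLS coefficient on (ln x)^2. One caveat that follows from the smoothing: for a lognormal the Gaussian kernel on ln x has expectation equal to a normal density with variance sigma^2 + h^2, so the fitted coefficient targets roughly -1/(2(sigma^2 + h^2)), pushing it slightly further toward zero. A formal test would still need a standard error for the coefficient, for example from the bootstrap.

```python
import math
import random
import statistics

def log_scale_kde(logs, h, y):
    """Gaussian kernel estimate, with bandwidth h, of the density of ln x at the point y."""
    c = 1.0 / (len(logs) * h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((y - v) / h) ** 2) for v in logs)

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    sol = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        sol[r] = (M[r][3] - sum(M[r][c] * sol[c] for c in range(r + 1, 3))) / M[r][r]
    return sol

def quad_coef(us, vs):
    """OLS of vs on (1, u, u^2) via the normal equations; returns the coefficient on u^2."""
    m = statistics.fmean(us)
    us = [u - m for u in us]  # centring improves conditioning; the u^2 coefficient is unchanged
    s = [sum(u ** p for u in us) for p in range(5)]
    A = [[s[i + j] for j in range(3)] for i in range(3)]
    b = [sum(v * u ** i for u, v in zip(us, vs)) for i in range(3)]
    return solve3(A, b)[2]

def curvature_stat(x, lo=0.5, hi=0.95, grid_size=25):
    """Coefficient on (ln x)^2 in a regression of ln f_hat(x) on ln x and (ln x)^2."""
    logs = sorted(math.log(v) for v in x)
    n = len(logs)
    h = 1.06 * statistics.stdev(logs) * n ** -0.2  # Silverman's rule-of-thumb bandwidth
    y_lo, y_hi = logs[int(lo * n)], logs[int(hi * n)]
    grid = [y_lo + (y_hi - y_lo) * i / (grid_size - 1) for i in range(grid_size)]
    # ln f_hat(x) = ln g_hat(ln x) - ln x, where g_hat estimates the density of ln x.
    ln_f = [math.log(log_scale_kde(logs, h, y)) - y for y in grid]
    return quad_coef(grid, ln_f)

rng = random.Random(7)
pareto = [(1.0 - rng.random()) ** -0.5 for _ in range(5000)]   # a = 2, k = 1
lognorm = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]  # mu = 0, sigma = 0.5
c2_pareto, c2_lognorm = curvature_stat(pareto), curvature_stat(lognorm)
```

With sigma = 0.5 the two samples should separate clearly; rerunning with a large sigma (say 3 or more) in `lognormvariate` illustrates how the lognormal coefficient drifts toward the Pareto value of zero.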