Thanks to Kit Baum, a new package -firstdigit-
is now available from SSC. Stata 9 is required,
as the program depends on Mata. Use -ssc-
to install if interested.
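For anyone who has not used -ssc- before, installation and a first run might look like the sketch below. The -ssc install- line is standard; the call to -firstdigit- assumes the basic syntax is the command name followed by a numeric variable, and the auto dataset is used purely for illustration, so check -help firstdigit- for the actual syntax and options.

```stata
* install the package from SSC (needs an internet connection)
ssc install firstdigit

* illustrative data only: the auto dataset shipped with Stata
sysuse auto, clear

* assumed basic syntax: command name followed by a numeric variable
firstdigit price

* inspect whatever results the command leaves behind
return list
```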

-firstdigit- tabulates and analyses the first
digits of numeric variables. It also tests
Benford's law, under which the first digits
d = 1, ..., 9 occur with probabilities
log10(1 + 1/d). Thus, given data of 12, 345, 6789,
etc., it would extract 1, 3, 6, etc., tabulate
the frequencies of the digits 1 to 9, and give
a chi-square test of the law.
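
To make the probabilities concrete, here is a small sketch in plain Stata that lists log10(1 + 1/d) for each digit and checks that they sum to 1. It does not use -firstdigit- at all, and the scalar name benford_total is just an illustrative choice.

```stata
* Benford probabilities P(d) = log10(1 + 1/d) for d = 1, ..., 9
scalar benford_total = 0
forvalues d = 1/9 {
    display "d = `d'   P(d) = " %6.4f log10(1 + 1/`d')
    scalar benford_total = benford_total + log10(1 + 1/`d')
}

* the sum telescopes to log10(10) = 1
display "sum = " %6.4f benford_total
```

A leading 1 is thus expected about 30.1% of the time, and a leading 9 only about 4.6% of the time.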

Users of Stata 8 may wish to look at -benford-
by Nikos Askitas, also available from SSC
(and revised today).

Alternatively, users of Stata 8 may use -chitest-
from the package -tab_chi-, also available from
SSC, for this purpose. Its help file gives a
Benford's law example.

Mata users may be interested to see how the main
work goes in Mata:

void fd_work(string scalar varname,
             string scalar tousename,
             string scalar percent)
{
    real colvector y, obs, exp
    real scalar n, i, chisq
    string scalar name

    // read the observations selected by the touse variable
    y = st_data(., varname, tousename)
    n = rows(y)
    exp = obs = J(9, 1, 0)

    // first digit: first character of the string form of each value
    y = strtoreal(substr(strofreal(y), 1, 1))

    for (i = 1; i <= 9; i++) {
        obs[i] = colsum(y :== i)
        exp[i] = n * log10(1 + 1/i)

        // return counts, or percents if percent is non-empty
        name = "r(obs" + strofreal(i) + ")"
        st_numscalar(name,
            percent == "" ? obs[i] : 100 * obs[i] / n)
        name = "r(exp" + strofreal(i) + ")"
        st_numscalar(name,
            percent == "" ? exp[i] : 100 * log10(1 + 1/i))
    }

    // Pearson chi-square with 9 - 1 = 8 degrees of freedom
    chisq = colsum(((obs - exp):^2) :/ exp)
    st_numscalar("r(p)", chi2tail(8, chisq))
    st_numscalar("r(chisq)", chisq)
    st_numscalar("r(N)", n)
}

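
The digit extraction is the one step that may not be obvious, so here is a minimal interactive sketch of it on made-up values (the vector y below is illustrative, not data from the package). It also shows what that one line does with a value below 1. How the surrounding ado-file treats such values is not shown in this post, so take this as a note about the extraction step only.

```stata
mata:
// illustrative values only
y = (12 \ 345 \ 6789 \ 1500 \ 23)

// number -> string, keep the first character, string -> number
d = strtoreal(substr(strofreal(y), 1, 1))
d

// 0.5 is written as ".5" by default, so its first character is "."
// and the result is missing rather than 5
strtoreal(substr(strofreal(0.5), 1, 1))
end
```

Values that yield missing here match none of the digits 1 to 9 in the comparison y :== i, so they would not contribute to any observed count.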
Nick