Thanks for the references. I found the second reference
once I had worked out that the volume number is 107.
Martin does, as you say, use the quantity
SUM p ln p + ln #cells
In fields I know a bit about, it is more common
to use
- SUM p ln p = H
as a basic quantity. This is what is used in
my program -ineq- on SSC, for example.
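Write M for the number of cells (Martin's #cells). The two quantities then differ only by a sign and the constant ln M, so either one can be recovered from the other:

$$
\sum_{i=1}^{M} p_i \ln p_i + \ln M \;=\; \ln M - H,
\qquad H = -\sum_{i=1}^{M} p_i \ln p_i .
$$

Because 0 <= H <= ln M, Martin's quantity runs from 0 (all cells equally likely) to ln M (everything in one cell). Dividing it by ln M gives 1 - H / ln M.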
Also, if this H is based on K categories, it can vary
between 0 and ln K, so a simple scaling is H / ln K.
(In the limiting case of a single category with p = 1,
you have to trap the 0 / 0 calculation.) There is
no assumption or approximation in this.
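Here is a minimal Python sketch of the scaling just described. It assumes the probabilities p are already given as a list that sums to 1, and the function names are illustrative only. Categories with p = 0 are skipped, which applies the usual convention 0 ln 0 = 0. A single category returns 0 instead of attempting 0 / 0.

```python
import math


def entropy(p):
    """H = -SUM p ln p; categories with p = 0 contribute 0 (limit of p ln p)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def scaled_entropy(p):
    """H / ln K for K categories, in [0, 1].

    With K = 1 we have H = 0 and ln 1 = 0, so the 0 / 0 case is trapped
    and reported as 0.
    """
    k = len(p)
    if k <= 1:
        return 0.0
    return entropy(p) / math.log(k)


# Usage: three categories with p = (.5, .25, .25) give H = 1.5 ln 2,
# so the scaled value is 1.5 ln 2 / ln 3, roughly 0.946.
print(round(scaled_entropy([0.5, 0.25, 0.25]), 4))
```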
I am not clear that this is what you are doing, but no
matter.
Looking at my little program, it is easy to generalise
it so that it can take one variable or two. This is me
modifying the program so it does things I sometimes
want to do, no more.
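*! 1.0.0 NJC 30 March 2006
program myentropy, rclass
	version 9
	// one or two categorical variables; optional frequency or analytic weights
	syntax varlist(min=1 max=2) [if] [in] [fweight aweight]
	marksample touse
	qui count if `touse'
	if r(N) == 0 error 2000
	tempname matname
	// one-way or two-way table; cell counts are saved in a temporary matrix
	tab `varlist' [`weight' `exp'] if `touse', matcell(`matname')
	// convert counts to proportions p
	mat `matname' = `matname' / r(N)
	mata: subroutine("`matname'")
	di
	di as txt "entropy " as res %7.4f r(entropy)
	di as txt "scaled [0,1] " as res %7.4f r(scaled)
	return scalar entropy = r(entropy)
	return scalar scaled = r(scaled)
end
mata:
// H = -SUM p ln p over all cells; scaled = H / ln(number of cells)
void subroutine(string scalar matname)
{
	real matrix X
	real scalar H, scaled
	X = st_matrix(matname)
	// empty cells give 0 * ln(0) = missing, which sum() treats as 0
	H = -sum(X :* ln(X))
	// H = 0 means all mass in one cell; this also traps 0 / 0 for a single cell
	scaled = H == 0 ? 0 : H / ln(rows(X) * cols(X))
	st_numscalar("r(entropy)", H)
	st_numscalar("r(scaled)", scaled)
}
end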
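For anyone without Stata to hand, here is a rough Python cross-check of what the program computes. It is a hypothetical mirror and not part of the program above. The name table_entropy is made up, only frequency weights are handled, and there is no missing-value handling. The one deliberate design choice is to copy rows(X) * cols(X), so that empty combinations in a two-way table still count as cells in the denominator.

```python
import math
from collections import Counter


def table_entropy(x, y=None, fweight=None):
    """Return (H, H / ln #cells) for the one- or two-way table of x (and y).

    Hypothetical mirror of the Stata/Mata code: frequency weights only,
    no missing-value handling.
    """
    if fweight is None:
        fweight = [1] * len(x)
    keys = list(zip(x, y)) if y is not None else [(a,) for a in x]
    counts = Counter()
    for key, w in zip(keys, fweight):
        if w > 0:  # like marksample, drop observations with zero weight
            counts[key] += w
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no observations")  # the program issues error 2000
    # rows * cols of the table, so empty combinations still count as cells
    ncells = len({key[0] for key in counts})
    if y is not None:
        ncells *= len({key[1] for key in counts})
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    scaled = 0.0 if h == 0 else h / math.log(ncells)  # trap 0 / 0
    return h, scaled


# Usage: a 2 x 2 table with only the diagonal filled has H = ln 2
# spread over 4 cells, so the scaled value is ln 2 / ln 4 = 0.5.
print(table_entropy([1, 2], [1, 2]))
```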
Nick
Steve Vaisey
> I just checked the archives and saw Nick's question about the
> additive element. The reference for this is:
>
> Martin, John Levi. 1999. "Entropic Measures of Belief System
> Constraint." Social Science Research 28:111-134.
>
> A simpler and perhaps more useful exposition is given in:
>
> -----. 2002. "Power, Authority, and the Constraint of Belief
> Systems." American Journal of Sociology 197:861-904.
>
> Adding the ln(M) [where M = number of cells] is meant to
> standardize the entropy (actually, the negative of the entropy)
> so that it varies from 0 to 1 (i.e., from more to less
> entropic). This standardization assumes, however, that no (or
> few) cells have particularly low counts. As such, it's really
> only useful in large-n applications. But since that's what I
> have, that's OK.
>