Another way to do it is with Mata. That way, zeros
get ignored in the way you would like.
----------------------------------- myentropy.ado
program myentropy, rclass
version 9
syntax varlist(min=2 max=2 numeric) [if] [in] [fweight aweight]
marksample touse
qui count if `touse'
if r(N) == 0 error 2000
tempname matname
tab `varlist' [`weight' `exp'] if `touse', matcell(`matname')
mat `matname' = `matname' / r(N)
mata: subroutine("`matname'")
di as txt "entropy" as res %10.4f r(entropy)
return scalar entropy = r(entropy)
end
mata:
void subroutine(string scalar matname)
{
real matrix X
real scalar H
X = st_matrix(matname)
H = sum(X :* ln(X))
st_numscalar("r(entropy)", H)
}
end
---------------------------------
e.g.
sysuse auto, clear
myentropy for rep78
Nick
[email protected]
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Nick Cox
> Sent: 28 March 2006 22:37
> To: [email protected]
> Subject: st: RE: Re: manipulating matrix elements
>
>
> A twist to this problem is that it is customary
> in entropy calculations to define 0 ln 0 as 0.
>
> However, Stata doesn't know that and will
> return missing instead. So, it is necessary
> to trap 0 within some expression as
>
> scalar x = x + cond(p = 0, 0, p * ln(p))
>
> Nick
> [email protected]
>
> Michael Blasnik
>
> > -findit matmap- will lead you to the matmap package which
> can perform
> > arbitrary elementwise operations on matrices, but since you
> > want the sum of
> > the calculation across all elements, I think you should just
> > do the looping
> > yourself. Here's one approach (untested):
> >
> > scalar x=0
> > tab var1 var2, matcell(Cell)
> > matrix P = Cell/r(N)
> > local rows=rowsof(P)
> > local cols=colsof(P)
> > forvalues i=1(1)`rows' {
> > forvalues j=1(1)`cols' {
> > scalar p=P[`i',`j']
> > scalar x=x+p*ln(p)
> > }
> > }
> > scalar x=x+ln(`rows'*`cols')
> > di x
> >
> > I've renamed your Proportions matrix as P for brevity and the
> > result you
> > want should be in scalar x when you are done. The code could
> > be made more
> > compact (by not calculating rows, cols, or p separately), but
> > this approach
> > is a little easier to understand. It would be simple to turn
> > this into a
> > Stata program (ado) file that accepts a variable list, -if-
> and -in-
> > qualifiers, and perhaps other command options. I'm not
> > familiar with this
> > statistic, but perhaps someone has already done this.
>
> Steve Vaisey
>
> > > I am trying to implement the following formula for estimating the
> > > (standardized) informational entropy of a RxC contingency table:
> > > sum(p*ln(p))+ln(M) where p (subscript ij omitted) is the
> > proportion of
> > > cases in each cell and M is the total number of cells. So
> > far, I've only
> > > managed to accomplish the following:
> > >
> > > tab var1 var2, matcell(Cell)
> > > matrix Proportions = Cell/r(N)
> > >
> > > Frankly, I'm amazed I've managed even this much, but now
> > I'm stuck. What
> > > I need to do (as I understand it) is take the natural log
> > of each element
> > > of the Proportions matrix, multiply it times its
> > corresponding proportion,
> > > sum up all the elements, and add ln(M).
> > >
> > > For some (probably good) reason, while you can easily
> > multiply matrix
> > > elements by a scalar, you can't do something like:
> > >
> > > matrix LnP = ln(Proportions)
> > >
> > > I've tried this, but it gives a type mismatch error.
> > Again, I'm sure this
> > > is for a good reason, but I'm not sure what to do.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/