Bill kindly provided the definitive answer.
A totally separate issue from how best to
code this in Stata is why Steve Vaisey
wants in his calculation of entropy to add
ln(#rows * #columns).
I am so familiar with the definition SUM p ln p
that I unconsciously filtered out the additive term in my
various solutions. Where does the extra term
come from? Could Steve or anyone else provide
a reference for this definition please?
Nick
[email protected]
William Gould, Stata
> <[email protected]> asked,
>
> > I am trying to implement the following formula for estimating the
> > (standardized) informational entropy of a RxC contingency table:
> > sum(p*ln(p))+ln(M) where p (subscript ij omitted) is the
> proportion of
> > cases in each cell and M is the total number of cells. So
> far, I've
> > only managed to accomplish the following:
> >
> > . tab var1 var2, matcell(Cell)
> > . matrix Proportions = Cell/r(N)
>
> There have already been lots of answers to this question. I
> particularly
> like the answer by Nick Cox <[email protected]> because he
> used Mata,
> but Nick packaged his solution into an ado-file with a Mata
> subroutine,
> which disguised how easy the solution is.
>
> The entire solution to the problem is
>
> . tab va1 var2, matcell(Cell)
> . mata:
> : M = st_numscalar("r(N)")
> : X = st_matrix("Cell")
> : P = X/M
> : sum(P:*ln(P))+ln(M)
> : end
>
> You can just type the above interactively into Stata. It will work.
>
> In Steve's original question, he wrote
>
> > For some (probably good) reason, while you can easily
> multiply matrix
> > elements by a scalar, you can't do something like:
> >
> > . matrix LnP = ln(Proportions)
>
> That's true, you can't do that in Stata's *OLD* matrix
> language, but you
> can do that in Stata's *NEW* matrix language, Mata.
>
> In the code above, I just did that in-line,
>
> : sum(P:*ln(P))+ln(M)
>
> but I just as well could have typed:
>
> : LnP = ln(P)
> : sum(P:*LnP)+ln(M)
>
> In fact, that is exactly what I did when I tried this
> problem, and I typed
> even more so that I could see intermediate results and make
> sure things were
> going well. Here's the actual log:
>
> ==============================================================
> ================
> . tab var1 var2, matcell(Cell)
>
> Repair |
> Record | var2
> 1978 | 1 2 3 | Total
> -----------+---------------------------------+----------
> 1 | 5 4 1 | 10
> 2 | 8 15 7 | 30
> 3 | 2 10 6 | 18
> 4 | 3 6 2 | 11
> -----------+---------------------------------+----------
> Total | 18 35 16 | 69
>
>
> . mata:
> ------------------------------------------------- mata (type
> end to exit) ----------
> : M = st_numscalar("r(N)")
>
> : M
> 69
>
> : X = st_matrix("Cell")
>
> : X
> 1 2 3
> +----------------+
> 1 | 5 4 1 |
> 2 | 8 15 7 |
> 3 | 2 10 6 |
> 4 | 3 6 2 |
> +----------------+
>
> : P = X/M
>
> : P
> 1 2 3
> +-------------------------------------------+
> 1 | .0724637681 .0579710145 .0144927536 |
> 2 | .115942029 .2173913043 .1014492754 |
> 3 | .0289855072 .1449275362 .0869565217 |
> 4 | .0434782609 .0869565217 .0289855072 |
> +-------------------------------------------+
>
> : lnP = ln(P)
>
> : lnP
> 1 2 3
> +----------------------------------------------+
> 1 | -2.624668592 -2.847812143 -4.234106505 |
> 2 | -2.154664963 -1.526056303 -2.288196356 |
> 3 | -3.540959324 -1.931521412 -2.442347035 |
> 4 | -3.135494216 -2.442347035 -3.540959324 |
> +----------------------------------------------+
>
> : sum(P:*lnP)+ln(M)
> 1.957469762
>
> : end
> ==============================================================
> ================
>
>
> By the way, the strange ":*" in -sum(P:*lnP)+ln(M)- is Mata's
> elementwise
> multiplication operator. P*lnP would be matrix multiplication.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/