Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Re: st: Need Kullback–Leiber divergence measure
From: Tirthankar Chakravarty <[email protected]>
To: [email protected]
Subject: st: Re: st: Need Kullback–Leiber divergence measure
Date: Sat, 8 May 2010 15:21:34 +0530
Although the user-written -multgof- (Jeroen Weesie, SSC) will compute this
for you, it is easy to do it yourself in Mata, following the excellent
advice from William Gould here:
http://www.stata-journal.com/sjpdf.html?articlenum=pr0024
I am assuming you want the divergence between the column distributions
of a two-way tabulation (here, the distribution of rep78 among domestic
cars and among foreign cars):
*********************************************
clear*
sysuse auto, clear
tabulate rep78 foreign, matcell(newmat)
mata
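// Column-wise relative frequencies: vP1 = domestic, vP2 = foreign
vP1 = st_matrix("newmat")[., 1] :/ sum(st_matrix("newmat")[., 1])
vP2 = st_matrix("newmat")[., 2] :/ sum(st_matrix("newmat")[., 2])
// Kullback-Leibler divergence D(P1||P2)
// (cells where vP2 = 0 yield missing values, which sum() skips)
dKLdiv = sum(vP1 :* log(vP1 :/ vP2))
// Symmetric Kullback-Leibler divergence: average of D(P1||P2) and D(P2||P1)
dKLSdiv = 0.5*(dKLdiv + sum(vP2 :* log(vP2 :/ vP1)))
// Jensen-Shannon divergence: average KL divergence of P1 and P2 from the mixture M
vM = 0.5*(vP1 + vP2)
dJSdiv = 0.5*(sum(vP1 :* log(vP1 :/ vM)) + sum(vP2 :* log(vP2 :/ vM)))
dKLdiv, dKLSdiv, dJSdiv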
end
*********************************************
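For reference, these are the standard definitions of the three quantities for
discrete distributions P and Q with cell probabilities p_i and q_i (written
out here for convenience, not taken from the post). D_KL is infinite if
p_i > 0 in any cell where q_i = 0, whereas the Jensen-Shannon divergence is
always finite (at most log 2). Some authors define the symmetric version
without the factor 1/2.

```latex
\begin{aligned}
D_{\mathrm{KL}}(P \,\|\, Q) &= \sum_i p_i \log\frac{p_i}{q_i},
  \qquad \text{with } 0 \log\frac{0}{q_i} = 0, \\
D_{\mathrm{sym}}(P, Q) &= \tfrac{1}{2}\bigl[D_{\mathrm{KL}}(P \,\|\, Q) + D_{\mathrm{KL}}(Q \,\|\, P)\bigr], \\
D_{\mathrm{JS}}(P, Q) &= \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),
  \qquad M = \tfrac{1}{2}(P + Q).
\end{aligned}
```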
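Note, however, the discussion on the MATLAB lists about handling zero
probabilities in discrete distributions. It matters here: no foreign car
has rep78 = 1 or 2, so the divergence of the domestic distribution from
the foreign one (dKLdiv) is actually infinite, whereas the Mata code above
silently drops those cells because sum() skips missing values.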
http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/d921b346db0ef427/296b3ab1a09e62a3
http://www.mathworks.com/matlabcentral/fileexchange/13089
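A minimal sketch of explicit zero handling, assuming the usual conventions
(cells with p = 0 contribute 0; any cell with p > 0 and q = 0 makes the
divergence infinite). Mata has no infinity, so the hypothetical helper
kl_div() below, which is not part of Stata or -multgof-, reports that case
as missing (.).

```stata
* Sketch: KL divergence with explicit zero-cell handling (assumed conventions)
sysuse auto, clear
quietly tabulate rep78 foreign, matcell(newmat)
capture mata: mata drop kl_div()
mata:
// HYPOTHETICAL helper kl_div(): not part of Stata or -multgof-.
// Assumed conventions: cells with p = 0 contribute 0 (0*log 0 = 0);
// if p > 0 in a cell where q = 0, D(p||q) is infinite, returned as missing (.).
real scalar kl_div(real colvector p, real colvector q)
{
    real colvector pp, qq

    if (any((p :> 0) :& (q :== 0))) return(.)
    pp = select(p, p :> 0)      // keep only cells with p > 0
    qq = select(q, p :> 0)
    return(sum(pp :* log(pp :/ qq)))
}
vP1 = st_matrix("newmat")[., 1] :/ sum(st_matrix("newmat")[., 1])
vP2 = st_matrix("newmat")[., 2] :/ sum(st_matrix("newmat")[., 2])
kl_div(vP1, vP2)    // missing: no foreign cars have rep78 = 1 or 2
kl_div(vP2, vP1)    // finite, since every domestic cell is positive
end
```

Returning missing rather than a large number keeps the "infinite" case from
being mistaken for a valid estimate downstream.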
Note that -multgof- refuses to handle this case:
*********************************************
svmatf , mat(newmat) fil(newmat.dta)
use newmat, clear
multgof c1 c2, kl
*********************************************
This uses -svmatf- by Jan Brogger (SSC).
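One common workaround, which the post does not discuss and which is offered
here only as an assumption, is additive (pseudocount) smoothing: add a small
constant alpha to every cell before normalizing, so that no probability is
exactly zero. The value alpha = 0.5 below is an arbitrary illustration, and
the resulting divergence depends on that choice.

```stata
* Sketch: additive (pseudocount) smoothing, an assumed workaround
sysuse auto, clear
quietly tabulate rep78 foreign, matcell(newmat)
mata:
// alpha is an arbitrary illustrative pseudocount; results depend on it
alpha = 0.5
N   = st_matrix("newmat") :+ alpha        // add alpha to every cell
vQ1 = N[., 1] :/ sum(N[., 1])             // smoothed domestic distribution
vQ2 = N[., 2] :/ sum(N[., 2])             // smoothed foreign distribution
// every cell is now positive, so each log term is finite
dKLsmooth = sum(vQ1 :* log(vQ1 :/ vQ2))
dKLsmooth
end
```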
T
2010/5/8 Michael C. Morrison <[email protected]>:
> I've searched Stata (with no success) for "Kullback–Leibler divergence", also
> known as the information number, discrimination function, and “distance.”
>
> It's used to measure the divergence between two distributions.
>
> Any help would be appreciated.
>
> Mike
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).