Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: High-dimensional product/sum combinations in likelihood calculations
From
Matthew Baker <[email protected]>
To
[email protected]
Subject
st: High-dimensional product/sum combinations in likelihood calculations
Date
Thu, 27 Oct 2011 11:45:57 -0400
Dear Listers --
I'm curious if anyone has insight into the following problem in which
in the course of a likelihood computation. The problem is one caused
by a requirement that probabilities be computed and then averaged
before being inserted into a likelihood function. To take an example,
consider grouped data in which there are some amount of groups, and
the mean in each group is chosen to be either m1 or m2 with some
probability p. The group log-likelihood will then be something like
L[g]=ln( p*prob(x|m1)+(1-p)*prob(x|m2)). The problem is that the
probabilities inside the log can be extremely small in problems where
the group sizes are large. To take an example:
/* Begin Example */
clear all
set seed 8675309
/* Begin by generating fictional data with 20 groups and 10000 total
observations */
/* with probability .4, the group mean is 1, while with prob .6, the
group mean is 1.5 */
mata
groups=20
obs=10000
p=.4
m1=1
m2=1.5
G=1:+sort(round(runiform(obs,1)*(groups-1)),1)
id=panelsetup(G,1)
Data=J(obs,1,1)
for (i=1;i<=rows(id);i++) {
D=panelsubmatrix(G,i,id)
if (runiform(1,1)>p) Data[id[i,1]::id[i,2],.]=rnormal(rows(D),1,m1,1)
else Data[id[i,1]::id[i,2],.]=rnormal(rows(D),1,m2,1)
}
/* Now, write a function that evaluates the log-likelihood of the data */
real colvector obj_fun_2(b,x,G)
{
real scalar i,p,m1,m2
real matrix id,D1,D2
p=b[1]
m1=b[2]
m2=b[3]
dens1=normalden(x:-m1)
dens2=normalden(x:-m2)
id=panelsetup(G,1)
LL=J(rows(id),1,.)
for (i=1;i<=rows(id);i++) {
D1=panelsubmatrix(dens1,i,id)
D2=panelsubmatrix(dens2,i,id)
P1=exp(colsum(ln(D1)))
P2=exp(colsum(ln(D2)))
LL[i]=p*P1+(1-p)*P2
}
return(ln(LL))
}
/* All is not well, as LL has a series of blank entries */
obj_fun_2((p,m1,m2),Data,G)
end
/* End example */
The final command issued in Mata above returns the result, which I am
fairly certain is occurring due to small numbers:
+----------------+
1 -406.0532458
2 .
3 .
4 .
5 .
6 .
7 .
8 .
9 .
10 .
11 -668.8969238
12 .
13 .
14 .
15 .
16 .
17 .
18 .
19 .
20 -372.8149279
+----------------+
Does anyone have any ideas about dealing with these sorts of problems?
Thanks in advance for the help!
Matt Baker
--
Dr. Matthew J. Baker
Department of Economics
Hunter College and the Graduate Center, CUNY
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/