st: Has -tabulate , lrchi2- changed in Release 10?
From: "Joseph Coveney" <[email protected]>
To: "Statalist" <[email protected]>
Subject: st: Has -tabulate , lrchi2- changed in Release 10?
Date: Mon, 21 Jan 2008 23:27:01 +0900
The do-file below creates a cross-tabulation with an empty cell (zero-count
cell), and yet -tabulate , lrchi2- returns a result. I don't remember Stata
doing that in earlier releases. Has this changed in Stata 10, or is it just
my usual forgetfulness?
The documentation still says, "lrchi2 displays the likelihood-ratio
chi-squared statistic. The request is ignored if any cell of the table
contains no observations." and the formula given in the manual still has the
chi-square test statistic calculated as
2 * sum_i sum_j n_ij * ln(n_ij / m_ij), where n_ij is the observed count and
m_ij the expected count in cell (i, j).
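As a rough check on what that formula implies, here is a minimal sketch that computes the statistic by hand for a toy 2 x 2 table with one empty cell. It assumes the usual form of the statistic, 2 * sum n_ij * ln(n_ij / m_ij), with m_ij equal to the row total times the column total divided by the grand total. It also assumes that an empty cell simply contributes nothing to the sum (the limit of n * ln(n) as n goes to zero); whether -tabulate- does the same is a guess, not something taken from the manual. The toy counts and the names cell, rowtot, coltot, tot, ntot, mij, G2 and df are illustrative only.

```stata
clear
input byte A byte B int count
0 0 10
0 1  0
1 0  5
1 1  5
end

* Observed cell counts, including the empty (A == 0, B == 1) cell
quietly tabulate A B [fweight = count], matcell(cell)

* Row totals, column totals, and grand total from the cell matrix
matrix rowtot = cell * J(colsof(cell), 1, 1)
matrix coltot = J(1, rowsof(cell), 1) * cell
matrix tot    = coltot * J(colsof(cell), 1, 1)
scalar ntot   = tot[1, 1]
scalar df     = (rowsof(cell) - 1) * (colsof(cell) - 1)

* Accumulate 2 * n_ij * ln(n_ij / m_ij), skipping empty cells
* (assumption: an empty cell contributes 0 to the sum)
scalar G2 = 0
forvalues i = 1/`=rowsof(cell)' {
    forvalues j = 1/`=colsof(cell)' {
        if cell[`i', `j'] > 0 {
            scalar mij = rowtot[`i', 1] * coltot[1, `j'] / ntot
            scalar G2  = G2 + 2 * cell[`i', `j'] * ln(cell[`i', `j'] / mij)
        }
    }
}

display as text "G2(" as result df as text ") = " as result %9.4f G2 ///
    as text "  Pr = " as result %5.3f chi2tail(df, G2)
```

For this toy table the sketch gives a statistic of about 8.63 on 1 degree of freedom. Comparing a hand computation like this with what -tabulate , lrchi2- reports for the same table would show whether the two agree on how an empty cell is treated.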
On the same topic, recently, I've encountered a dataset where -tabulate- has
a couple of zero-count cells and the likelihood ratio chi-square yields a
p-value (0.304) that gives a substantially different picture from those of
the Pearson chi-square test (0.044) and the Fisher test (0.068). The
do-file below includes a contrived example where the opposite is the case
(P = 0.02 for the likelihood ratio test, and P > 0.05 for the other two).
What should we make of the likelihood ratio test here?
Analogously, what to make of -logit- compared to -logit , asis-, where the
likelihood-ratio tests differ because the null-model log-likelihoods differ
(reflecting dropped perfect predictors)? I wasn't able to attend the London
users' group meeting in 2006; did the discussion following Ian White's
presentation ( http://repec.org/usug2006/White.ppt ) conclude with any
guidance? Was there any discussion (pro or con) about Firth's method
(Slide 19 of Ian's presentation) in cases of zero-count cells?
Sometimes I can follow the advice given in Chapter 4 of D. W. Hosmer & S.
Lemeshow, _Applied Logistic Regression_, Second Edition (New York: John
Wiley & Sons, 2000). But in other cases, collapsing or omitting categories,
or fudging an ordering to the categories isn't compatible with the objective
of the analysis. In a recent case, because of scientific interest in the
interaction of two categorical predictors, I would have liked to
use -exlogistic- to handle zero-count cells, but ran out of memory. Should
I just keep raising the memory allocated to Stata and to -exlogistic-? I'm
not sure how much trouble disk swapping might become.
I suppose that I could use resampling as well: does anyone have any
recommendations for choosing between a likelihood-ratio test and a Wald test
for the chi-square returned by the program called by -permute-?
Joseph Coveney
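clear *
* Seed the random-number generator from the posting date
set seed `=date("2008-01-21", "YMD")'
* Ten observations: each A-by-B combination (5 x 2) occurs exactly once
set obs 10
generate byte A = mod(_n, 5)
generate byte B = mod(_n, 2)
generate byte count = floor(uniform() * 100)
* Zero out the (A == 1, B == 1) cell to create the empty cell
replace count = 0 in 1
tabulate A B [fweight = count], lrchi2
* Same table with the zero-count observation removed altogether
drop in 1
tabulate A B [fweight = count], lrchi2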
quietly xi: logit B i.A [fweight = count], asis
display in smcl as text " likelihood-ratio chi2(" ///
    as result e(df_m) as text ") =" ///
    as result %9.4f e(chi2) as text " Pr = " ///
    as result %05.3f chi2tail(e(df_m), e(chi2))
quietly replace count = floor(count / 1.5)
tabulate A B [fweight = count], ///
    lrchi2 chi2 exact nolog // N > 200; which one?
exit