
st: Has -tabulate , lrchi2- changed in Release 10?

From	"Joseph Coveney" <[email protected]>
To	"Statalist" <[email protected]>
Subject	st: Has -tabulate , lrchi2- changed in Release 10?
Date	Mon, 21 Jan 2008 23:27:01 +0900

The do-file below creates a cross-tabulation with an empty (zero-count)
cell, and yet -tabulate , lrchi2- returns a result.  I don't remember Stata
doing that in earlier releases.  Has this changed in Stata 10, or is it just
my forgetfulness as usual?

The documentation still says, "lrchi2 displays the likelihood-ratio
chi-squared statistic.  The request is ignored if any cell of the table
contains no observations.", and the formula given in the manual still has the
chi-square test statistic calculated as 2 * sum_i sum_j n_ij * ln(n_ij / m_ij).
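For reference, here is a minimal Python sketch (not Stata code) of that statistic, with m_ij the expected count under independence. It assumes the common convention 0 * ln(0) = 0, under which an empty cell simply contributes nothing to the sum. Whether Stata 10 adopted this convention is only a guess; the manual text quoted above does not say so. The function names and the example table are hypothetical.

```python
import math

def expected_counts(table):
    """Expected counts m_ij = (row total * column total) / N under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r_i * c_j / n for c_j in cols] for r_i in rows]

def lr_chi2(table):
    """G^2 = 2 * sum_i sum_j n_ij * ln(n_ij / m_ij), assuming 0 * ln(0) = 0.

    Assumes no empty row or column (which would make some m_ij zero).
    """
    g2 = 0.0
    for row_n, row_m in zip(table, expected_counts(table)):
        for n_ij, m_ij in zip(row_n, row_m):
            if n_ij > 0:  # empty cells are skipped: 0 * ln(0) taken as 0
                g2 += n_ij * math.log(n_ij / m_ij)
    return 2.0 * g2

# Hypothetical 2 x 2 table with one empty cell
print(lr_chi2([[10, 0], [5, 5]]))
```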

On the same topic, I recently encountered a dataset where -tabulate- shows
a couple of zero-count cells and the likelihood-ratio chi-square test yields a
p-value (0.304) that gives a substantially different picture from those of
the Pearson chi-square test (0.044) and the Fisher exact test (0.068).  The
do-file below includes a contrived example where the opposite is the case
(P = 0.02 for the likelihood-ratio test, and P > 0.05 for the other two).
What should we make of the likelihood-ratio test here?
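The Python sketch below shows how the two statistics can disagree when a cell is empty. It computes both for a small hypothetical 3 x 2 table; the table is made up and is not the dataset described above. It assumes no empty row or column. In Pearson's statistic an empty cell still adds its expected count m_ij, but under 0 * ln(0) = 0 it adds nothing to G^2. The p-value helper uses the closed-form chi-square upper tail, which holds only for even degrees of freedom, so it is not a general substitute for Stata's chi2tail(). The Fisher exact test is not reproduced here.

```python
import math

def pearson_and_lr(table):
    """Return (Pearson X^2, likelihood-ratio G^2, df) for an r x c table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            m_ij = rows[i] * cols[j] / n        # expected count under independence
            x2 += (n_ij - m_ij) ** 2 / m_ij     # an empty cell still adds m_ij here
            if n_ij > 0:                        # ...but adds nothing to G^2
                g2 += 2.0 * n_ij * math.log(n_ij / m_ij)
    df = (len(rows) - 1) * (len(cols) - 1)
    return x2, g2, df

def chi2_upper_tail_even_df(df, x):
    """P(chi2(df) > x); this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles only positive, even df")
    term = total = 1.0
    for k in range(1, df // 2):
        term *= (x / 2.0) / k
        total += term
    return math.exp(-x / 2.0) * total

table = [[8, 0], [6, 4], [5, 7]]  # hypothetical table with one empty cell
x2, g2, df = pearson_and_lr(table)
print(f"Pearson chi2({df}) = {x2:.3f}  P = {chi2_upper_tail_even_df(df, x2):.3f}")
print(f"LR      chi2({df}) = {g2:.3f}  P = {chi2_upper_tail_even_df(df, g2):.3f}")
```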

Analogously, what should we make of -logit- compared with -logit , asis-, where the
likelihood-ratio tests differ because the null-model log-likelihoods differ
(reflecting dropped perfect predictors)?  I wasn't able to attend the London
users' group meeting in 2006; did the discussion following Ian White's
presentation ( http://repec.org/usug2006/White.ppt ) conclude with any
guidance?  Was there any discussion (pro or con) about Firth's method
(Slide 19 of Ian's presentation) in cases of zero-count cells?
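With a single categorical predictor the logit model is saturated, so both log-likelihoods have closed forms and the two LR tests can be compared directly. The Python sketch below rests on a simplifying assumption that is not a description of what -logit- actually checks: without -asis-, every category with a zero cell is dropped along with its observations. With -asis- nothing is dropped, and the LR statistic then equals the table's G^2. The helper names and the example table are hypothetical.

```python
import math

def saturated_logit_lr(table):
    """LR chi2 and df for a logit of B on categorical A (saturated model).

    table: one [n_i0, n_i1] row per category of A.  Zero cells add 0 to the
    log-likelihood (0 * ln 0 = 0), the limit reached as an estimate diverges.
    """
    col0 = sum(r[0] for r in table)
    col1 = sum(r[1] for r in table)
    n = col0 + col1
    # Null (constant-only) model: one overall proportion for each outcome
    ll_null = sum(c * math.log(c / n) for c in (col0, col1) if c > 0)
    # Saturated model: a separate proportion within each category of A
    ll_full = sum(c * math.log(c / sum(r)) for r in table for c in r if c > 0)
    return 2.0 * (ll_full - ll_null), len(table) - 1

def drop_perfect_predictors(table):
    """Hypothetical stand-in for -logit- without -asis-: drop each category
    in which every observation has the same outcome (a row with a zero cell)."""
    return [r for r in table if r[0] > 0 and r[1] > 0]

table = [[8, 0], [6, 4], [5, 7]]  # hypothetical counts
lr_asis, df_asis = saturated_logit_lr(table)
lr_drop, df_drop = saturated_logit_lr(drop_perfect_predictors(table))
print(f"all categories kept:   chi2({df_asis}) = {lr_asis:.4f}")
print(f"perfect pred. dropped: chi2({df_drop}) = {lr_drop:.4f}")
```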

Sometimes I can follow the advice given in Chapter 4 of D. W. Hosmer & S.
Lemeshow, _Applied Logistic Regression_, Second Edition (New York: John
Wiley & Sons, 2000).  But in other cases, collapsing or omitting categories,
or fudging an ordering onto the categories, isn't compatible with the
objective of the analysis.  In a recent case, because of scientific interest
in the interaction of two categorical predictors, I would have liked to use
-exlogistic- to handle the zero-count cells, but ran out of memory.  Should I
just keep increasing the memory allocated to Stata for -exlogistic-?  I'm
not sure how much of a problem disc swapping might become.

I suppose that I could use resampling as well.  Does anyone have any
recommendations for choosing between a likelihood-ratio test and a Wald test
for the chi-square returned by the program called by -permute-?
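A permutation test on the likelihood-ratio statistic itself avoids relying on the asymptotic chi-square distribution. The Python sketch below is not -permute-. It expands a frequency table into individual records, shuffles the column labels, and recomputes G^2 on each shuffle; the table, replication count and seed are all hypothetical. To use a Pearson or Wald statistic instead, only the statistic function would change.

```python
import math
import random

def lr_chi2_from_pairs(a, b):
    """G^2 for the cross-tabulation of two equal-length label lists.

    Only observed (nonzero) cells are summed, i.e. 0 * ln(0) = 0.
    """
    counts, row, col = {}, {}, {}
    for x, y in zip(a, b):
        counts[(x, y)] = counts.get((x, y), 0) + 1
    for (x, y), c in counts.items():
        row[x] = row.get(x, 0) + c
        col[y] = col.get(y, 0) + c
    n = len(a)
    return 2.0 * sum(c * math.log(c * n / (row[x] * col[y]))
                     for (x, y), c in counts.items())

def expand(table):
    """Turn a frequency table into one (row label, column label) pair per record."""
    a, b = [], []
    for i, r in enumerate(table):
        for j, c in enumerate(r):
            a.extend([i] * c)
            b.extend([j] * c)
    return a, b

def permutation_pvalue(table, reps=2000, seed=12345):
    """Monte Carlo permutation p-value for G^2, counting the observed table."""
    a, b = expand(table)
    observed = lr_chi2_from_pairs(a, b)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        rng.shuffle(b)  # break any A-B association, keeping both margins fixed
        if lr_chi2_from_pairs(a, b) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (reps + 1)

observed, p = permutation_pvalue([[8, 0], [6, 4], [5, 7]])
print(f"G^2 = {observed:.3f}, permutation P = {p:.3f}")
```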

Joseph Coveney

clear *
set seed `=date("2008-01-21", "YMD")'
set obs 10
generate byte A = mod(_n, 5)
generate byte B = mod(_n, 2)
generate byte count = floor(uniform() * 100)
replace count = 0 in 1
tabulate A B [fweight = count], lrchi2
drop in 1
tabulate A B [fweight = count], lrchi2
quietly xi: logit B i.A [fweight = count], asis
display in smcl as text " likelihood-ratio chi2(" ///
 as result e(df_m) as text ") =" ///
 as result %9.4f e(chi2) as text "   Pr = " ///
 as result %05.3f chi2tail(e(df_m), e(chi2))
quietly replace count = floor(count / 1.5)
tabulate A B [fweight = count], ///
 lrchi2 chi2 exact nolog // N > 200; which one?
exit

