|
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
RE: st: gologit2
For anyone who is still following this thread - Maarten and I have
been conversing offline and I think we both agree the original
simulations of the brant test and its counterparts were flawed and
the test works much better than we initially thought. Here are new
simulations. For each simulation, the expected value for reject H0
is .05, i.e. 50 false rejections in 1000 tries using the .05 level of
significance.
Brant Simulations, 1000 reps
# of Xs | reject H0
----------+-----------
1 | .051
2 | .042
3 | .054
4 | .051
5 | .057
6 | .054
7 | .048
8 | .056
9 | .050
10 | .056
12 | .067
14 | .038
16 | .048
18 | .041
20 | .036
Altogether, the observed number of false rejections of the null
totaled 749; the expected number of rejections was 750, so I'd say
the simulations worked pretty good. Most critically, you don't see
an increased tendency to reject the null as the number of x variables
increases.
For anyone who is interested as to what has changed - in Maarten's
original code, as the number of Xs increased, the distribution of Y
also changed. Cases were less likely to be in the middle categories
of 2 and 3 and more likely to be in the extreme categories of 1 and
4. I don't think this would necessarily be bad, except that it does
result in some very thin cell counts in some simulations which
probably results in poor estimation.
I tweaked Maarten's code so that the distribution of Y was always the
same no matter how many Xs were in the model. This seems more
realistic anyway, i.e. the distribution of the observed y shouldn't
change just because you have more explanatory variables. That
produced the results above. Also, Maarten reports that he tweaked
his original code, increasing the sample size, and the brant test
also performed better then.
Here is the modified code.
set more off
set seed 12345
capture program drop sim
program define sim, rclass
syntax, [nx(integer 1)]
drop _all
set obs 500
forvalues i = 1/`nx' {
gen x`i' = invnorm(uniform())
local x `x' x`i'
}
local x : list retokenize x
local xsum : subinstr local x " " " + ", all
gen u = uniform()
gen ystar = `xsum' + ln(u/(1-u))
* original code for y. This code causes
* the distribution of y to change as the
* number of x vars increases, probably
* resulting in very small cell counts in
* some simulations.
*gen y = cond(ystar < -2, 1, ///
* cond(ystar < 0, 2, ///
* cond(ystar < 2, 3, 4)))
* New code for y. Distribution for y is
* the same no matter how many Xs there are.
egen y = cut(ystar), group(4)
replace y = y + 1
* Uncomment the section you want
* brant test code
ologit y `x'
brant
return scalar p = r(p)
* omodel test code
*omodel logit y `x'
*return scalar p = chi2tail($S_2, $S_1)
* gologit2 test code
*ologit y `x'
*est store m1
*gologit2 y `x', npl store(m2)
*lrtest m1 m2, force
*return scalar p = r(p)
end
simulate p=r(p), reps(1000): sim, nx(1)
count if p < .05
matrix res = 1, r(N)/1000
foreach i of numlist 2/10 12(2)20 {
simulate p=r(p), reps(1000): sim, nx(`i')
count if p < .05
matrix res = res \ `i', r(N)/1000
}
matlist res
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/