Thanks to Dave and Steve.
Dave, I am not sure how to apply -xtmelogit- to this data set, or if
this would be a correct thing to do. I haven't worked with this before
but will look into it.
Steve, thanks for your helpful comments. I am not quite sure what is
meant by the two part prediction equation. I think you mean getting
predicted probabilities from a logit model with s315t==1, but am not
sure about 's315t negative: predict clstr = 1'? Can you make this more
explicit?
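For concreteness, here is one possible reading of the two-part rule, sketched in Stata. It is only a sketch under stated assumptions and may not be what Steve intended. The cell counts are typed in from the stratified tables quoted below, so only the 103 observations with non-missing east are used. east_asian is the only covariate in the second part, and the names n and phat are made up for the illustration. For s315t-negative isolates it assigns the observed proportion with clstr == 1 (1 of 19 in those tables) rather than a literal prediction of clstr = 1, which is the part I am unsure about.

```stata
* Cell counts copied from the stratified two-way tables (assumed data layout)
clear
input s315t east_asian clstr n
0 1 0  6
0 1 1  1
1 1 0 19
1 1 1 24
0 0 0 12
0 0 1  0
1 0 0 33
1 0 1  8
end
drop if n == 0                       // an empty cell carries no observations

* Part 2: logistic model fitted only among s315t-positive isolates
logit clstr east_asian if s315t == 1 [fweight = n], or
predict double phat if s315t == 1, pr

* Part 1 (assumption): s315t-negative isolates get their observed proportion
summarize clstr if s315t == 0 [fweight = n], meanonly
replace phat = r(mean) if s315t == 0

* One row per remaining cell, with its predicted probability
list s315t east_asian clstr n phat, sepby(s315t)
```

With a single binary covariate the s315t-positive predictions simply reproduce the cell proportions (24/43 and 8/41); with more covariates in the second part they would not.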
Thanks much,
John
On Sat, Mar 7, 2009 at 11:43 AM, Steven Samuels wrote:
>
> John, your model is probably incorrect. It assumes that, when s315t is 0,
> the other factors make a difference implied by the model form. They don't.
> Correspondingly, the stratified two-way tables indicate a possible
> interaction between s315t and 'east'.
>
> I suggest a two part prediction equation.
>
> s315t negative: predict clstr = 1
> s315t positive: predict with other factors in a logistic model.
>
>
> I'm not very familiar with exact logistic regression, but if the usual rules
> of thumb apply, the 32-33 events (clstr =1) entitle you to about three
> predictors altogether.
>
>
> -Steve
>
> On Mar 6, 2009, at 10:12 PM, john metcalfe wrote:
>
>> Dear Statalist,
>> I am analyzing a small data set with outcome of interest 'clstr', with
>> the primary goal of the analysis to determine if the variables 's315t'
>> and 'east' have independent associations with the outcome. However,
>> s315t is highly deterministic for the outcome clstr, as below. I am
>> concerned that exact logistic regression is not fully accounting for
>> the small cell bias. I would like to employ a hierarchical logistic
>> regression, but it seems that the Stata command 'hireg' is only for
>> linear regressions?
>> It may be that I simply am unable to make any valid inferences with
>> this dataset, but I just want to make sure I have explored the
>> appropriate possible remedies.
>> Thanks,
>> John
>>
>> John Metcalfe, M.D., M.P.H.
>> University of California, San Francisco
>>
>>
>> . tab s315 clstr,e
>>
>>            |         clstr
>>      s315t |         0          1 |     Total
>> -----------+----------------------+----------
>>          0 |        22          1 |        23
>>          1 |        58         32 |        90
>> -----------+----------------------+----------
>>      Total |        80         33 |       113
>>
>>            Fisher's exact =                 0.002
>>    1-sided Fisher's exact =                 0.002
>>
>>
>>
>>
>> . logit clstr ageat s315t east emb sm num,or
>>
>> Iteration 0: log likelihood = -62.686946
>> Iteration 1: log likelihood = -51.860098
>> Iteration 2: log likelihood = -50.754342
>> Iteration 3: log likelihood = -50.661741
>> Iteration 4: log likelihood = -50.660257
>> Iteration 5: log likelihood = -50.660256
>>
>> Logistic regression                               Number of obs   =        100
>>                                                   LR chi2(6)      =      24.05
>>                                                   Prob > chi2     =     0.0005
>> Log likelihood = -50.660256                       Pseudo R2       =     0.1919
>>
>> ------------------------------------------------------------------------------
>>        clstr | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>    ageatrept |   .9908837   .0139884    -0.65   0.517     .9638428    1.018683
>>        s315t |   9.238959   10.28939     2.00   0.046     1.041462    81.96011
>>   east_asian |   4.219755   2.215279     2.74   0.006     1.508083    11.80727
>>          emb |   .9964845   .6599534    -0.01   0.996     .2721043    3.649268
>>           sm |   2.138175   1.696319     0.96   0.338      .451589    10.12379
>>   num_resist |   1.064089   .2385192     0.28   0.782     .6857694    1.651116
>> ------------------------------------------------------------------------------
>>
>>
>>
>> Strategy 1: Two-way contingency tables
>>
>> . tab clstr s315t if east==1,e
>>
>>            |         s315t
>>      clstr |         0          1 |     Total
>> -----------+----------------------+----------
>>          0 |         6         19 |        25
>>          1 |         1         24 |        25
>> -----------+----------------------+----------
>>      Total |         7         43 |        50
>>
>>            Fisher's exact =                 0.098
>>    1-sided Fisher's exact =                 0.049
>>
>> . tab clstr s315t if east==0,e
>>
>>            |         s315t
>>      clstr |         0          1 |     Total
>> -----------+----------------------+----------
>>          0 |        12         33 |        45
>>          1 |         0          8 |         8
>> -----------+----------------------+----------
>>      Total |        12         41 |        53
>>
>>            Fisher's exact =                 0.175
>>    1-sided Fisher's exact =                 0.108
>>
>>
>>
>> Strategy 2: Exact Logistic Regression
>>
>> observation 102: enumerations = 1128
>> observation 103: enumerations = 574
>>
>> Exact logistic regression Number of obs = 103
>> Model score = 19.78112
>> Pr >= score = 0.0000
>>
>> ---------------------------------------------------------------------------
>> clstr | Odds Ratio Suff. 2*Pr(Suff.) [95% Conf. Interval]
>>
>> -------------+-------------------------------------------------------------
>> s315t | 10.44218 32 0.0135 1.391627 474.4786
>> east_asian | 5.414021 25 0.0006 1.933718 16.65417
>>
>>
>>
>>
>> (output omitted)
>> observation 103: enumerations = 574
>>
>> Exact logistic regression Number of obs = 103
>> Model score = 19.78112
>> Pr >= score = 0.0000
>>
>> ---------------------------------------------------------------------------
>> clstr | Coef. Score Pr>=Score [95% Conf. Interval]
>>
>> -------------+-------------------------------------------------------------
>> s315t | 2.345854 6.763266 0.0129 .3304732 6.162216
>> east_asian | 1.688992 12.98631 0.0004 .6594448 2.812661
>>
>> ---------------------------------------------------------------------------
>>
>>
>> Strategy 3: Hierarchical Regression
>>
>> . hireg clstr (s315t) (east)(ageat emb sm)
>>
>> Model 1:
>> Variables in Model:
>> Adding : s315t
>>
>>       Source |       SS       df       MS              Number of obs =     113
>> -------------+------------------------------           F(  1,   111) =    9.18
>>        Model |   1.7840879     1   1.7840879           Prob > F      =  0.0030
>>     Residual |   21.578744   111  .194403099           R-squared     =  0.0764
>> -------------+------------------------------           Adj R-squared =  0.0680
>>        Total |  23.3628319   112  .208596713           Root MSE      =  .44091
>>
>> ------------------------------------------------------------------------------
>>        clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        s315t |   .3120773   .1030162     3.03   0.003     .1079438    .5162108
>>        _cons |   .0434783   .0919364     0.47   0.637    -.1386999    .2256565
>> ------------------------------------------------------------------------------
>>
>> Model 2:
>> Variables in Model: s315t
>> Adding : east
>>
>>       Source |       SS       df       MS              Number of obs =     103
>> -------------+------------------------------           F(  2,   100) =   12.03
>>        Model |  4.34936038     2  2.17468019           Prob > F      =  0.0000
>>     Residual |  18.0778241   100  .180778241           R-squared     =  0.1939
>> -------------+------------------------------           Adj R-squared =  0.1778
>>        Total |  22.4271845   102  .219874358           Root MSE      =  .42518
>>
>> ------------------------------------------------------------------------------
>>        clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        s315t |   .2817301   .1086887     2.59   0.011     .0660947    .4973654
>>   east_asian |   .3247109   .0843486     3.85   0.000     .1573656    .4920561
>>        _cons |  -.0669987   .1023736    -0.65   0.514     -.270105    .1361075
>> ------------------------------------------------------------------------------
>> R-Square Diff. Model 2 - Model 1 = 0.118 F(1,100) = 14.190 p = 0.000
>>
>> Model 3:
>> Variables in Model: s315t east
>> Adding : ageat emb sm
>>
>>       Source |       SS       df       MS              Number of obs =     100
>> -------------+------------------------------           F(  5,    94) =    4.72
>>        Model |  4.36538233     5  .873076466           Prob > F      =  0.0007
>>     Residual |  17.3946177    94  .185049124           R-squared     =  0.2006
>> -------------+------------------------------           Adj R-squared =  0.1581
>>        Total |       21.76    99   .21979798           Root MSE      =  .43017
>>
>> ------------------------------------------------------------------------------
>>        clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        s315t |   .2335983   .1163422     2.01   0.048     .0025981    .4645984
>>   east_asian |   .2694912   .0945411     2.85   0.005     .0817777    .4572048
>>    ageatrept |  -.0012444   .0024199    -0.51   0.608    -.0060491    .0035603
>>          emb |   .0396897   .0989203     0.40   0.689    -.1567189    .2360984
>>           sm |   .1063985   .1087626     0.98   0.330    -.1095522    .3223492
>>        _cons |  -.0454117   .1512602    -0.30   0.765    -.3457423     .254919
>> ------------------------------------------------------------------------------
>> R-Square Diff. Model 3 - Model 2 = 0.007 F(3,94) = 0.029 p = 0.993
>>
>>
>> Model  R2     F(df)           p      R2 change  F(df) change    p
>> 1:     0.076  9.177(1,111)    0.003
>> 2:     0.194  12.030(2,100)   0.000  0.118      14.190(1,100)   0.000
>> 3:     0.201  4.718(5,94)     0.001  0.007      0.029(3,94)     0.993
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/