Dear Statalist,
I am analyzing a small data set in which the outcome of interest is
'clstr'. The primary goal of the analysis is to determine whether the
variables 's315t' and 'east' have independent associations with the
outcome. However, s315t is nearly deterministic for clstr, as shown
below: only 1 of the 23 observations with s315t == 0 has clstr == 1. I
am concerned that exact logistic regression is not fully accounting for
the small-cell bias. I would like to employ a hierarchical logistic
regression, but it seems that the Stata command 'hireg' is only for
linear regression?
It may be that I simply am unable to make any valid inferences with
this dataset, but I just want to make sure I have explored the
appropriate possible remedies.
Thanks,
John
John Metcalfe, M.D., M.P.H.
University of California, San Francisco
. tab s315 clstr,e
           |         clstr
     s315t |         0          1 |     Total
-----------+----------------------+----------
         0 |        22          1 |        23
         1 |        58         32 |        90
-----------+----------------------+----------
     Total |        80         33 |       113

           Fisher's exact =                 0.002
   1-sided Fisher's exact =                 0.002
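
For readers without the data, the table above can be re-entered from its cell counts. This is only a sketch: it uses Stata's immediate commands tabi and cci, the counts are copied from the table, and the row/column roles (s315t as exposure, clstr as case status) are an assumed reading of the analysis.

```stata
* Immediate form of tabulate: rows are s315t = 0, 1; columns are clstr = 0, 1.
* Cell counts are copied from the two-way table above.
tabi 22 1 \ 58 32, exact

* Crude odds ratio with exact confidence limits.
* cci argument order: exposed cases, unexposed cases,
*                     exposed noncases, unexposed noncases.
* Here "exposed" is assumed to mean s315t == 1 and "case" clstr == 1.
cci 32 1 58 22, exact
```

The crude odds ratio from these counts is (32 x 22) / (1 x 58), about 12.1, and it rests on the single observation with s315t == 0 and clstr == 1.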
. logit clstr ageat s315t east emb sm num,or
Iteration 0:   log likelihood = -62.686946
Iteration 1:   log likelihood = -51.860098
Iteration 2:   log likelihood = -50.754342
Iteration 3:   log likelihood = -50.661741
Iteration 4:   log likelihood = -50.660257
Iteration 5:   log likelihood = -50.660256

Logistic regression                               Number of obs   =        100
                                                  LR chi2(6)      =      24.05
                                                  Prob > chi2     =     0.0005
Log likelihood = -50.660256                       Pseudo R2       =     0.1919

------------------------------------------------------------------------------
       clstr | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ageatrept |   .9908837   .0139884    -0.65   0.517     .9638428    1.018683
       s315t |   9.238959   10.28939     2.00   0.046     1.041462    81.96011
  east_asian |   4.219755   2.215279     2.74   0.006     1.508083    11.80727
         emb |   .9964845   .6599534    -0.01   0.996     .2721043    3.649268
          sm |   2.138175   1.696319     0.96   0.338      .451589    10.12379
  num_resist |   1.064089   .2385192     0.28   0.782     .6857694    1.651116
------------------------------------------------------------------------------
Strategy 1: Two-way contingency tables
. tab clstr s315t if east==1,e
           |         s315t
     clstr |         0          1 |     Total
-----------+----------------------+----------
         0 |         6         19 |        25
         1 |         1         24 |        25
-----------+----------------------+----------
     Total |         7         43 |        50

           Fisher's exact =                 0.098
   1-sided Fisher's exact =                 0.049
. tab clstr s315t if east==0,e
           |         s315t
     clstr |         0          1 |     Total
-----------+----------------------+----------
         0 |        12         33 |        45
         1 |         0          8 |         8
-----------+----------------------+----------
     Total |        12         41 |        53

           Fisher's exact =                 0.175
   1-sided Fisher's exact =                 0.108
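
The two stratum-specific tables above could also be summarized in a single stratified analysis. The following is a sketch of a different technique from the Fisher tests shown (a Mantel-Haenszel combined odds ratio via the official cc command), not part of the analysis above. The full variable name east_asian is taken from the regression output and is assumed to be the variable abbreviated as 'east'.

```stata
* Stratified case-control style table: clstr as case status,
* s315t as exposure, one stratum per level of east_asian.
* cc reports stratum-specific odds ratios, a Mantel-Haenszel
* combined estimate, and a test of homogeneity across strata.
cc clstr s315t, by(east_asian)
```

Because the east == 0 stratum has a zero cell (no observations with clstr == 1 and s315t == 0), its stratum-specific odds ratio cannot be estimated as a finite number, which is the same sparsity problem the exact methods are meant to address.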
Strategy 2: Exact Logistic Regression
observation 102: enumerations = 1128
observation 103: enumerations = 574
Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   10.44218          32      0.0135      1.391627    474.4786
  east_asian |   5.414021          25      0.0006      1.933718    16.65417
(output omitted)
observation 103: enumerations = 574
Exact logistic regression                        Number of obs =        103
                                                 Model score   =   19.78112
                                                 Pr >= score   =     0.0000
---------------------------------------------------------------------------
       clstr |      Coef.       Score   Pr>=Score      [95% Conf. Interval]
-------------+-------------------------------------------------------------
       s315t |   2.345854    6.763266      0.0129      .3304732    6.162216
  east_asian |   1.688992    12.98631      0.0004      .6594448    2.812661
---------------------------------------------------------------------------
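
The command lines that produced the two exact logistic tables above are not shown in the output. As an assumption only, tables with these column headings would typically come from calls of the following form. The variable names are taken from the table rows, and the options coef and test(score) are inferred from the headers (Coef., Score, Pr>=Score) rather than known.

```stata
* Assumed form of the first call: default display, i.e. odds ratios
* with the sufficient-statistic test (Suff., 2*Pr(Suff.)).
exlogistic clstr s315t east_asian

* Assumed form of the second call: coefficients instead of odds
* ratios, with the conditional scores test.
exlogistic clstr s315t east_asian, coef test(score)
```

The two tables are consistent with being the same model on two scales: exp(2.345854) is about 10.44 and exp(1.688992) is about 5.41, matching the odds ratios in the first table.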
Strategy 3: Hierarchical Regression
. hireg clstr (s315t) (east)(ageat emb sm)
Model 1:
Variables in Model:
Adding : s315t
      Source |       SS       df       MS              Number of obs =     113
-------------+------------------------------           F(  1,   111) =    9.18
       Model |   1.7840879     1   1.7840879           Prob > F      =  0.0030
    Residual |   21.578744   111  .194403099           R-squared     =  0.0764
-------------+------------------------------           Adj R-squared =  0.0680
       Total |  23.3628319   112  .208596713           Root MSE      =  .44091

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .3120773   .1030162     3.03   0.003     .1079438    .5162108
       _cons |   .0434783   .0919364     0.47   0.637    -.1386999    .2256565
------------------------------------------------------------------------------
Model 2:
Variables in Model: s315t
Adding : east
      Source |       SS       df       MS              Number of obs =     103
-------------+------------------------------           F(  2,   100) =   12.03
       Model |  4.34936038     2  2.17468019           Prob > F      =  0.0000
    Residual |  18.0778241   100  .180778241           R-squared     =  0.1939
-------------+------------------------------           Adj R-squared =  0.1778
       Total |  22.4271845   102  .219874358           Root MSE      =  .42518

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2817301   .1086887     2.59   0.011     .0660947    .4973654
  east_asian |   .3247109   .0843486     3.85   0.000     .1573656    .4920561
       _cons |  -.0669987   .1023736    -0.65   0.514     -.270105    .1361075
------------------------------------------------------------------------------
R-Square Diff. Model 2 - Model 1 = 0.118 F(1,100) = 14.190 p = 0.000
Model 3:
Variables in Model: s315t east
Adding : ageat emb sm
      Source |       SS       df       MS              Number of obs =     100
-------------+------------------------------           F(  5,    94) =    4.72
       Model |  4.36538233     5  .873076466           Prob > F      =  0.0007
    Residual |  17.3946177    94  .185049124           R-squared     =  0.2006
-------------+------------------------------           Adj R-squared =  0.1581
       Total |       21.76    99   .21979798           Root MSE      =  .43017

------------------------------------------------------------------------------
       clstr |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       s315t |   .2335983   .1163422     2.01   0.048     .0025981    .4645984
  east_asian |   .2694912   .0945411     2.85   0.005     .0817777    .4572048
   ageatrept |  -.0012444   .0024199    -0.51   0.608    -.0060491    .0035603
         emb |   .0396897   .0989203     0.40   0.689    -.1567189    .2360984
          sm |   .1063985   .1087626     0.98   0.330    -.1095522    .3223492
       _cons |  -.0454117   .1512602    -0.30   0.765    -.3457423     .254919
------------------------------------------------------------------------------
R-Square Diff. Model 3 - Model 2 = 0.007 F(3,94) = 0.029 p = 0.993
Model   R2      F(df)            p       R2 change   F(df) change     p
   1:   0.076   9.177(1,111)     0.003
   2:   0.194   12.030(2,100)    0.000   0.118       14.190(1,100)    0.000
   3:   0.201   4.718(5,94)      0.001   0.007       0.029(3,94)      0.993
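
The hireg output above is from ordinary least squares (note the SS/F tables), so it does not answer the logistic question. One possible logistic analogue of the same blockwise entry, offered as an untested sketch rather than a recommendation, is the official nestreg prefix with logit. The block structure mirrors the hireg call above, and the full variable names (east_asian, ageatrept) are taken from the output tables.

```stata
* Blockwise (nested) logistic regression: each parenthesized block
* is added in turn, as in the hireg call above.
* lrtable requests likelihood-ratio tests for each added block
* in place of the default Wald tests.
nestreg, lrtable: logit clstr (s315t) (east_asian) (ageatrept emb sm)
```

This still relies on large-sample maximum likelihood, so it does not by itself address the sparse s315t == 0 cell; it only changes the hireg comparison from a linear to a logistic model.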