The family case–control design—in which families are recruited
on the basis of one or more affected members—is becoming an
increasingly popular epidemiological tool for estimating both genetic and
nongenetic effects. A matched case–control analysis using
conditional logistic regression is often applied to estimate the effect of
an exposure on disease, but this approach can lead to underestimates of
associations if unmeasured familial and genetic effects correlated within
family members are ignored. A random-effects conditional logistic
regression model has been proposed, which conditions on both family
ascertainment and familial random effects. In this talk, I will briefly
describe the conditional logistic random-effects model. I will also
describe the development of a new Stata command that will estimate the
parameters of this model.
Additional information
abstracts/ausnz11_muller.pdf
Sometimes you may wish to do something within Stata that Stata currently
does not do. One solution is to run another program within Stata. In this presentation, I will show how to send emails from Stata using another program. Specifically, I look at automatically emailing a log file of an analysis when Stata has
finished running a do file and also emailing the status of an analysis as it
progresses.
I will also show how to merge graphs and log files in
Stata 12 for Windows. Stata 12 can translate a log file and graphs into PDF,
but only as separate files and only in the order in which they are produced.
With the use of a freeware program and some Stata code, I will show how
to circumvent this issue.
Additional information
abstracts/ausnz11_keesman.pdf
Visualizing interactions and response surfaces can be difficult. In this
talk, I will show how to do the former by graphing adjusted means and the
latter by rolling together contour plots. I will demonstrate
this for both linear and nonlinear models.
Additional information
abstracts/ausnz11_rising.pdf
Stata has strong statistical capabilities and is widely used around the world
by statisticians in varied disciplines. Its standard data-management
commands can also be easily incorporated into the day-to-day
management of survey sampling. Stata is currently being used by CogNETive
as an integral component in a monthly data-collection study for a major
financial institution. Each month, CogNETive administers an online survey to an
elite group of financial customers regarding their satisfaction with the
introduction of a new online financial system. Stata is used to effectively
manage both the front and back ends of the survey process. The merging and
managing of the email sampling is performed solely by Stata. Each quarter, the
financial institution provides a transaction file for each
customer to be incorporated into the survey research data and analysis. Many
data-management issues have arisen over the course of the study (for example,
merge conflicts), with potentially significant implications for the
results of the study. The processes involved, along with tips and
traps for this style of study, will be discussed.
Richard J. Woodman
Flinders Centre for Epidemiology and Biostatistics, Discipline of General Practice, Flinders University
Campbell H. Thompson
Susan W. Kim
Flinders Centre for Epidemiology and Biostatistics, Discipline of General Practice, Flinders University
Background
Quantification of the added usefulness of new measures in risk prediction
has traditionally relied upon significance tests from regression models and
increases in the C-statistic. However, significant model predictors often
cause only minor increases in the C-statistic, suggesting limited utility of
the new measures in improving risk prediction. More recently, other
discriminators have gained popularity amongst researchers. The Integrated
Discrimination Improvement index (IDI) measures the difference between the
change in the mean predicted risk of an event occurring for those who had
the event and the change for those who didn’t have the event. The Net
Reclassification Improvement index (NRI) quantifies the percentage of
subjects correctly re-classified in terms of risk.
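As an illustration of the IDI definition above, the following is a minimal Python sketch. The function name `idi`, the variable names, and the toy data are assumptions made for this example only; they are not the authors' code or data.

```python
from statistics import mean

def idi(p_old, p_new, event):
    """Integrated Discrimination Improvement (hypothetical helper).

    p_old, p_new: predicted risks from the model without and with the
    new measure; event: 1 if the subject had the event, else 0.
    """
    ev_old = [p for p, y in zip(p_old, event) if y == 1]
    ev_new = [p for p, y in zip(p_new, event) if y == 1]
    ne_old = [p for p, y in zip(p_old, event) if y == 0]
    ne_new = [p for p, y in zip(p_new, event) if y == 0]
    # Change in mean predicted risk among those with the event ...
    change_events = mean(ev_new) - mean(ev_old)
    # ... minus the change among those without the event.
    change_nonevents = mean(ne_new) - mean(ne_old)
    return change_events - change_nonevents

# Toy data (invented for illustration only)
event = [1, 1, 0, 0]
p_old = [0.6, 0.5, 0.5, 0.4]
p_new = [0.7, 0.6, 0.4, 0.4]
idi_value = idi(p_old, p_new, event)
print(f"IDI = {idi_value:.1%}")
```

A positive value means the new measure raised predicted risk for those with the event, lowered it for those without, or both.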
Methods
A logistic regression model was developed to predict the risk of a long versus
short (≤72 hrs) hospital stay among 1,457 general medicine patients.
Significant predictors were age, blood pressure (BP), heart rate (HR),
respiratory rate (RR), mobility, white blood cell count (WBC), cardiac
failure (CF), and the need for supplemental oxygen (SuO2). Using
the predicted probabilities for long-stay, we assessed improvements in the
C-statistic (ΔC), the IDI (%) and the NRI (%) after the addition of
each variable beyond age. The NRI was assessed using predicted probability
cutpoints for long-stay of 50% and 57% (the latter being the overall
prevalence of long-stay patients). We also assessed the category-free NRI,
which assesses the proportion of patients with improved prediction
probabilities according to their eventual outcome.
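The cutpoint-based and category-free NRI described above can be sketched in Python as follows. This uses one common definition of the NRI (net proportion of events moving up plus net proportion of nonevents moving down); the abstract does not state the exact formula used, so treat it as an assumption. The function name `nri` and the toy data are hypothetical.

```python
def nri(p_old, p_new, event, cutpoint=None):
    """Net Reclassification Improvement (hypothetical helper).

    With a cutpoint, subjects are classed as low or high risk and only
    moves across the cutpoint count. With cutpoint=None, any increase or
    decrease in predicted probability counts (category-free NRI).
    """
    def category(p):
        return p if cutpoint is None else int(p >= cutpoint)

    up_e = down_e = n_e = 0  # subjects who had the event
    up_n = down_n = n_n = 0  # subjects who did not
    for po, pn, y in zip(p_old, p_new, event):
        moved_up = category(pn) > category(po)
        moved_down = category(pn) < category(po)
        if y == 1:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down
    # Upward moves are correct for events; downward moves for nonevents.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n

# Toy data (invented for illustration only)
event = [1, 1, 1, 0, 0, 0]
p_old = [0.45, 0.60, 0.55, 0.55, 0.40, 0.52]
p_new = [0.55, 0.65, 0.48, 0.45, 0.42, 0.58]
nri_50 = nri(p_old, p_new, event, cutpoint=0.50)
nri_free = nri(p_old, p_new, event)
print(f"NRI (50% cutpoint) = {nri_50:.3f}, category-free NRI = {nri_free:.3f}")
```

In this toy example the two versions disagree, which mirrors the point that the measures selected can depend on whether and where a cutpoint is applied.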
Results
The C-statistic identified HR (ΔC=0.027, p<0.001), mobility
(ΔC=0.024, p<0.001), BP (ΔC=0.01, p=0.002), and WBC (ΔC=0.01,
p=0.003) as measures that significantly increased model discrimination. The
IDI identified the same measures (HR=4.2%, mobility=3.1%, BP=1.2%, and
WBC=1.5%; p<0.001 for each) and additionally RR (0.7%, p<0.001), CF (0.4%,
p<0.05), and SuO2 (0.3%, p<0.05). The NRI with a 50% cutpoint identified HR
(5.2%, p=0.004), mobility (3.1%, p=0.02), and RR (3.3%, p=0.01), while the
NRI with a 57% cutpoint identified mobility (5.1%, p=0.003), RR (2.4%,
p=0.02), and SuO2 (2.3%, p=0.006). The category-free NRI
identified HR (21.0%, p<0.001), mobility (24.9%, p<0.001), BP (14.6%,
p<0.001), WBC (8.3%, p=0.02), and RR (8.4%, p=0.03).
Conclusion
The selection of measures to include for the prediction of long hospital
stay differed between model discriminators. The IDI and the category-free
NRI were more sensitive discriminators than was the C-statistic, with both
identifying RR in addition to HR, mobility, BP, and WBC. The IDI also
identified CF and SuO2. Fewer variables were identified by the
category-dependent NRI than by the C-statistic, and the selected variables also
differed according to the chosen probability cutpoint.
Additional information
abstracts/ausnz11_woodman_etal.ppt