Jonathan Golub asked how to fit a logistic regression to a dataset apparently
gathered in a case-control study as if it had come from a cohort study.
---------------------------begin excerpt----------------------------------------
I have analyzed my dataset as a case-control study (via logistic
regression), and I am now interested in analyzing it as a case-cohort
study. . . . Please tell me how it is possible to set my data so that
it can be analyzed as a case-cohort design, via logistic regression.
The problem I am having is that in order to use the entire population
(including cases) as my controls I need to define a case-control
variable that equals 1 for case and 0 for control, but all cases are
also controls.
-----------------------------end excerpt----------------------------------------
I'm no epidemiologist, but given my admittedly limited understanding of how the
selection of cases and controls is conditioned in a case-control study versus a typical
cohort study, I'm not so sure that what Jonathan wants to do can be done legitimately.
Others with greater insight can comment on the following compromise proposal: perhaps
Jonathan could independently determine the malady's prevalence in the entire
population and then randomly select from the cases and controls in the dataset so that
cases make up that same proportion of the resulting sample.
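
To make the resampling idea concrete, here is a minimal sketch in Python (chosen
only for illustration; Stata's -sample- command could presumably do the same job).
The function name, the seed, the 200/800 split and the 5% prevalence are all
hypothetical, and the sketch says nothing about whether the resulting analysis is
legitimate. It keeps every control and draws just enough cases to match the
prevalence, falling back to thinning the controls if there are too few cases.

```python
import random


def resample_to_prevalence(cases, controls, prevalence, seed=0):
    """Randomly subsample so that cases / (cases + controls) is close to
    `prevalence`. All names and defaults here are illustrative assumptions."""
    if not 0 < prevalence < 1:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Number of cases needed if every control is kept.
    wanted_cases = round(prevalence * len(controls) / (1 - prevalence))
    if wanted_cases <= len(cases):
        return rng.sample(cases, wanted_cases), list(controls)

    # Too few cases: keep them all and thin the controls instead.
    wanted_controls = round(len(cases) * (1 - prevalence) / prevalence)
    wanted_controls = min(wanted_controls, len(controls))
    return list(cases), rng.sample(controls, wanted_controls)


# Hypothetical dataset: 200 case IDs and 800 control IDs, with an assumed
# population prevalence of 5%.
cases = list(range(200))
controls = list(range(200, 1000))
kept_cases, kept_controls = resample_to_prevalence(cases, controls, 0.05)
share = len(kept_cases) / (len(kept_cases) + len(kept_controls))
print(len(kept_cases), len(kept_controls), round(share, 3))
```

Note that matching a low prevalence discards most of the cases, so the
subsample carries much less information than the original dataset.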
Joseph Coveney