From: Michael I. Lichter <[email protected]>
Subject: Re: st: Biased estimates?
To: [email protected]
Date: Wednesday, March 4, 2009, 8:32 PM
Mike,
Whether or not you've biased your results by throwing
out cases depends on whether or not those cases differ
systematically from the general population of cases. If they
do, you can (arguably) compensate by giving additional
weight to cases that are like the ones you dropped. For
example, if you dropped 100 cases in very low response areas
but retained 200 cases in moderately low response areas, you
could give those 200 cases each a weight of 1.5 in your
analysis, so that they stand in for all 300 cases
(300 / 200 = 1.5).
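The weighting arithmetic in that example can be sketched in Python. This is an illustration only: the function name compensating_weight is hypothetical, and it assumes the retained cases really are comparable to the dropped ones.

```python
def compensating_weight(n_dropped, n_retained):
    """Weight for each retained case so that the retained cases
    also represent the similar cases that were dropped."""
    if n_retained <= 0:
        raise ValueError("need at least one retained case")
    return (n_dropped + n_retained) / n_retained


# Numbers from the example: 100 dropped cases, 200 retained cases
weight = compensating_weight(n_dropped=100, n_retained=200)
print(weight)  # 1.5
```

In Stata, a variable holding such weights could be supplied to an estimation command as a probability weight, e.g. [pweight=w], where w is a hypothetical variable name.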
On the other hand, if you're not mailing to the very
low response areas in the second round, it was right to
exclude them from the analysis, and there's no bias in
your predictions for people in areas with response rates
above that very low range.
In any event, since success is a rare outcome in your study
(only 5%), you might consider using Gary King's rare
events logit (-relogit-) available at
http://GKing.Harvard.Edu.
Michael
Mike Wazowski wrote:
Hello Statalisters, I am hoping that somebody can help
me with the following.
I have data on invitations mailed to students to join
an honor society as well as on who responded (joined the
society). There are two mailing campaigns: preliminary, to
about 10% of the eligible students, and then a secondary
mailing to the remaining 90% of students. Since the
response rate is low (around 5%), my task is to build a
predictive model, based on the first round of mailing, of
who is likely to join so that we can minimize the cost of
the second mailing.
The problem is that the data for the first mailing is
already purged of "bad" zip codes - those from
which the response rate was close to zero in the previous
year or two (the data for the second round contains all the
zip codes although I can delete the "bad" ones
too, if necessary). I am using a logit model to estimate
coefficients based on the first round and use those for an
out-of-sample prediction for the probability of response for
the second round.
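A minimal sketch of the workflow just described (fit a logit on the first round, then predict response probabilities for the second round), written in plain Python. Everything here is an assumption made for illustration: the data are synthetic, there is a single made-up predictor x, the "true" coefficients are invented, and the Newton-Raphson fitter is a stand-in for a real estimation command, not the model used in the original analysis.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(xs, ys, iterations=25):
    """Fit intercept b0 and slope b1 by Newton-Raphson on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict(b0, b1, xs):
    # Predicted response probability for each new case
    return [sigmoid(b0 + b1 * x) for x in xs]


# Synthetic stand-in for the first mailing (rare outcome, invented coefficients)
random.seed(0)
TRUE_B0, TRUE_B1 = -3.0, 1.0
first_x = [random.gauss(0, 1) for _ in range(2000)]
first_y = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0
           for x in first_x]

b0, b1 = fit_logit(first_x, first_y)

# Synthetic stand-in for a few second-round cases: out-of-sample prediction
second_x = [random.gauss(0, 1) for _ in range(5)]
second_p = predict(b0, b1, second_x)
for x, p in zip(second_x, second_p):
    print(f"x = {x:6.2f}  predicted P(respond) = {p:.3f}")
```

With a rare outcome, most predicted probabilities from a model like this will be small, so ranking cases by predicted probability may be more informative than classifying them at a 0.5 cutoff.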
My question is whether the data purge for the first
mailing biases the results. I am getting many respondents
classified as very low probability respondents. Are there
any statistical procedures to correct for the deletion of
bad zip codes in the first mailing?
Thank you,
Mike
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
-- Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research
Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / E-Mail:
[email protected]