Thank you, Michael. One problem I am facing is that I do not know how many low-response observations were dropped from the initial mailing, because they are not in the dataset (we receive the data from a third party). So I cannot determine whether, or how, the dropped cases differ from the retained observations.
Is my intuition correct that, because the dropped observations are likely nonresponders, my estimates are biased upward? That is, would students who are less likely to respond be assigned a higher predicted probability of responding than they should be?
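To make the question concrete, here is a minimal sketch in Python with made-up numbers. The two response rates and the share of students in purged zip codes are assumptions for illustration, not values from my data. It also assumes a model with no zip-level predictor, so the model simply predicts the retained-zip response rate for everyone.

```python
# Hypothetical numbers, chosen only to illustrate the question.
good_rate = 0.05    # assumed true response rate in retained zip codes
bad_rate = 0.005    # assumed true response rate in purged ("bad") zip codes
bad_share = 0.20    # assumed share of second-round students in bad zip codes

# An intercept-only model fit on the purged first-round data can only
# learn the retained-zip rate, so it predicts that rate for every student.
predicted_rate = good_rate

# True average response rate across the full second-round population.
true_rate_all = (1 - bad_share) * good_rate + bad_share * bad_rate

# Gap between prediction and truth, for bad-zip students and overall.
overprediction_bad = predicted_rate - bad_rate
overprediction_all = predicted_rate - true_rate_all

print(f"predicted rate for everyone:  {predicted_rate:.3f}")
print(f"true rate, all second round:  {true_rate_all:.3f}")
print(f"overprediction, bad zips:     {overprediction_bad:.3f}")
print(f"overprediction, overall:      {overprediction_all:.3f}")
```

Under these assumptions the predictions are too high only for students in the purged zip codes. For students in retained zip codes nothing changes.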
Thank you,
Mike
--- On Wed, 3/4/09, Michael I. Lichter <[email protected]> wrote:
> From: Michael I. Lichter <[email protected]>
> Subject: Re: st: Biased estimates?
> To: [email protected]
> Date: Wednesday, March 4, 2009, 8:32 PM
> Mike,
>
> Whether or not you've biased your results by throwing
> out cases depends on whether or not those cases differ
> systematically from the general population of cases. If they
> do, you can (arguably) compensate by giving additional
> weight to cases that are like the ones you dropped. For
> example, if you dropped 100 cases in very low response areas
> but retained 200 cases in moderately low response areas, you
> could give those 200 cases each a weight of 1.5 in your
> analysis.
>
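The arithmetic behind the weight of 1.5 can be sketched in Python as follows. This assumes the number of dropped cases is known, and the function name is hypothetical, used here only for illustration.

```python
def compensating_weight(n_dropped: int, n_retained: int) -> float:
    """Weight that lets each retained case also stand in for the dropped cases.

    Assumes the dropped cases resemble the retained cases they are assigned to.
    """
    if n_retained <= 0:
        raise ValueError("need at least one retained case to carry the weight")
    return (n_dropped + n_retained) / n_retained


# The example from the text: 100 dropped cases, 200 retained cases.
weight = compensating_weight(100, 200)

# The weighted retained cases now represent all 300 original cases.
weighted_total = weight * 200

print(weight)          # 1.5
print(weighted_total)  # 300.0
```

In Stata, a variable holding such a weight (say w, a hypothetical name) could then be supplied to -logit- as a probability weight with [pweight=w].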
> On the other hand, if you're not mailing to the very
> low response areas in the second round, it was right to
> exclude them from the analysis and there's no bias in
> your predictions for people in areas that have more than
> very low response rates.
>
> In any event, since success is a rare outcome in your study
> (only 5%), you might consider using Gary King's rare
> events logit (-relogit-) available at
> http://GKing.Harvard.Edu.
>
> Michael
>
> Mike Wazowski wrote:
> > Hello Statalisters, I am hoping that somebody can help
> > me with the following.
> >
> > I have data on invitations mailed to students to join
> > an honor society, as well as on who responded (joined the
> > society). There are two mailing campaigns: a preliminary
> > mailing to about 10% of the eligible students, and then a
> > secondary mailing to the remaining 90% of students. Since
> > the response rate is low (around 5%), my task is to build a
> > predictive model, based on the first round of mailing, of
> > who is likely to join, so that we can minimize the cost of
> > the second mailing.
> >
> > The problem is that the data for the first mailing is
> > already purged of "bad" zip codes: those from which the
> > response rate was close to zero in the previous year or
> > two. (The data for the second round contains all the zip
> > codes, although I can delete the "bad" ones too, if
> > necessary.) I am using a logit model to estimate
> > coefficients based on the first round, and I use those for
> > an out-of-sample prediction of the probability of response
> > in the second round.
> >
> > My question is whether the data purge for the first
> > mailing biases the results? I am getting many respondents
> > classified as very low probability respondents. Are there
> > any statistical procedures to correct for the deletion of
> > bad zip codes in the first mailing?
> >
> > Thank you,
> >
> > Mike
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/
> >
>
> -- Michael I. Lichter, Ph.D.
> Research Assistant Professor & NRSA Fellow
> UB Department of Family Medicine / Primary Care Research
> Institute
> UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
> Office: CC 125 / Phone: 716-898-4751 / E-Mail:
> [email protected]