Hello Statalisters, I am hoping that somebody can help me with the following.
I have data on invitations mailed to students to join an honor society, as well as on who responded by joining. There are two mailing campaigns: a preliminary mailing to about 10% of the eligible students, and then a secondary mailing to the remaining 90%. Since the response rate is low (around 5%), my task is to build a predictive model, based on the first round of mailing, of who is likely to join, so that we can minimize the cost of the second mailing.
The problem is that the data for the first mailing have already been purged of "bad" zip codes, meaning those from which the response rate was close to zero in the previous year or two. The data for the second round contain all the zip codes, although I can delete the "bad" ones there too, if necessary. I am fitting a logit model to the first-round data and using the estimated coefficients to predict, out of sample, the probability of response in the second round.
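To make the setup concrete, here is a minimal sketch of what I am doing. All variable names (joined, gpa, credits, round, badzip) are placeholders for illustration, not my actual variables.

```stata
* Placeholder variables (assumptions, not the real dataset):
*   joined  = 1 if the student joined, 0 otherwise
*   gpa, credits = example predictors
*   round   = 1 for the first mailing, 2 for the second
*   badzip  = 1 if the zip code was purged from the first mailing

* Fit the logit on the first-round (purged) sample only
logit joined gpa credits if round == 1

* predict fills in probabilities for every observation with nonmissing
* predictors, so this also covers the second-round (out-of-sample) records
predict phat, pr

* Compare predicted probabilities in round 2 with and without the bad zip codes
summarize phat if round == 2
summarize phat if round == 2 & badzip == 0
```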
My question is whether purging the data for the first mailing biases the results. I am getting many actual respondents classified as having a very low probability of responding. Are there any statistical procedures to correct for the deletion of the bad zip codes from the first mailing?
Thank you,
Mike