Re: st: Biased estimates?

From	"Michael I. Lichter" <mlichter@buffalo.edu>
To	statalist@hsphsun2.harvard.edu
Subject	Re: st: Biased estimates?
Date	Wed, 04 Mar 2009 15:32:33 -0500

Mike,

Whether or not you've biased your results by throwing out cases depends on whether or not those cases differ systematically from the general population of cases. If they do, you can (arguably) compensate by giving additional weight to cases that are like the ones you dropped. For example, if you dropped 100 cases in very low response areas but retained 200 cases in moderately low response areas, you could give those 200 cases each a weight of 1.5 in your analysis.
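To show where the 1.5 comes from, here is a minimal sketch of the arithmetic. It is written in Python only for illustration; the function name and arguments are hypothetical, and it assumes the retained cases are meant to stand in for themselves plus the dropped cases they resemble.

```python
def compensation_weight(n_dropped: int, n_retained: int) -> float:
    """Per-case weight for retained cases that also represent dropped ones.

    Assumption: the retained cases resemble the dropped cases closely
    enough to stand in for them, so together they must represent
    n_dropped + n_retained cases.
    """
    if n_retained <= 0:
        raise ValueError("need at least one retained case")
    if n_dropped < 0:
        raise ValueError("n_dropped cannot be negative")
    return (n_dropped + n_retained) / n_retained


# The example from the message: 100 dropped, 200 retained.
weight = compensation_weight(n_dropped=100, n_retained=200)
print(weight)  # 1.5
```

The resulting value would then be supplied as a weight on those 200 cases when fitting the model; all other cases keep a weight of 1.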

On the other hand, if you're not mailing to the very low response areas in the second round, it was right to exclude them from the analysis and there's no bias in your predictions for people in areas that have more than very low response rates.

In any event, since success is a rare outcome in your study (only 5%), you might consider using Gary King's rare events logit (-relogit-), available at http://GKing.Harvard.Edu.

Michael

Mike Wazowski wrote:

Hello Statalisters, I am hoping that somebody can help me with the following.

I have data on invitations mailed to students to join an honor society, as well as on who responded by joining.  There are two mailing campaigns: a preliminary one, to about 10% of the eligible students, and a secondary mailing to the remaining 90% of students.  Since the response rate is low (around 5%), my task is to build a predictive model, based on the first round of mailing, of who is likely to join so that we can minimize the cost of the second mailing.

The problem is that the data for the first mailing is already purged of "bad" zip codes - those from which the response rate was close to zero in the previous year or two (the data for the second round contains all the zip codes, although I can delete the "bad" ones too, if necessary).  I am using a logit model to estimate coefficients based on the first round, and I then use those coefficients for an out-of-sample prediction of the probability of response in the second round.

My question is whether the data purge for the first mailing biases the results.  I am getting many respondents classified as very low probability respondents. Are there any statistical procedures to correct for the deletion of bad zip codes in the first mailing?

Thank you,

Mike

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

--
Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / E-Mail: mlichter@buffalo.edu


Follow-Ups:
- RE: st: Biased estimates?
  - From: "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
- Re: st: Biased estimates?
  - From: Mike Wazowski <mike.wazowski@ymail.com>

References:
- st: Biased estimates?
  - From: Mike Wazowski <mike.wazowski@ymail.com>
