I am a design-based inference guy (I know too much about survey
statistics and too little about anything else :)), so here are my two
design-based cents.
Suppose you had, say, 5000 people with z=1, all of them sampled, and that
3000 of the remaining 5000 people with z=0 were sampled. I would just treat
those as strata with differential probabilities of selection:
Pr[selection|z=1]=1, Pr[selection|z=0]=3/5, so the pweight that goes with
the first group is 1, while the pweight that goes with the
second group is 5/3=1.667. That should be about the same
reweighting idea that Austin suggested originally.
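As a rough illustration of that arithmetic (a sketch only: the defection
counts are made up, and the names strata, pweight and weighted_rate are
mine, not anything from Stata), the weights and a weighted defection rate
could be computed like this in Python:

```python
# Strata from the example above: N = people in the stratum, n = people sampled.
# The "defected" counts are hypothetical and only there to show the effect.
strata = {
    "z=1": {"N": 5000, "n": 5000, "defected": 1000},
    "z=0": {"N": 5000, "n": 3000, "defected": 150},
}


def pweight(N, n):
    """Probability weight = 1 / Pr[selection] = N / n."""
    return N / n


weights = {name: pweight(s["N"], s["n"]) for name, s in strata.items()}

# Unweighted rate: treats every sampled person alike, so z=1 is over-represented.
unweighted_rate = (
    sum(s["defected"] for s in strata.values())
    / sum(s["n"] for s in strata.values())
)

# Weighted rate: each sampled person counts for pweight people.
weighted_rate = (
    sum(weights[k] * s["defected"] for k, s in strata.items())
    / sum(weights[k] * s["n"] for k, s in strata.items())
)

print(weights)
print(unweighted_rate, weighted_rate)
```

In Stata the same weights would go into a variable that is then passed to
the estimation command as [pweight=...].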
There is a literature on an area that seems related to your
problem, population-based case-control studies, which takes the
problem to the extreme: it is the dependent variable itself that is
used as the criterion for sampling. Usually this applies to rare
diseases, where all the cases are taken into the data set
(Prob[selection]=1, weight=1), and controls are sampled from the population
(Prob[selection] is a tiny number, weight = 1e5 or something like
that). The interest is often in the probability of having the disease
conditional on some covariates, and miraculously enough you can
estimate this model (a logistic one, at least) by maximum likelihood
without weights -- the only parameter that will be biased is the
intercept. Alastair Scott
from New Zealand is the guy who knows all about it; see
http://www.citeulike.org/user/ctacmo/article/1036969.
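To see why only the intercept moves, here is a small numerical check in
Python. It is a sketch under assumptions of my own: a logistic model, a
hypothetical intercept a and slope b, and the selection probabilities
quoted above. Among sampled people, the log-odds of disease equal the
population log-odds plus log(p_case/p_control), which does not depend on x.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    """Log-odds."""
    return math.log(p / (1.0 - p))


# Hypothetical population model: Pr[disease | x] = expit(a + b*x)
a, b = -6.0, 0.8

# Selection probabilities: all cases kept, controls sampled very thinly.
p_case, p_control = 1.0, 1e-5


def prob_disease_given_sampled(x):
    """Pr[disease | x, selected into the case-control sample], by Bayes' rule."""
    p = expit(a + b * x)
    return p_case * p / (p_case * p + p_control * (1.0 - p))


# Log-odds in the sample at x = 0 and x = 1.
sample_intercept = logit(prob_disease_given_sampled(0.0))
sample_slope = logit(prob_disease_given_sampled(1.0)) - sample_intercept

# The shift that the unweighted fit absorbs into the intercept.
offset = math.log(p_case / p_control)

print(sample_slope, b)                  # slope is unchanged
print(sample_intercept, a + offset)     # intercept is shifted by the offset
```

So if the selection probabilities are known, subtracting log(p_case/p_control)
from the unweighted intercept recovers the population one.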
On 10/8/08, [email protected]
<[email protected]> wrote:
> Thank you for the advice. Very helpful!
>
> In this specific case z is a dummy, and z=1 increases the likelihood of observing x=1. And yes, I do observe outcomes for the group that was supposed to be treated but was not.
>
> Best wishes,
> Alexander
>
> -----Original message-----
> From: [email protected] [mailto:[email protected]] On behalf of Austin Nichols
> Sent: 8 October 2008 18:39
> To: [email protected]
> Subject: Re: st: Imbalance in control versus treated group, and weights
>
>
> It is possible that some kind of propensity score reweighting or regression discontinuity design would be appropriate here, but without much more information, it is hard to offer any specific advice. How does z affect x in the group supposed to have x=1? Do you observe outcomes for the group supposed to have x=1 but having x=0? Etc.
>
> Running a probit with the assumption E(y)=F(b0+b1*x+b2*z) seems unlikely to recover a good estimate of the effect of x on y unless that assumption is actually true!
>
> On Wed, Oct 8, 2008 at 12:23 PM, <[email protected]> wrote:
> > Dear Statalisters,
> >
> > I have the following problem. I provided a sample of 10000 people as targets for receiving an offer, and I have a control group of 5000 people. I know that the potentially treated group and the control group are representative. However, without my knowledge, only 8000 of the 10000 targets were treated, and a specific criterion was used to pick those 8000 from the 10000.
> >
> > This has created an imbalance between my control group and those treated; the imbalance has been identified and concerns only one variable. I want to investigate whether the offer could reduce the defection rate of customers, but the variable that created this imbalance is known to have a huge impact on the defection rate. To reduce this problem I would like to use weights in Stata, but I am unsure how to approach this. Any tips would be greatly appreciated.
> >
> > Also, say that I did not correct for this and instead fit a probit model with the following variables, y=defected/not defected, x=treated/control, z=factor that created the imbalance:
> > y=b0+b1*x+b2*z
> > Would it be appropriate to say that the imbalance can be controlled for by including z as an independent variable in this fashion?
> >
> > Best wishes,
> > Alexander Severinsen
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.