Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: RE: Missing data on outcome and sample selection bias

From	"Lachenbruch, Peter" <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: RE: Missing data on outcome and sample selection bias
Date	Tue, 2 Mar 2010 10:32:29 -0800

Retaining the cases with missing y values allows "better" imputation of the x variables, per von Hippel. If you omit them, the prediction model used for imputation may be biased, since there may be some relationship that we don't understand. However, when we estimate the regression relationship, including the imputed y's doesn't add anything. Hence von Hippel's recommendation to impute using all the data but to omit the cases with imputed y values from the regression. It is only a small improvement, but if you have a lot of missing y values, you can be in a lot of trouble.
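
The impute-then-delete workflow described above can be sketched in a few lines. This is only an illustration under loud assumptions: the data are simulated, the helper names (fit_line, impute_then_delete, simulate) are made up for this sketch, and the imputation is a simple stochastic regression fitted on the complete cases, not ICE and not von Hippel's own procedure. Python is used only to keep the sketch self-contained. With a single x and a single y, the cases with missing y cannot improve the imputation of x, so the sketch shows the bookkeeping (flag, impute everything, drop the flagged cases, fit) rather than the size of any gain.

```python
import random
import statistics


def fit_line(u, v):
    """Least-squares fit of v on u; returns (intercept, slope, residual sd)."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suu = sum((a - mu) ** 2 for a in u)
    slope = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / suu
    intercept = mv - slope * mu
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(u, v))
    return intercept, slope, (rss / (len(u) - 2)) ** 0.5


def impute_then_delete(rows, m=20, seed=2010):
    """rows: list of (x, y) pairs; None marks a missing value.
    Assumes no row is missing both x and y.
    Returns (pooled slope keeping imputed y's,
             pooled slope after deleting them,
             number of rows in the second analysis)."""
    rng = random.Random(seed)
    y_was_missing = [y is None for _, y in rows]   # flag BEFORE imputing
    cx = [x for x, y in rows if x is not None and y is not None]
    cy = [y for x, y in rows if x is not None and y is not None]
    ax, bx, sx = fit_line(cy, cx)                  # model for x given y
    ay, by, sy = fit_line(cx, cy)                  # model for y given x
    slopes_all, slopes_deleted = [], []
    for _ in range(m):
        filled = []
        for x, y in rows:
            if x is None:
                x = ax + bx * y + rng.gauss(0, sx)  # stochastic draw for x
            if y is None:
                y = ay + by * x + rng.gauss(0, sy)  # stochastic draw for y
            filled.append((x, y))
        # Analysis 1: keep every row, including those with imputed y.
        slopes_all.append(fit_line([x for x, _ in filled],
                                   [y for _, y in filled])[1])
        # Analysis 2: drop the rows whose y was imputed, keep imputed x's.
        kept = [xy for xy, miss in zip(filled, y_was_missing) if not miss]
        slopes_deleted.append(fit_line([x for x, _ in kept],
                                       [y for _, y in kept])[1])
    # Pool the point estimates by averaging over the m imputations.
    return (statistics.fmean(slopes_all),
            statistics.fmean(slopes_deleted),
            len(kept))


def simulate(n=2000, seed=1):
    """Simulated data: y = 1 + 2x + noise, values missing completely at random."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 1.0 + 2.0 * x + rng.gauss(0, 1)
        u = rng.random()
        if u < 0.2:
            rows.append((None, y))    # x missing
        elif u < 0.5:
            rows.append((x, None))    # y missing
        else:
            rows.append((x, y))       # complete case
    return rows


rows = simulate()
slope_all, slope_mid, n_kept = impute_then_delete(rows)
print(f"slope keeping imputed y's:  {slope_all:.3f}")
print(f"slope after deleting them:  {slope_mid:.3f} (n = {n_kept})")
```

The design point is that the missing-y flag is recorded before imputation, so the deletion step does not depend on which values happen to have been filled in.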

Yulia Marchenko (?) at Stata may be able to help you also.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Rosie Chen
Sent: Monday, March 01, 2010 8:32 AM
To: [email protected]
Subject: Re: st: RE: Missing data on outcome and sample selection bias

Thanks, Tony. Let me see if I understand you correctly. Did you mean that, by keeping cases with missing values on the y variable in the imputation process, we should be able to reduce or remove the possible sample selection bias, because the imputed x values are then based on those cases as well? I haven't seen this described anywhere as a standard way to address possible sample selection, but please correct me if I am wrong.


To keep this discussion thread going, I am posting my questions again. Thanks for any input and advice!  -- Rosie

Dear all, here are my questions regarding a multilevel analysis with missing values on the outcome variable:

(1) Do we usually compare the deleted cases with the final raw sample without missing-data imputation, or with the final sample with missing cases imputed?
(2) To what extent can t-tests be useful for detecting sample selection bias? What criterion do we use? Do significant t-tests on all predictors indicate such a problem, or is it enough for half of the tests to be significant?
(3) If the t-test is not a very good tool for assessing the problem, should we use the Heckman method? Can we use the Heckman approach in Stata to detect and remedy possible sample selection bias in a dependent variable? I learned that there are Heckman and GLLAMM commands in Stata, but I am not sure whether they can take all three features (multilevel data structure, multiply imputed data, and complex survey design) into account.
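
The comparison asked about in questions (1) and (2) can be set up along the following lines. This is a hedged sketch, not a recommended criterion: the covariate (age), the missingness mechanism, and the function name compare_groups are all hypothetical, and the data are simulated. It computes a Welch t statistic and a standardized mean difference for one covariate between the cases whose outcome was observed and those whose outcome was missing. Such a check can only reveal differences on observed covariates; it cannot rule out selection that depends on the unobserved outcome itself.

```python
import math
import random
import statistics


def compare_groups(kept, dropped):
    """Welch t statistic, its approximate degrees of freedom, and the
    standardized mean difference for one covariate."""
    m1, m2 = statistics.fmean(kept), statistics.fmean(dropped)
    v1, v2 = statistics.variance(kept), statistics.variance(dropped)
    n1, n2 = len(kept), len(dropped)
    se2 = v1 / n1 + v2 / n2
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    # Difference in means scaled by the average of the two group variances.
    smd = (m1 - m2) / math.sqrt((v1 + v2) / 2)
    return t, df, smd


# Hypothetical data: the outcome is more likely to be missing at higher ages.
rng = random.Random(42)
ages = [rng.gauss(40, 10) for _ in range(1000)]
outcome_missing = [rng.random() < (0.5 if a > 45 else 0.1) for a in ages]
kept = [a for a, miss in zip(ages, outcome_missing) if not miss]
dropped = [a for a, miss in zip(ages, outcome_missing) if miss]

t, df, smd = compare_groups(kept, dropped)
print(f"n kept = {len(kept)}, n dropped = {len(dropped)}")
print(f"Welch t = {t:.2f}, df = {df:.1f}, standardized difference = {smd:.2f}")
```

Reporting the standardized difference alongside the t statistic is a deliberate choice here: the t statistic grows with sample size, while the standardized difference does not.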


----- Original Message ----
From: "Lachenbruch, Peter" <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Mon, March 1, 2010 11:12:09 AM
Subject: st: RE: Missing data on outcome and sample selection bias

I don't understand why you can't impute outcome variables. ICE will do it. A recent paper by von Hippel notes that a reasonable approach is to impute all the missing values but then delete the cases with missing y-values. His simulations were for normal variables, but I wouldn't be surprised if the results held for categorical ones.
Deleting cases without y values is often very dangerous. I'd use ICE and try it both ways. Note that ICE will impute categorical values.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Rosie Chen
Sent: Monday, March 01, 2010 7:03 AM
To: [email protected]
Subject: st: Missing data on outcome and sample selection bias

Carlo, thanks for your response. My question is not related to right censoring or independent variables' missing cases. It is the fact that respondents did not answer the question for the outcome variable. We can't impute outcome values, so that's why we often have to delete cases that have missing values on the dependent variable. But there is a potential sample selection bias. 

So, dear all, here are my questions regarding a multilevel analysis with missing values on the outcome variable:

(1) Do we usually compare the deleted cases with the final raw sample without missing-data imputation, or with the final sample with missing cases imputed?
(2) To what extent can t-tests be useful for detecting sample selection bias? What criterion do we use? Do significant t-tests on all predictors indicate such a problem, or is it enough for half of the tests to be significant?
(3) If the t-test is not a very good tool for assessing the problem, should we use the Heckman method? Can we use the Heckman approach in Stata to detect and remedy possible sample selection bias in a dependent variable? I learned that there are Heckman and GLLAMM commands in Stata, but I am not sure whether they can take all three features (multilevel data structure, multiply imputed data, and complex survey design) into account.

Your advice would be appreciated very much,

Rosie



----- Original Message ----
From: Carlo Lazzaro <[email protected]>
To: [email protected]
Cc: Rosie Chen <[email protected]>
Sent: Mon, March 1, 2010 1:58:39 AM
Subject: R: Missing data analysis



Dear Rosie,
I am not clear about what you mean by "we have to delete cases that
have missing values", since this is not standard practice.

If you mean (right-)censored observations, they can be handled in Stata via
the survival analysis suite (please see -stset- and related commands in Stata
9.2/SE).

For more details on dealing with missing observations, especially when
they occur in predictor variables rather than in outcomes, you might want to take a look at:

Little RJA, Rubin DB. Statistical analysis with missing data. Second
Edition. Hoboken, NJ: Wiley, 2002.

HTH and Kind Regards,

Carlo 

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Rosie Chen
Sent: Sunday, February 28, 2010 21:31
To: [email protected]
Subject: st: Missing data analysis

Hi, dear listserv members,

   I have a question that is not specifically about Stata, but I would
like to try asking it here:

   In most studies, we have to delete cases that have missing values on the
outcome variable. The issue is whether the deleted cases are significantly
different from the final sample we use, because of the potential sample
selection bias problem.  My question is: do we often compare the deleted
cases with the final raw sample without missing data imputation or with the
final sample with missing cases imputed? Any suggestions are appreciated
very much,

  Rosie



      
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


      

