Statalist



RE: st: RE: Re: Missing values test


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: Re: Missing values test
Date   Sun, 2 Dec 2007 17:48:16 -0000

Indeed. 

I don't find the idea of variables you don't have and that have no 
connections with any variables you do have that compelling or congenial 
scientifically, but I bow to any superior wisdom here. 

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Maarten Buis
Sent: 02 December 2007 17:36
To: [email protected]
Subject: Re: st: RE: Re: Missing values test

--- Nick Cox <[email protected]> wrote:
> Missingness can always be represented by a dummy. So the structure of 
> missing data can always be explored by logit regression with 
> missingness on something as response w.r.t. various predictors, which 
> may well include missingness on some other things as dummy predictors.

The problem here is that now you are talking about what is known in the
missing data literature as the Missing Completely At Random (MCAR)
assumption. Often three types of missing data are distinguished in this
literature: Missing Completely At Random (MCAR), Missing At Random
(MAR), and Not Missing At Random (NMAR). Multiple Imputation is based on
the MAR assumption.

MCAR assumes that every individual has the same probability of getting a
missing value, i.e. the probability of missingness is not influenced by
any variable. This assumption can be investigated with the observed data,
in the way suggested by Nick. If you have MCAR, or if you can show that the
probability of missingness does not depend on your dependent variable
(given the explanatory variables), then the safe thing to do is just to use
the observed cases, as those will give unbiased estimates with correct
inference.
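Nick's suggestion is to regress a missingness dummy on observed variables with a logit model. In Stata that would be -logit- with the dummy as response. The Python below is only a self-contained sketch of the same idea, not Stata code. It simulates hypothetical data in which `x` is always observed and `d` flags a missing value on some other variable, fits the logit by Newton-Raphson, and compares a pure MCAR pattern with one where missingness depends on `x`. The function names, sample size and coefficients are illustrative assumptions. A clearly nonzero slope is evidence against MCAR. A slope near zero is consistent with MCAR for this predictor, but it says nothing about dependence on the unobserved values themselves.

```python
import math
import random

def fit_logit(x, d, iters=25):
    """Fit logit P(d = 1) = b0 + b1*x by Newton-Raphson; return (b0, b1, se_b1)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # score vector (g0, g1) and information matrix [[i00, i01], [i01, i11]]
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += di - p
            g1 += (di - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step b <- b + I^{-1} g, using the explicit 2x2 inverse
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    se_b1 = math.sqrt(i00 / det)  # from the information matrix of the last iteration
    return b0, b1, se_b1

def simulate_missingness(depends_on_x, n=5000, seed=1):
    """Hypothetical data: x is always observed; d = 1 marks a missing value elsewhere."""
    rng = random.Random(seed)
    x, d = [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        if depends_on_x:
            p_miss = 1.0 / (1.0 + math.exp(-(-1.0 + 1.0 * xi)))  # true slope 1.0
        else:
            p_miss = 0.3  # MCAR: the same probability for everyone
        x.append(xi)
        d.append(1 if rng.random() < p_miss else 0)
    return x, d

for label, dep in (("MCAR", False), ("depends on x", True)):
    x, d = simulate_missingness(dep)
    b0, b1, se = fit_logit(x, d)
    print(f"{label:>13}: slope = {b1:.3f}, z = {b1 / se:.1f}")
```

Further observed predictors, including missingness dummies for other variables as Nick mentions, would enter the same way as extra columns. That needs a general matrix inverse instead of the 2x2 shortcut used here.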

MAR assumes that the probability of missingness may differ from person
to person, but these differences are only caused by observed variables.
To show that MAR holds you need to show that the unobserved values of
the missing variables do not influence the probability of missingness,
which is self-defeating: if you had those values, they wouldn't be
missing. So this assumption is fundamentally untestable.

NMAR assumes that the probability of missingness is influenced by both
observed and unobserved information. For instance, suppose that people
with a very high or very low income are less inclined to reveal their
income in a questionnaire: missingness then depends on the unobserved
income itself.

-- Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------



*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2024 StataCorp LLC