cases we pretend that the data are MAR, even though we know that is
unlikely. But, I think we have to realize that this is a RESEARCH question,
not a statistical one. After all, our statistical models typically make all
kinds of assumptions (random selection in many cases, normality, etc.), some
of which are testable and some of which are not. Even when testable, the
tests are often inadequate.
Our statistical models are just guides. The data analyses based upon these
models are always flawed. The real question is not "what is the correct
model?" but rather "is this model close enough to the real-world phenomena
of interest that I can reasonably trust the results?"
The latter question is always one that goes beyond statistics. We apply
various supplementary analyses, statistical "tests," knowledge about the
likely nature of the data from other experiences, and other knowledge about
the real-world phenomena to make such judgments.
I believe that we are always doing this. However, we are often not very
consciously aware of it. Our language and our publishing traditions militate
against it. We talk of "correct models," as if such ever exist. Philosopher
Nancy Cartwright (How the Laws of Physics Lie) made a strong argument that
correct models do not even exist in physics. Jan de Leeuw made related
arguments in data analysis.
Our publications also encourage talk of such things, despite our awareness
that they are, at best, approximations. How many have assumed data is MAR,
knowing full well that that was unlikely, but also knowing that publication
depends upon that fiction?
These issues are why data analysis is always a craft. We implicitly look
for signs that the analyst/researcher is savvy enough to have performed many
checks along the way. For example, many sophisticated analysts using such
techniques as Structural Equation Modeling or Multilevel Modeling will
usually examine simpler models that are known to be "incorrect," as a check
upon the allegedly more "correct" but more complex model. Any major
discrepancies are then examined. We then erase the evidence of our doubts
from the finished product.
So, all of the techniques proposed -- dummy variables for missing (despite
known problems), assuming MAR, and using external data -- are potential ways
of assessing the nature of the phenomenon. After all, we are more
interested in knowing about the world than about our models per se.
Stephen Soldz
Director, Center for Research, Evaluation, and Program Development
Boston Graduate School of Psychoanalysis
1581 Beacon St.
Brookline, MA 02446
[email protected]
Date: Sat, 23 May 2009 08:02:55 +0000 (GMT)
From: Maarten buis <[email protected]>
Subject: RE: st: Missing outcome variables - how to deal with these?
--- On Fri, 22/5/09, Tomas M wrote:
> For my data, I am quite certain that the data is not missing at random
> (NMAR). I have reason to believe that my missing outcome data is
> related to the outcome data itself. I do have a full set of
> explanatory variables for all of my observations, however.
>
> Does this mean that I cannot use the typical remedies? What other
> options are there for analyzing missing data that is non-ignorable?
I have always stayed away from those NMAR models. The problem is that they
just can't produce empirical estimates: They critically depend on something
that can't be seen. I realise that there are questions out there that are so
important that we must just give the best "guesstimate" we can, even though
under normal circumstances that best guess would not be considered good
enough. Till now I have been able to avoid those questions, so I don't know
the answer to your question.
-- Maarten
-----------------------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
------------------------------
Date: Sat, 23 May 2009 03:10:58 -0700 (PDT)
From: Ana Gabriela Guerrero Serdan <[email protected]>
Subject: RE: st: Missing outcome variables - how to deal with these?
I don't know if this applies to your type of data, but if you have survey
data you can first check how much selection there is among the individuals
with missing information. See how different they are from the rest by
comparing the other characteristics for which you do have information, and
run a test to check this.
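A minimal sketch of such a check, in Python with only the standard library.
The variable names (age, income) and the toy data are hypothetical, and the
Welch t statistic is just one possible choice of test. Note that a
difference here only shows that missingness is related to observed
characteristics; it cannot tell you whether the data are NMAR.

```python
import math
import random
import statistics

def compare_by_missingness(rows, outcome, covariate):
    """Compare a covariate between rows with a missing and an observed outcome."""
    miss = [r[covariate] for r in rows if r[outcome] is None]
    obs = [r[covariate] for r in rows if r[outcome] is not None]
    m1, m0 = statistics.mean(miss), statistics.mean(obs)
    v1, v0 = statistics.variance(miss), statistics.variance(obs)
    # Welch t statistic (does not assume equal variances in the two groups)
    welch_t = (m1 - m0) / math.sqrt(v1 / len(miss) + v0 / len(obs))
    # Standardized mean difference, using the average of the two variances
    std_diff = (m1 - m0) / math.sqrt((v1 + v0) / 2)
    return {
        "n_missing": len(miss),
        "n_observed": len(obs),
        "mean_missing": m1,
        "mean_observed": m0,
        "welch_t": welch_t,
        "std_diff": std_diff,
    }

# Hypothetical toy data: income is more often missing for older respondents.
random.seed(1)
rows = []
for _ in range(500):
    age = random.gauss(40, 10)
    income = 1000 + 20 * age + random.gauss(0, 200)
    p_missing = 0.5 if age > 45 else 0.1
    if random.random() < p_missing:
        income = None
    rows.append({"age": age, "income": income})

result = compare_by_missingness(rows, "income", "age")
print(result)
```

In practice you would repeat the comparison for each characteristic you
have, not just one.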
One thing you could do if you had missing values for some of the
explanatory variables (which is not your case) is to create a dummy equal to
1 for observations where the variable is missing. That way you don't lose
those observations in your analysis. For example, if you run a regression
where the outcome variable is education and you want to include the
education of the mother/father as an explanatory variable, but it has
missing values, then you include the dummy I mentioned above.
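A rough sketch of how such a dummy could be built, in Python. The function
name add_missing_indicator, the variable names, and the fill value of 0 are
assumptions for illustration only. As noted elsewhere in this thread, the
approach has known problems, so treat this as a sketch of the mechanics and
not as a recommendation.

```python
def add_missing_indicator(rows, var, fill_value=0.0):
    """Return copies of rows with `var` filled in and a `<var>_missing` dummy added."""
    out = []
    for r in rows:
        new = dict(r)  # copy, so the original data are left untouched
        is_missing = r[var] is None
        new[var + "_missing"] = 1 if is_missing else 0
        new[var] = fill_value if is_missing else r[var]
        out.append(new)
    return out

# Hypothetical data: own education and mother's education (years).
rows = [
    {"educ": 12, "mother_educ": 10},
    {"educ": 16, "mother_educ": None},
    {"educ": 9, "mother_educ": 8},
    {"educ": 14, "mother_educ": None},
]

filled = add_missing_indicator(rows, "mother_educ")
for r in filled:
    print(r)
```

Both mother_educ and mother_educ_missing would then enter the regression as
explanatory variables, so that no observation is dropped.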
hope it helps,
regards,
Gaby
------------------------------
Date: Sat, 23 May 2009 12:36:53 +0200
From: "Carlo Lazzaro" <[email protected]>
Subject: R: st: Missing outcome variables - how to deal with these?
Dear Tomas,
just echoing Maarten's wise and at the same time discouraging remarks:
- even the reference textbook on this topic (Little RJA, Rubin DB.
Statistical analysis with missing data. 2nd edition. Chichester: Wiley;
2002) allots only a few pages to the NMAR mechanism (from the Subject Index:
12, 13-15, 18-19).
- Hence, the only possible way to deal with NMAR is to rely upon external
data sources for similar items (please see: Ramsey S, Wilke R, Briggs A, et
al. Best practices for economic evaluations alongside clinical trials: an
ISPOR RCT-CEA Task Force report. Value Health 2005; 8: 521-33).
However, the problem seems to go out through the door and come back through
the window: it is again a matter of how good those external sources are for
your research needs. One further piece of advice is to perform a sensitivity
analysis after filling in the NMAR data and see what happens when you vary
your NMAR guesstimates within a range that is reasonable or customary in
your research field.
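As a very simplified sketch of that kind of sensitivity analysis, in Python:
each missing outcome is filled in as the observed mean plus a shift delta,
and the overall mean is recomputed for a range of deltas. The function name,
the data, and the range of deltas are all hypothetical; a real analysis
would normally use the explanatory variables and account for imputation
uncertainty as well.

```python
import statistics

def delta_adjusted_means(observed, n_missing, deltas):
    """Overall mean when each missing value is imputed as observed mean + delta."""
    obs_mean = statistics.mean(observed)
    n_total = len(observed) + n_missing
    adjusted = {}
    for d in deltas:
        # delta = 0 means the missing cases are assumed to look like the observed ones
        imputed_total = n_missing * (obs_mean + d)
        adjusted[d] = (sum(observed) + imputed_total) / n_total
    return adjusted

# Hypothetical observed outcomes and number of missing outcomes.
observed = [52.0, 47.5, 60.1, 55.3, 49.8, 58.2]
n_missing = 4
deltas = [-10, -5, 0, 5, 10]

results = delta_adjusted_means(observed, n_missing, deltas)
for d, m in results.items():
    print(f"delta={d:+d}  overall mean={m:.2f}")
```

If the substantive conclusion survives across the whole plausible range of
delta, the NMAR worry matters less; if it flips, that is worth reporting.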
Kind Regards and enjoy your W-E,
Carlo