Re: st: Re: When to impute - and an alternative
lmList is a function that fits a separate linear model to each group
defined by the grouping variable after the pipe "|". There are several
ways of doing this in Stata and saving the results, e.g., -statsby-.
There is a similar function in nlme for nonlinear least squares,
nlsList, which does the same thing and is useful for working out which
parameters show enough variation to be fitted as random effects when
nlme is used.
The rest extracts the coefficients and standard errors from the
results. ("summary" was misspelled as "sumary" in the original listing,
and "cumcum" should presumably be "cumsum".)
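For what it is worth, here is a rough, untested -statsby- sketch of
those 4 lines. The variable names (panss, treat, time) are only
placeholders for however the data are actually set up, and only the
treatment coefficient is carried through (the S-PLUS code cumulates
every coefficient column):
* one regression of PANSS on treatment per time point, keeping the
* coefficients and standard errors (as _b_treat, _se_treat, etc.)
statsby _b _se, by(time) clear: regress panss treat
* running (cumulative) sums of the treatment effect and of its variance
sort time
generate cum_b   = sum(_b_treat)
generate cum_var = sum(_se_treat^2)
list time cum_b cum_var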
-Dave
On Dec 3, 2007, at 5:35 AM, Paul Seed wrote:
- --- David Airey <[email protected]> wrote:
> I have trouble understanding the translation of these three missing
> situations into when it is useful to impute.
The three situations are MCAR (missing completely at random),
MAR (missing at random) and NMAR (not missing at random).
Analysis of complete cases only can be biased under MAR & NMAR.
Imputation is unnecessary with MCAR.
Here's a very practical approach:
1) Build a logistic regression model to predict who is likely to go
missing, using the predictors you would use for multiple imputation.
Is it reasonably powerful?
If not, there is no point in imputing: your data is probably MCAR,
not MAR, and imputing will not help.
2) Calculate a prediction score from the logistic regression,
and compare this score with the observed (non-missing) outcomes.
If there is no relationship, there is no correctable bias.
If your data passes tests 1) and 2), imputation is probably called for,
as MAR is a possibility. However, NMAR remains an issue.
(A Stata sketch of steps 1 and 2 follows after step 3.)
3) Consider whether there could be an unobserved process
causing people with extreme values of the outcome to go missing.
If you (and your non-statistical collaborators) judge this
implausible, your data is probably not NMAR.
Either way you should mention the possibility of NMAR
and the size & direction of any likely bias caused in the
discussion section of the paper.
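To make steps 1) and 2) concrete, here is a minimal Stata sketch.
The variable names are purely hypothetical: y is the outcome with
missing values, and x1-x3 stand for the predictors you would put in
the imputation model.
* 1) model the probability that the outcome is missing
generate byte miss = missing(y)
logit miss x1 x2 x3
* how powerful is the model? look at the pseudo-R2 and, e.g.,
* the area under the ROC curve
lroc
* 2) prediction score for everyone, then check whether it is
* related to the outcome among those actually observed
predict double pscore, pr
regress y pscore if !missing(y)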
A very interesting new paper on this subject is
Diggle, Farewell & Henderson,
Analysis of longitudinal data with dropout: objectives, assumptions
and a proposal.
Appl. Statist. (2007) 56(5), 499-550 (with discussion).
As the title implies, it contains a new method,
based on martingale assumptions and difference
scores. Their method is unbiased under
MAR and under certain versions of NMAR
(when the martingale assumptions are valid) &
is therefore superior to multiple imputation.
They claim the method is very easy to implement using
standard software, and they give 4 lines of S-PLUS:
fit <- lmList(PANSS ~ treat | time, data = schizophrenia, pool = FALSE)
apply(coef(fit), 2, cumsum)
SEs <- summary(fit)$coef[, "Std. Error", ]
apply(SEs^2, 2, cumsum)
If anyone is familiar with R or S-PLUS, and in particular with the
lmList command from the -nlme- package (Pinheiro & Bates 2000,
"Mixed-Effects Models in S and S-PLUS", New York: Springer),
and could translate these 4 lines into Stata, they would be doing
a great favour to the Stata community.
==========================
Paul T Seed MSc CStat
Senior Lecturer in Medical Statistics
King's College London
Division of Reproduction and Endocrinology
St Thomas' Hospital,
Lambeth Palace Road,
London SE1 7EH
tel (+44) (0) 20 7188 3642
fax (+44) (0) 20 7620 1227
--
David C. Airey, Ph.D.
Pharmacology Research Assistant Professor
Center for Human Genetics Research Member
Department of Pharmacology
School of Medicine
Vanderbilt University
Rm 8158A Bldg MR3
465 21st Avenue South
Nashville, TN 37232-8548
TEL (615) 936-1510
FAX (615) 936-3747
EMAIL [email protected]
URL http://people.vanderbilt.edu/~david.c.airey/dca_cv.pdf
URL http://www.vanderbilt.edu/pharmacology
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/