st: mi of time variant measures in longitudinal data using mvn and chained approach
From: "Jacqueline Jodl" <[email protected]>
To: <[email protected]>
Subject: st: mi of time variant measures in longitudinal data using mvn and chained approach
Date: Mon, 29 Jul 2013 09:56:21 -0400
Dear Statalist,
I am struggling with imputing my longitudinal dataset.
My fundamental problem is not time INVARIANT measures; it is time variant
measures.
Good examples of successfully imputed time INVARIANT measures are the
cognitive test scores in middle school, PIATmath and PIATread. I have 4188
respondents and am missing between 500 and 600 values for each score. The
mvn approach imputed them with no issues.
. mi impute mvn PIATmath PIATread = TANF highgrade highgrademom race male mombirthage birthorder, add(5)

Performing EM optimization:
note: 496 observations omitted from EM estimation because of all imputation variables missing
  observed log likelihood = -21249.41 at iteration 5

Performing MCMC data augmentation ...

Multivariate imputation                     Imputations =        5
Multivariate normal regression                    added =        5
Imputed: m=1 through m=5                        updated =        0

Prior: uniform                               Iterations =      500
                                                burn-in =      100
                                                between =      100

------------------------------------------------------------------
                   |               Observations per m
                   |----------------------------------------------
          Variable |   Complete   Incomplete   Imputed |     Total
-------------------+-----------------------------------+----------
          PIATmath |       3674          514       514 |      4188
          PIATread |       3618          570       570 |      4188
------------------------------------------------------------------
(complete + incomplete = total; imputed is the minimum across m
 of the number of filled-in observations.)
.
end of do-file
A great example of a time VARIANT measure that I have yet to impute
successfully is the TIPI score (personality measure) for extraversion. My
data are in wide form. (As stated above, I have a total of 4188 respondents.)
This measure was included in three of my nine waves of data: 2006, 2008 and
2010. The question is asked ONLY of respondents who are over age 19.
In 2006 I have 3646 observations with no missing data (the difference
between 4188 and 3646 represents valid skips). In 2008 the question was
asked again, but only of those who were missed in 2006; in 2008 there are
194 observations with 7 missing values (none of these respondents overlap
with 2006). In 2010 I have 3026 observations (for most of these respondents
this is their second observation) with 44 missing values.
I checked tabs/codebook before and after recoding for soft and hard missing
data to make sure the recoding syntax worked.
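A minimal sketch of that kind of check, run on made-up toy data rather than the real file. It assumes valid skips were recoded to an extended missing value such as .a (hard missing) and genuine nonresponse to . (soft missing); only the variable name tipiextra_2010 comes from the logs in this post, and the values are invented for illustration.

```stata
* Toy stand-in for one TIPI item; the values are made up for illustration
clear
input id tipiextra_2010
1 4
2 .
3 .a
4 6
5 .a
end

* Soft missing (.) is the only kind of missing value mi impute fills in
count if tipiextra_2010 == .
local n_soft = r(N)

* Extended missing values (.a-.z) sort above "." and count as hard missing
count if tipiextra_2010 > .
local n_hard = r(N)

display "soft missing = `n_soft'   hard missing = `n_hard'"

* misstable reports the same split in its Obs=. and Obs>. columns
misstable summarize tipiextra_2010
```

On the real data the same two counts could be compared with the tabs/codebook output from before and after the recode.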
Again, my data are in wide format. I tried both the mvn and the chained approach.
THIS IS THE ERROR I RECEIVED FOR mvn:
. mi impute mvn tipiextra_2008 tipicrit_2008 tipiselfdis_2008 tipianx_2008 tipiopen_2008
> tipires_2008 tipisym_2008 tipidisorgan_2008 tipicalm_2008 tipiconv_2008 ///
> tipiextra_2010 tipicrit_2010 tipiselfdis_2010 tipianx_2010 tipiopen_2010
> tipires_2010 tipisym_2010 tipidisorgan_2010 tipicalm_2010 tipiconv_2010 ///
> = tipiextra_2006 tipicrit_2006 tipiselfdis_2006 tipianx_2006 tipiopen_2006
> tipires_2006 tipisym_2006 tipidisorgan_2006 tipicalm_2006 tipiconv_2006
> TANF highgrade highgrademom race male mombirthage birthorder, add(5)
note: variables tipianx_2010 tipires_2010 tipicalm_2010 contain no soft missing (.) values;
      imputing nothing
no observations
stata(): 3598 Stata returned error
_Mis_Est::init(): - function returned error
_DA_Norm::init(): - function returned error
<istmt>: - function returned error
r(3598);
end of do-file
THIS IS THE ERROR I RECEIVED WITH CHAINED:
Performing chained iterations ...
tipianx_2010: missing imputed values produced
    This may occur when imputation variables are used as independent variables or when
    independent variables contain missing values. You can specify option force if you wish to
    proceed anyway.
r(498);
end of do-file
I CHECKED AND RECHECKED TO MAKE SURE NONE OF MY INDEPENDENT VARIABLES
CONTAIN MISSING VALUES.
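One way such a check could be written is sketched below on made-up toy data. The variable names are the independent variables from the commands above; the values and the helper variable n_miss_iv are hypothetical. egen's rowmiss() counts both soft (.) and hard (.a-.z) missing values in each row.

```stata
* Toy stand-in for the independent variables; values are made up
clear
input id TANF highgrade highgrademom race male mombirthage birthorder
1 0 12 12 1 1 24 1
2 1 10 .  2 0 19 2
3 0 14 16 1 0 .a 1
end

* Number of missing independent variables per respondent
egen n_miss_iv = rowmiss(TANF highgrade highgrademom race male mombirthage birthorder)

* Respondents with at least one missing independent variable
count if n_miss_iv > 0
local n_incomplete = r(N)

list id n_miss_iv if n_miss_iv > 0
```

A count of zero on the real data would confirm that the independent variables are complete.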
I THINK THE PROBLEM IS THAT MI WANTS TO IMPUTE TO 4188 OBSERVATIONS FOR TIME
VARIANT OBSERVATIONS BECAUSE MY SAMPLE SIZE CONTAINS 4188 RESPONDENTS. SO
4188 IS THE NUMBER TO USE FOR TIME INVARIANT OBSERVATIONS, BUT FOR TIME
VARIANT OBSERVATIONS, IT VARIES BY WAVE AND BY MEASURE.
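Since the number of respondents in scope differs by wave, one diagnostic that might help is to count how many respondents were in scope for each wave, and for both imputed waves at once. The sketch below uses made-up toy data and assumes valid skips are coded as hard missing (.a); that coding, the skip_* flag variables and all values are assumptions for illustration, not taken from the logs above.

```stata
* Toy wide-format data for one TIPI item; values are made up
clear
input id tipiextra_2006 tipiextra_2008 tipiextra_2010
1 5  .a 4
2 .a 3  .a
3 6  .a .
4 .a .  5
5 2  .a 7
end

* Flag valid skips (assumed to be coded .a-.z) in each imputed wave
foreach v of varlist tipiextra_2008 tipiextra_2010 {
    generate byte skip_`v' = `v' > .
}

* Respondents in scope for each wave separately
count if !skip_tipiextra_2008
local n_2008 = r(N)
count if !skip_tipiextra_2010
local n_2010 = r(N)

* Respondents in scope for both imputed waves at the same time
count if !skip_tipiextra_2008 & !skip_tipiextra_2010
local n_both = r(N)

display "in scope 2008 = `n_2008'   2010 = `n_2010'   both = `n_both'"
```

If the last count is very small on the real data, that would be worth mentioning alongside the "no observations" message from mvn.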
Any advice would be greatly appreciated.
A completely despondent doctoral student,
Jackie Jodl