st: Missing Observations. Do I need multiple Imputations?
From
Abekah Nkrumah <[email protected]>
To
[email protected]
Subject
st: Missing Observations. Do I need multiple Imputations?
Date
Tue, 21 Aug 2012 17:18:16 +0100
Dear Statalist,
I would like some advice on this rather long question. Variable A in
the table below is a composite index derived by aggregating variables
B, C, D, E and F, which are themselves sub-indices. A geometric
aggregation method was used. From the table I see that the number of
observations on the composite index (A) drops significantly:
    Variable |       Obs        Mean   Std. Dev.         Min         Max
-------------+----------------------------------------------------------
           A |     69623    .4898275    .1575975    .0498657    .8980919
           B |    187524     .524507    .2669241    1.80e-08           1
           C |    221089    .6625131    .3732415    2.18e-08           1
           D |    234680    .7486263    .3494941   -1.29e-08           1
           E |    108437    .5253285    .0648927   -2.61e-08           1
-------------+----------------------------------------------------------
           F |    119261    .6829314    .2270192   -1.62e-08           1
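A drop of this kind is what a row-wise geometric mean produces, because a missing value in any one sub-index makes the whole product missing. The following is only a minimal sketch, assuming an unweighted geometric mean of the five sub-indices; the exact formula and weights used for A are not shown in this post, and A_check is a hypothetical variable name.

```stata
* Assumption: A is the unweighted geometric mean of B, C, D, E and F.
* If any sub-index is missing, the product (and so A_check) is missing.
* A negative product raised to a fractional power is also missing in
* Stata, which could matter given the tiny negative minima above.
generate double A_check = (B*C*D*E*F)^(1/5)

* Compare the non-missing counts of the original and recomputed index.
count if !missing(A)
count if !missing(A_check)
```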
I then ran a missing-data check on all the indices; the results are
below:
       Variable |    Missing      Total    Percent Missing
----------------+-----------------------------------------------
              A |    166,075    235,698              70.46
              B |     48,174    235,698              20.44
              C |     14,609    235,698               6.20
              D |      1,018    235,698               0.43
              E |    127,261    235,698              53.99
              F |    116,437    235,698              49.40
----------------+-----------------------------------------------
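For anyone wanting to reproduce a check like this, the counts and percentages can be obtained roughly as follows. This is a sketch using the variable names from the table, not necessarily the commands that produced the output above.

```stata
* Count and percent missing for each index (names as in the table).
foreach v of varlist A B C D E F {
    quietly count if missing(`v')
    display "`v'" _col(12) %9.0fc r(N) _col(24) %9.0fc _N ///
        _col(36) %6.2f 100*r(N)/_N
}

* Built-in alternative that reports missing-value counts per variable.
misstable summarize A B C D E F
```

The `///` line continuation works in a do-file; typed interactively, the display command would need to go on one line.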
I then checked the percentage missing for each of the individual
variables used in computing the sub-indices, especially B, C, E and F.
The results are below:
       Variable |    Missing      Total    Percent Missing
----------------+-----------------------------------------------
             B1 |     46,317    235,698              19.65
             B2 |     46,967    235,698              19.93
             B3 |     46,815    235,698              19.86
             B4 |     47,005    235,698              19.94
             C1 |      5,128    235,698               2.18
             C2 |      5,164    235,698               2.19
             C3 |      6,180    235,698               2.62
             C4 |      9,730    235,698               4.13
             C5 |      5,608    235,698               2.38
             D1 |        444    235,698               0.19
             D2 |        483    235,698               0.20
             D3 |        657    235,698               0.28
             E1 |     82,112    235,698              34.84
             E2 |     58,504    235,698              24.82
             E3 |     65,469    235,698              27.78
             E4 |     81,349    235,698              34.51
             F1 |        214    235,698               0.09
             F2 |     63,503    235,698              26.94
             F3 |     86,512    235,698              36.70
             F4 |        674    235,698               0.29
----------------+-----------------------------------------------
The results above suggest that the drop in the number of observations
for the composite empowerment variable (A) is due to the high level of
missing values in four of the sub-indices (B, C, E and F), which in
turn reflects the high level of missing values in the variables used
to compute those indices.
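How much each sub-index contributes to the loss depends on whether the missing values overlap across observations. A minimal sketch of how that overlap could be examined, assuming the sub-index names above (n_miss is a hypothetical variable name):

```stata
* Number of sub-indices missing in each observation.
egen byte n_miss = rowmiss(B C D E F)
tabulate n_miss

* Which combinations of sub-indices tend to be missing together.
misstable patterns B C D E F, frequency
```

Observations with n_miss equal to 0 are the complete cases on which the composite index can be computed.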
I was therefore wondering whether an explanation like this in the
appendix of my work would suffice, or whether I need to use multiple
imputation to fill in the missing data.
I have thought this through, and the question I am asking myself is
this: if I have to do multiple imputation, the variables to impute
will be the B variables (these are decision-making variables), then
the E variables (number of wives, age at first marriage, woman's age,
partner's age) and then F3 and F4 (partner's education and whether a
woman earns cash).
My worry is whether it is sensible to impute variables such as age
and number of wives. Secondly, considering that I still have a large
sample to work with, my guess is that the results from the remaining
sample will not change much. Thus I am wondering whether it is still
necessary to impute the missing data.
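For concreteness, an imputation of the kind under consideration could be set up roughly as follows. This is only a sketch under several assumptions: the variable names follow the tables above, all component variables are treated as imputed because each has at least some missing values, and predictive mean matching is used throughout regardless of each variable's type. The number of imputations and the seed are arbitrary illustrative choices.

```stata
* Hypothetical sketch; variable names follow the tables above.
mi set mlong
mi register imputed B1 B2 B3 B4 C1 C2 C3 C4 C5 D1 D2 D3 ///
    E1 E2 E3 E4 F1 F2 F3 F4

* Predictive mean matching fills each missing value with an observed
* value from a similar case, so imputed ages or numbers of wives stay
* within the range of values actually observed.
mi impute chained (pmm, knn(5)) B1 B2 B3 B4 C1 C2 C3 C4 C5 D1 D2 D3 ///
    E1 E2 E3 E4 F1 F2 F3 F4, add(5) rseed(12345)

* The sub-indices and the composite would then have to be rebuilt in
* each imputed dataset (for example with mi passive) before any
* estimation with mi estimate.
```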
I would appreciate hearing from you on this so that I will know which
way to go. Thank you very much.
Regards
Gordon