Re: st: Missing Observations. Do I need multiple Imputations?
From: A Loumiotis <[email protected]>
To: [email protected]
Subject: Re: st: Missing Observations. Do I need multiple Imputations?
Date: Wed, 22 Aug 2012 09:44:18 +0300
Hi Gordon,
Since your aggregate variable is missing whenever at least one
component is missing, I believe you would first need to multiply
impute the missing observations in your dataset and then compute your
aggregate variable from the imputed values. I don't see a problem with
multiply imputing variables such as age or number of wives. In
addition, your results might change if your data are missing
(conditionally) at random rather than completely at random, even if
your nonmissing sample is large.
Best,
Antonis
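
The ordering suggested above (impute the components first, then build the composite in each imputed dataset, then pool) can be sketched in a few lines. This is only an illustrative sketch, not the poster's method: the toy data, the function names (geometric_index, impute_once, pooled_mean_index), the choice of m=20 and the seed are all assumptions, and the imputation step is a crude random draw from observed values standing in for a proper imputation model. The pooling step uses Rubin's rules.

```python
import math
import random
import statistics

def geometric_index(components):
    """Geometric mean of the sub-indices; None if any component is missing."""
    if any(c is None for c in components):
        return None
    return math.prod(components) ** (1.0 / len(components))

def impute_once(rows, rng):
    """Fill each missing cell with a random draw from that column's observed
    values. ASSUMPTION: a toy stand-in for a real imputation model."""
    ncols = len(rows[0])
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(ncols)]
    return [
        [r[j] if r[j] is not None else rng.choice(observed[j]) for j in range(ncols)]
        for r in rows
    ]

def pooled_mean_index(rows, m=20, seed=12345):
    """Impute m times, compute the composite in each completed dataset,
    then pool the mean of the composite with Rubin's rules."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(rows, rng)
        index = [geometric_index(r) for r in completed]  # aggregate AFTER imputing
        estimates.append(statistics.fmean(index))
        variances.append(statistics.variance(index) / len(index))
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total_var = within + (1 + 1 / m) * between
    return q_bar, total_var

# Hypothetical toy data: three positive sub-indices, None marks a missing value.
rows = [
    [0.5, 0.8, None],
    [0.4, None, 0.9],
    [0.6, 0.7, 0.2],
    [None, 0.9, 0.5],
    [0.3, 0.6, 0.8],
]

complete = sum(geometric_index(r) is not None for r in rows)
print("rows with a non-missing composite before imputation:", complete, "of", len(rows))
print("pooled mean and total variance:", pooled_mean_index(rows))
```

The toy data show the same pattern as in the question: each component is missing in only one row, yet the composite is missing in three of the five. In Stata itself these steps would normally go through the official mi commands (mi impute to create the imputations, mi passive to generate the composite in each one, mi estimate to pool); the Python above only illustrates the ordering.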
On Tue, Aug 21, 2012 at 7:18 PM, Abekah Nkrumah <[email protected]> wrote:
> Dear Statalist,
>
>
> I would like some advice on this rather long question. Variable A in
> the table below is a composite index derived from the aggregation of
> variables B, C, D, E and F, which are themselves sub-indices. A
> geometric aggregation method was used. From the table I realise that
> the number of observations on the composite index (A) drops
> significantly:
>
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>            A |     69623    .4898275     .1575975  .0498657   .8980919
>            B |    187524     .524507     .2669241  1.80e-08          1
>            C |    221089    .6625131     .3732415  2.18e-08          1
>            D |    234680    .7486263     .3494941 -1.29e-08          1
>            E |    108437    .5253285     .0648927 -2.61e-08          1
> -------------+--------------------------------------------------------
>            F |    119261    .6829314     .2270192 -1.62e-08          1
>
>
> I then decided to do a missing data check for all the indices, and the
> results are below:
>
>        Variable |    Missing      Total   Percent Missing
> ----------------+-----------------------------------------------
>               A |    166,075    235,698             70.46
>               B |     48,174    235,698             20.44
>               C |     14,609    235,698              6.20
>               D |      1,018    235,698              0.43
>               E |    127,261    235,698             53.99
>               F |    116,437    235,698             49.40
> ----------------+-----------------------------------------------
>
>
> I then checked the percentage missing for all the individual variables
> used in computing the sub-indices, especially B, C, E and F. The
> results are below:
>
>
>        Variable |    Missing      Total   Percent Missing
> ----------------+-----------------------------------------------
>              B1 |     46,317    235,698             19.65
>              B2 |     46,967    235,698             19.93
>              B3 |     46,815    235,698             19.86
>              B4 |     47,005    235,698             19.94
>              C1 |      5,128    235,698              2.18
>              C2 |      5,164    235,698              2.19
>              C3 |      6,180    235,698              2.62
>              C4 |      9,730    235,698              4.13
>              C5 |      5,608    235,698              2.38
>              D1 |        444    235,698              0.19
>              D2 |        483    235,698              0.20
>              D3 |        657    235,698              0.28
>              E1 |     82,112    235,698             34.84
>              E2 |     58,504    235,698             24.82
>              E3 |     65,469    235,698             27.78
>              E4 |     81,349    235,698             34.51
>              F1 |        214    235,698              0.09
>              F2 |     63,503    235,698             26.94
>              F3 |     86,512    235,698             36.70
>              F4 |        674    235,698              0.29
> ----------------+-----------------------------------------------
>
> The results above suggest that the drop in the number of observations
> for the composite empowerment variable is due to the high level of
> missing values in the four sub-indices (B, C, E and F) as also
> supported by the high level of missing values in the variables used in
> computing those indices.
>
> I was therefore wondering whether an explanation like this in the
> appendix of my work will be fine, or whether I will need to do
> multiple imputation to replace the missing data.
>
> I have thought this through, and the question I am asking myself is
> this: if I have to do multiple imputation, the variables for the
> imputation exercise will be the B variables (these are decision-making
> variables), then the E variables (these are number of wives, age at
> first marriage, woman's age, partner's age) and then F3 and F4 (which
> are partner's education and whether a woman earns cash).
>
> My worry is whether it is sensible to impute variables such as age
> and number of wives. Secondly, considering that I still have a large
> sample size to work with, my guess is that the results from the
> remaining sample will not change that much. Thus I am wondering
> whether it will still be necessary to impute the missing data.
>
> I would appreciate hearing from you on this so that I will know which
> way to go. Thank you very much.
>
> Regards
>
> Gordon
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/