Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Missing Observations. Do I need multiple Imputations?
From: Abekah Nkrumah <[email protected]>
To: [email protected]
Subject: Re: st: Missing Observations. Do I need multiple Imputations?
Date: Wed, 22 Aug 2012 08:32:38 +0100
Dear Antonis,
Thank you very much for your reply. I want to understand your first
line: were you saying that my aggregate variable is missing entirely? In
my statement I said that the composite index (A), which you referred to
as the aggregate variable, is there but loses a substantial number of
observations. So it is not entirely missing.
Thanks very much
Regards
On Wed, Aug 22, 2012 at 7:44 AM, A Loumiotis
<[email protected]> wrote:
> Hi Gordon,
>
> Since your aggregate variable is missing when at least one component
> is missing, I believe you would first need to multiply impute the
> missing observations in your dataset and then compute your aggregate
> variable. I don't see a problem with multiply imputing variables such
> as age or number of wives. In addition, your results might change if
> your data are missing (conditionally) at random, even if your
> nonmissing sample is large.
>
> Best,
> Antonis
>
>
>
> On Tue, Aug 21, 2012 at 7:18 PM, Abekah Nkrumah <[email protected]> wrote:
>> Dear Statalist,
>>
>>
>> I would like some advice on this rather long question. Variable A in
>> the table below is a composite index derived from the aggregation of
>> variables B, C, D, E and F, which are themselves sub-indices. A
>> geometric aggregation method was used. From the table I realise that
>> the number of observations on the composite index (A) drops
>> significantly.
>>
>>
>>     Variable |       Obs        Mean   Std. Dev.        Min        Max
>> -------------+--------------------------------------------------------
>>            A |     69623    .4898275    .1575975   .0498657   .8980919
>>            B |    187524     .524507    .2669241   1.80e-08          1
>>            C |    221089    .6625131    .3732415   2.18e-08          1
>>            D |    234680    .7486263    .3494941  -1.29e-08          1
>>            E |    108437    .5253285    .0648927  -2.61e-08          1
>> -------------+--------------------------------------------------------
>>            F |    119261    .6829314    .2270192  -1.62e-08          1
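
As an illustration of why A can have far fewer observations than any single sub-index, here is a minimal Python sketch. It is not the code used in this thread: it assumes an unweighted geometric mean (the post does not state the weights), and the function name `geometric_index` and the four rows of values are made up for the example. The point is only that the composite is missing whenever any one component is missing, so the missingness of the components accumulates in A.

```python
import math

def geometric_index(components):
    """Unweighted geometric mean (an assumption); NaN if any component is missing."""
    if any(math.isnan(x) for x in components):
        return math.nan
    return math.prod(components) ** (1.0 / len(components))

# Hypothetical rows of (B, C, D, E, F); the values are invented for illustration.
rows = [
    (0.5, 0.6, 0.7, 0.5, 0.6),
    (0.5, math.nan, 0.7, 0.5, 0.6),
    (0.5, 0.6, 0.7, math.nan, 0.6),
    (math.nan, 0.6, 0.7, 0.5, math.nan),
]

index = [geometric_index(r) for r in rows]

# Non-missing counts for the composite and for each component separately.
n_complete = sum(not math.isnan(a) for a in index)
per_component = [sum(not math.isnan(r[j]) for r in rows) for j in range(5)]

print("non-missing A:", n_complete)
print("non-missing B..F:", per_component)
```

In this toy data every component is observed in at least three of the four rows, yet only one row has a non-missing composite.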
>>
>>
>> I then decided to do a missing data check for all the indices, and the
>> results are below.
>>
>>        Variable |    Missing      Total   Percent Missing
>> ----------------+-----------------------------------------------
>>               A |    166,075    235,698             70.46
>>               B |     48,174    235,698             20.44
>>               C |     14,609    235,698              6.20
>>               D |      1,018    235,698              0.43
>>               E |    127,261    235,698             53.99
>>               F |    116,437    235,698             49.40
>> ----------------+-----------------------------------------------
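
The two tables can be cross-checked against each other with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the tables above, and the names `counts`, `TOTAL` and `pct` are chosen here for illustration. For each index, the observation count from the first table plus the missing count from the second should equal the total of 235,698, and the percentage is the missing count divided by that total.

```python
# (observations from the summary table, missing count from the missing-data table)
counts = {
    "A": (69623, 166075),
    "B": (187524, 48174),
    "C": (221089, 14609),
    "D": (234680, 1018),
    "E": (108437, 127261),
    "F": (119261, 116437),
}
TOTAL = 235698

# Observed plus missing should account for every row in the dataset.
consistent = all(obs + missing == TOTAL for obs, missing in counts.values())

# Percent missing, recomputed from the raw counts.
pct = {name: 100 * missing / TOTAL for name, (obs, missing) in counts.items()}

print("counts consistent:", consistent)
for name, value in pct.items():
    print(f"{name}: {value:.2f}% missing")
```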
>>
>>
>> I then checked the percentage missing for all the individual variables
>> used in computing the sub-indices, especially B, C, E and F. The
>> results are below.
>>
>>
>>        Variable |    Missing      Total   Percent Missing
>> ----------------+-----------------------------------------------
>>              B1 |     46,317    235,698             19.65
>>              B2 |     46,967    235,698             19.93
>>              B3 |     46,815    235,698             19.86
>>              B4 |     47,005    235,698             19.94
>>              C1 |      5,128    235,698              2.18
>>              C2 |      5,164    235,698              2.19
>>              C3 |      6,180    235,698              2.62
>>              C4 |      9,730    235,698              4.13
>>              C5 |      5,608    235,698              2.38
>>              D1 |        444    235,698              0.19
>>              D2 |        483    235,698              0.20
>>              D3 |        657    235,698              0.28
>>              E1 |     82,112    235,698             34.84
>>              E2 |     58,504    235,698             24.82
>>              E3 |     65,469    235,698             27.78
>>              E4 |     81,349    235,698             34.51
>>              F1 |        214    235,698              0.09
>>              F2 |     63,503    235,698             26.94
>>              F3 |     86,512    235,698             36.70
>>              F4 |        674    235,698              0.29
>> ----------------+-----------------------------------------------
>>
>> The results above suggest that the drop in the number of observations
>> for the composite empowerment variable is due to the high level of
>> missing values in the four sub-indices (B, C, E and F) as also
>> supported by the high level of missing values in the variables used in
>> computing those indices.
>>
>> I was therefore wondering whether an explanation like this in the
>> appendix of my work will be fine, or whether I will need to do
>> multiple imputation to replace the missing data.
>>
>> I have thought this through, and the question I am asking myself is
>> this: if I have to do multiple imputation, the variables for the
>> imputation exercise will be the B variables (these are decision-making
>> variables), then the E variables (these are number of wives, age at
>> first marriage, woman's age, partner's age) and then F3 and F4 (which
>> are partner's education and whether a woman earns cash).
>>
>> My worry is whether it is sensible to impute variables such as
>> age and number of wives. Secondly, considering that I still have a
>> large sample to work with, my guess is that the results from the
>> remaining sample will not change that much. Thus I am wondering whether
>> it will still be necessary to impute the missing data.
>>
>> I would appreciate hearing from you on this so that I will know which
>> way to go. Thank you very much.
>>
>> Regards
>>
>> Gordon
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/