Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Missing Observations. Do I need multiple Imputations?
From: A Loumiotis <[email protected]>
To: [email protected]
Subject: Re: st: Missing Observations. Do I need multiple Imputations?
Date: Wed, 22 Aug 2012 12:08:03 +0300
I agree with you, and I think that's what I also said. Your composite
variable is missing if at least one of its component variables (B, C,
D, E, F) is missing. When none of the component variables are missing,
your composite variable is not missing.
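A minimal sketch of this point, written in Python rather than Stata purely for illustration. The thread says only that a geometric aggregation method was used, so the unweighted geometric mean below is an assumption, the records are hypothetical, and `None` stands in for Stata's missing value `.`.

```python
import math

COMPONENTS = ["B", "C", "D", "E", "F"]

# Hypothetical records; None marks a missing value (Stata's ".").
records = [
    {"B": 0.5, "C": 0.7, "D": 0.8, "E": 0.5, "F": 0.6},    # complete
    {"B": 0.4, "C": None, "D": 0.9, "E": 0.5, "F": 0.7},   # C missing
    {"B": None, "C": 0.6, "D": 0.7, "E": None, "F": 0.5},  # B and E missing
]

def composite(record):
    """Unweighted geometric mean of B..F (assumed formula).

    Returns None as soon as any component is missing, so a case
    contributes to the composite only if all five are observed."""
    values = [record[c] for c in COMPONENTS]
    if any(v is None for v in values):
        return None
    return math.prod(values) ** (1 / len(values))

scores = [composite(r) for r in records]
n_nonmissing = sum(s is not None for s in scores)
print(n_nonmissing)  # 1
```

This is also why A's 69,623 non-missing observations fall below the count for every single sub-index in your summary table: a case enters A only if it is non-missing on all five components.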
On Wed, Aug 22, 2012 at 10:32 AM, Abekah Nkrumah <[email protected]> wrote:
> Dear Antonis,
>
> Thank you very much for your reply. I want to understand your first
> line: were you saying that my aggregate variable is missing entirely?
> In my statement I said that the composite index (A), which you referred
> to as the aggregate variable, is there but drops a substantial number
> of observations. So it is not entirely missing.
>
> Thanks very much
>
> Regards
>
> On Wed, Aug 22, 2012 at 7:44 AM, A Loumiotis
> <[email protected]> wrote:
>> Hi Gordon,
>>
>> Since your aggregate variable is missing when at least one component
>> is missing, I believe you would first need to multiply impute the
>> missing observations of your dataset and then compute your aggregate
>> variable. I don't see a problem with multiply imputing variables such
>> as age or number of wives. In addition, your results might change if
>> your data are missing (conditionally) at random, even if your
>> non-missing sample is large.
>>
>> Best,
>> Antonis
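A minimal sketch of the order described above (impute the components first, compute the composite in each completed dataset, then pool), in Python for illustration only. The imputation step is a deliberately crude random hot-deck draw from the observed values of each component, which assumes the data are missing completely at random; it is a stand-in, not what Stata's `mi impute` does. The records, the unweighted geometric-mean formula, and the names `hot_deck_impute` and `pooled_mean` are hypothetical. Pooling follows Rubin's rules: the estimate is the average over the m imputations, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import math
import random
import statistics

COMPONENTS = ["B", "C", "D", "E", "F"]

# Hypothetical records; None marks a missing value (Stata's ".").
records = [
    {"B": 0.5, "C": 0.7, "D": 0.8, "E": 0.5, "F": 0.6},
    {"B": 0.4, "C": None, "D": 0.9, "E": 0.5, "F": 0.7},
    {"B": None, "C": 0.6, "D": 0.7, "E": None, "F": 0.5},
    {"B": 0.6, "C": 0.8, "D": 0.9, "E": 0.4, "F": None},
]

def composite(record):
    """Unweighted geometric mean of the components (assumed formula)."""
    values = [record[c] for c in COMPONENTS]
    return math.prod(values) ** (1 / len(values))

def hot_deck_impute(records, rng):
    """Fill each missing component with a randomly chosen observed value of
    that component. Crude stand-in for a real imputation model; assumes MCAR."""
    observed = {c: [r[c] for r in records if r[c] is not None] for c in COMPONENTS}
    return [
        {c: (r[c] if r[c] is not None else rng.choice(observed[c])) for c in COMPONENTS}
        for r in records
    ]

def pooled_mean(records, m=20, seed=1):
    """Impute m times, build the composite AFTER imputing, then pool the
    mean of the composite with Rubin's rules. Returns (estimate, SE)."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        completed = hot_deck_impute(records, rng)
        scores = [composite(r) for r in completed]
        estimates.append(statistics.mean(scores))
        within.append(statistics.variance(scores) / len(scores))  # squared SE of the mean
    q_bar = statistics.mean(estimates)   # pooled point estimate
    w_bar = statistics.mean(within)      # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b
    return q_bar, math.sqrt(total_var)

estimate, se = pooled_mean(records)
print(f"pooled mean of composite: {estimate:.3f} (SE {se:.3f})")
```

In Stata itself this workflow is usually built with the `mi` commands, for example `mi impute` to fill the components and `mi passive` to generate the composite within each imputation; a proper imputation model would also condition on the other variables instead of drawing donors at random.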
>>
>>
>>
>> On Tue, Aug 21, 2012 at 7:18 PM, Abekah Nkrumah <[email protected]> wrote:
>>> Dear Statalist,
>>>
>>>
>>> I would like some advice on this rather long question. Variable A in
>>> the table below is a composite index derived from the aggregation of
>>> variables B, C, D, E and F, which are also sub-indices. A geometric
>>> aggregation method was used. From the table I realise that the
>>> number of observations on the composite index (A) drops significantly:
>>>
>>>
>>>     Variable |       Obs        Mean   Std. Dev.        Min        Max
>>> -------------+--------------------------------------------------------
>>>            A |     69623    .4898275    .1575975   .0498657   .8980919
>>>            B |    187524     .524507    .2669241   1.80e-08          1
>>>            C |    221089    .6625131    .3732415   2.18e-08          1
>>>            D |    234680    .7486263    .3494941  -1.29e-08          1
>>>            E |    108437    .5253285    .0648927  -2.61e-08          1
>>> -------------+--------------------------------------------------------
>>>            F |    119261    .6829314    .2270192  -1.62e-08          1
>>>
>>>
>>> I then decided to do a missing-data check for all the indices, and the
>>> results are below:
>>>
>>>        Variable |    Missing      Total   Percent Missing
>>> ----------------+----------------------------------------
>>>               A |    166,075    235,698             70.46
>>>               B |     48,174    235,698             20.44
>>>               C |     14,609    235,698              6.20
>>>               D |      1,018    235,698              0.43
>>>               E |    127,261    235,698             53.99
>>>               F |    116,437    235,698             49.40
>>> ----------------+----------------------------------------
>>>
>>>
>>> I then checked the percentage missing for all the individual variables
>>> used in computing the sub-indices, especially B, C, E and F. The
>>> results are below:
>>>
>>>
>>>        Variable |    Missing      Total   Percent Missing
>>> ----------------+----------------------------------------
>>>              B1 |     46,317    235,698             19.65
>>>              B2 |     46,967    235,698             19.93
>>>              B3 |     46,815    235,698             19.86
>>>              B4 |     47,005    235,698             19.94
>>>              C1 |      5,128    235,698              2.18
>>>              C2 |      5,164    235,698              2.19
>>>              C3 |      6,180    235,698              2.62
>>>              C4 |      9,730    235,698              4.13
>>>              C5 |      5,608    235,698              2.38
>>>              D1 |        444    235,698              0.19
>>>              D2 |        483    235,698              0.20
>>>              D3 |        657    235,698              0.28
>>>              E1 |     82,112    235,698             34.84
>>>              E2 |     58,504    235,698             24.82
>>>              E3 |     65,469    235,698             27.78
>>>              E4 |     81,349    235,698             34.51
>>>              F1 |        214    235,698              0.09
>>>              F2 |     63,503    235,698             26.94
>>>              F3 |     86,512    235,698             36.70
>>>              F4 |        674    235,698              0.29
>>> ----------------+----------------------------------------
>>>
>>> The results above suggest that the drop in the number of observations
>>> for the composite empowerment variable is due to the high levels of
>>> missing values in four of the sub-indices (B, C, E and F), which is
>>> also supported by the high levels of missing values in the variables
>>> used to compute those indices.
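The missing-data check in the tables above, and the point that the composite loses more cases than any single sub-index, can be sketched in a few lines of Python. This is an illustration only: the records are hypothetical, `None` marks a missing value, and `missing_report` and `percent_missing` are illustrative helpers, not Stata commands. The last line simply re-derives one percentage from the table (166,075 of 235,698).

```python
# Hypothetical records; None marks a missing value (Stata's ".").
records = [
    {"B": 0.5, "C": 0.7, "D": 0.8, "E": 0.5, "F": 0.6},
    {"B": None, "C": 0.7, "D": 0.9, "E": 0.5, "F": 0.6},
    {"B": 0.5, "C": 0.7, "D": 0.8, "E": None, "F": 0.6},
    {"B": 0.5, "C": None, "D": 0.7, "E": None, "F": None},
]

COMPONENTS = ["B", "C", "D", "E", "F"]

def percent_missing(n_missing, total):
    """Percentage missing, rounded to two decimals as in the tables."""
    return round(100 * n_missing / total, 2)

def missing_report(records, variables):
    """Return {variable: (missing, total, percent)} plus an "any" row that
    counts cases missing at least one variable, i.e. the cases a
    complete-case composite loses."""
    total = len(records)
    report = {}
    for v in variables:
        n_missing = sum(r[v] is None for r in records)
        report[v] = (n_missing, total, percent_missing(n_missing, total))
    n_any = sum(any(r[v] is None for v in variables) for r in records)
    report["any"] = (n_any, total, percent_missing(n_any, total))
    return report

for name, (n_missing, total, pct) in missing_report(records, COMPONENTS).items():
    print(f"{name:>4} | {n_missing:>3} {total:>3} {pct:>6.2f}")

# Re-derive one figure from the table: A is missing in 166,075 of 235,698 cases.
print(percent_missing(166_075, 235_698))  # 70.46
```

The "any" row can never be smaller than the largest single-variable count, which matches A (70.46%) exceeding E (53.99%), the most incomplete sub-index.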
>>>
>>> I was therefore wondering whether an explanation like this in the
>>> appendix of my work would be sufficient, or whether I will need to use
>>> multiple imputation to replace the missing data.
>>>
>>> I have thought this through, and the question I am asking myself is
>>> this: if I have to do multiple imputation, the variables for the
>>> imputation exercise will be the B variables (these are decision-making
>>> variables), then the E variables (these are number of wives, age at
>>> first marriage, woman's age and partner's age), and then F3 and F4
>>> (which are partner's education and whether a woman earns cash).
>>>
>>> My first worry is whether it would be sensible to impute variables
>>> such as age and number of wives. Secondly, considering that I still
>>> have a large sample to work with, my guess is that the results from
>>> the remaining sample will not change that much. I am therefore
>>> wondering whether it will still be necessary to impute the missing data.
>>>
>>> I would appreciate hearing from you on this so that I know which way
>>> to go. Thank you very much.
>>>
>>> Regards
>>>
>>> Gordon
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/