Thanks to both Svend and Rich,
I read further on the Demographic and Health Surveys website, and the
"missing values" are truly missing, owing to interviewer errors. So,
instead of imputing them, I decided on a different strategy.
Conceptually, what is important for HIV risk is not necessarily the
issue of unprotected sex, but unprotected sex with partners other than
one's primary partner. So, I created a different variable that
indicates whether a respondent engaged in unprotected sex (i.e.,
without a condom) with a partner who is neither a spouse nor a
cohabiting partner (1) or not (0), and got the following distribution:
RiskySex
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   0     |       9377      87.59      87.59      87.59
        1     |       1329      12.41      12.41     100.00
        Total |      10706     100.00     100.00
-----------------------------------------------------------
Although the proportion at risk is small (12%, and probably
underestimated because of the perceived social stigma attached to such
self-reports), this variable avoids the problem with missing values.
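A minimal Stata sketch of how such an indicator could be built. The partner variable `nonprimary` is hypothetical (1 = last partner was neither a spouse nor a cohabiting partner); in the DHS file it would have to be derived from the relationship-to-partner item, whose name and codes are not given in this thread. The toy rows are made up only so the block runs on its own. Note how Stata handles missing values in the comparison: `V761 == 0` is false when V761 is missing, so respondents with missing V761 fall into the 0 category, which is one reason the new variable has no missing values. The cross-tab shows how many cases land in 0 that way.

```stata
* Toy rows standing in for the DHS file (illustration only, not real data).
* -nonprimary- is a HYPOTHETICAL 0/1 variable: 1 = last partner was neither
* a spouse nor a cohabiting partner.
clear
input float V761 byte nonprimary
0 1
1 1
. 1
9 1
0 0
end

* 1 = no condom (V761 == 0) with a non-primary partner, 0 = everything else.
* A missing V761 makes V761 == 0 false, so those cases are coded 0.
generate byte RiskySex = (V761 == 0 & nonprimary == 1)

* Flag non-primary-partner cases whose condom use is unknown (missing or 9);
* these sit inside the 0 category of RiskySex.
generate byte RiskySex_unknown = (missing(V761) | V761 == 9) & nonprimary == 1

* How the original V761 codes map onto the new indicator
tabulate V761 RiskySex, missing
```

With the real data, drop the `clear`/`input`/`end` lines and run the `generate` and `tabulate` commands on the loaded DHS file.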
Cheers, Cy
------------------------------------------------------------------
On Tue, Jul 14, 2009 at 7:51 AM, Richard Goldstein <[email protected]> wrote:
> It is not clear what Svend thinks is going on here, but for anyone thinking
> of using this strategy, I recommend reading Jones, MP (1996), "Indicator and
> Stratification Methods for Missing Explanatory Variables in Multiple Linear
> Regression," _Journal of the American Statistical Association_, 91: 222-230.
>
> Rich
>
> Svend Juul wrote:
>>
>> Cy wrote:
>> In a previous post, I indicated there was a drastic reduction in my
>> sub-population size. I traced the problem to a variable with a lot of
>> missing cases.
>> As you can see from the table below, this variable elicits whether the
>> respondent engaged in unprotected sexual intercourse. About a third of
>> the cases (33.78%) are missing.
>> V761 -- Last intercourse used condom
>> -----------------------------------------------------------
>>               |      Freq.    Percent      Valid       Cum.
>> --------------+--------------------------------------------
>> Valid   0 No  |       6012      56.16      84.81      84.81
>>         1 Yes |       1075      10.04      15.16      99.97
>>         9     |          2       0.02       0.03     100.00
>>         Total |       7089      66.22     100.00
>> Missing .     |       3617      33.78
>> Total         |      10706     100.00
>> -----------------------------------------------------------
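A small Stata sketch for measuring how much of a variable is unusable before deciding how to handle it. Svend's reply in this thread treats code 9 as missing too, so the count below includes both the system missing value and 9; treating 9 as a missing-data code is that reply's assumption, not something the table itself states. The toy rows are made up only so the block runs on its own. On the real data, the same count would give 3617 + 2 = 3619 of 10706 observations, about 33.8%.

```stata
* Toy rows (illustration only, not DHS data)
clear
input float V761
0
1
9
.
.
end

* Observations with V761 system-missing or coded 9
count if missing(V761) | V761 == 9
display "unusable share = " %5.2f 100*r(N)/_N "%"

* One-way table that keeps the missing category visible
tabulate V761, missing
```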
>> Since the dependent variable in my model deals with HIV risk, I need to
>> include sexual risk variables such as V761 in the model. How do I
>> deal with this missing-data problem so that it does not reduce my
>> sample size? Would imputation work?
>> ==========================================================
>> In this case, I would avoid imputation and instead generate two dummy
>> variables:
>> V761_0 = 1 if no condom use, otherwise 0
>> V761_miss = 1 if missing or 9, otherwise 0
>> . generate V761_0 = V761==0
>> . generate V761_miss = V761>1
>> . groups V761* , missing
>> +--------------------------------------------+
>> | V761   V761_0   V761_m~s   Freq.   Percent |
>> |--------------------------------------------|
>> |    0        1          0    6012     56.16 |
>> |    1        0          0    1075     10.04 |
>> |    9        0          1       2      0.02 |
>> |    .        0          1    3617     33.78 |
>> +--------------------------------------------+
>> -groups- is an unofficial command (ssc install groups).
>> Both variables should be included in your regression. You will still
>> have a problem interpreting what missing means, but that problem
>> cannot be solved by imputation.
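A minimal Stata sketch of the two-dummy (missing-indicator) approach described above. The outcome `hivrisk` and all values are simulated stand-ins (hypothetical names, random data), so the coefficients mean nothing; the point is the coding. With both dummies in the model, the reference category is V761 == 1 (condom used), and no observations are dropped because of missing V761. This is the indicator method whose properties Jones (1996), cited earlier in the thread, examines, so Rich's caution applies.

```stata
* Simulated stand-in data: the values and the outcome -hivrisk- are hypothetical.
clear
set seed 20090714
set obs 500
generate double u = runiform()
* V761: 0 = No, 1 = Yes, 9 = code 9, . = missing (rough shares only)
generate float V761 = cond(u < 0.55, 0, cond(u < 0.65, 1, cond(u < 0.66, 9, .)))
generate byte hivrisk = runiform() < 0.2

* The two indicators; in Stata, missing (.) is larger than any number,
* so V761 > 1 is true for both 9 and .
generate byte V761_0    = V761 == 0
generate byte V761_miss = V761 > 1

* Both dummies enter the model; V761 == 1 is the reference category
logit hivrisk V761_0 V761_miss
```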
>> Hope this helps
>> Svend
>> ________________________________________________________ Svend Juul
>> Institut for Folkesundhed, Afdeling for Epidemiologi
>> (School of Public Health, Department of Epidemiology)
>> Bartholins Allé 2
>> DK-8000 Aarhus C, Denmark Phone: +45 8693 7796 Mobile: +45 2634 7796
>> E-mail: [email protected]
>> _________________________________________________________
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>