Jennifer Wolfe Borum wrote:
> I am working with a data set composed of responses to survey questions
> which contains some categorical variables such as gender and ethnicity.
> The data has missing values and I have decided that it would be best to
> keep all observations due to a pattern in the missing values. I have
> decided to use the impute command in Stata to handle this as I've had some
> difficulty and am not familiar enough with the hotdeck and Amelia
> imputations. I've found that impute works fine for the continuous
> variables; however, for the categorical variables I am obtaining values
> that I am unsure how to interpret. For example, I will get an imputed
> value of .35621 for gender which is coded 1 or 0. Would anyone be able to
> help with the interpretation of the values I am obtaining for the
> categorical data?
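I have never used -impute- before, but as far as I know it fills in
missing values with predictions from a regression on the other variables.
For a variable coded 0/1, such a prediction is what one might call a
'pseudo-probability': an estimate of the chance that gender==1. A value of
.35621 therefore puts that chance at roughly 36%, so gender==0 is the more
likely value for that observation.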
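To illustrate why a fractional value can come out for a 0/1 variable, here
is a rough sketch in Python (not Stata, and not what -impute- does
internally). It assumes the imputation is a simple linear regression on one
predictor; the data, the predictor x, and the helper ols_fit are all made up
for illustration.

```python
# Hypothetical toy data: x is a fully observed predictor, gender is coded 0/1.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gender = [0, 0, 1, 0, 1, 1]


def ols_fit(xs, ys):
    """Return (intercept, slope) of a one-predictor least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = ols_fit(x, gender)

# An observation whose gender is missing but whose x is known (assumed value).
x_missing = 2.5
imputed = intercept + slope * x_missing  # fitted value, not restricted to 0 or 1

# One possible (lossy) way to turn the fitted value back into a category.
rounded = 1 if imputed >= 0.5 else 0

print(f"imputed value: {imputed:.3f}, rounded to category: {rounded}")
```

The fitted value lies between the two codes because it comes from a straight
line through 0/1 data. Rounding it to a category is only one option, and it
discards the uncertainty that the fractional value expresses.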
> Also, I would be interested in knowing which approach other Stata users
> prefer for imputing values as this is the first time I have encountered
> missing values and I am just beginning to research the various methods of
> imputation.
I'm not sure how imputing missing values for a variable like gender is
particularly useful. If you don't know the gender for an observation, I
personally think it best to leave it as missing, rather than guess
(sometimes correctly, sometimes erroneously), as you risk undermining the
accuracy of your data.
CLIVE NICHOLAS |t: 0(44)191 222 5969
Politics |e: [email protected]
Newcastle University |http://www.ncl.ac.uk/geps