Dear Statalist,
I will present my questions in two parts:
1) a brief description of my problem;
2) the commands I used and the output Stata (version 8.2) returned.
1) Nearly 10% of the observations on one variable in my dataset are
missing. The missingness is partly a consequence of the questionnaire
design, and I doubt that the values are missing at random, so
deterministic methods are of limited use in this case.
I tried regression imputation: I replaced the missing values with the
predicted values from a regression of the incomplete variable on
variables related to the missingness, most of which are categorical.
It runs, but I do not trust the result, so I want to compare it with
the result of -hotdeck-, because the -by()- option of -hotdeck- can
specify categorical variables defining the strata within which the
imputation is carried out.
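For concreteness, the regression imputation described above could be sketched roughly as follows. This is a minimal sketch, not the exact model that was run: the regressors are simply borrowed from the -command()- call in part 2, and limit_hat, limit_imp and imputed_flag are hypothetical variable names.

```stata
* Sketch only: the regressors are borrowed from the -command()- call in
* part 2; the model actually used is not shown in this post.
regress limit_formal ifration iformal lginc_total popu

* Linear prediction for every observation with non-missing regressors
predict double limit_hat, xb

* Keep the original variable untouched; fill only the missing cases.
* limit_hat, limit_imp and imputed_flag are hypothetical names.
generate double limit_imp = limit_formal
replace limit_imp = limit_hat if missing(limit_formal)

* Flag the imputed cases so they can be compared with -hotdeck- later
generate byte imputed_flag = missing(limit_formal) & !missing(limit_imp)
tabulate imputed_flag
```

Because every imputed value lies exactly on the fitted regression line, this kind of deterministic imputation understates the variability of limit_formal, which is one reason to compare it with a stochastic method such as -hotdeck-.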
2) The results follow (limit_formal is the variable with missing values):
hotdeck limit_formal, by(ifration iformal) store
The result is:
------------------------------------------------------
DELETING all matrices....
Table of the Missing data patterns
* signifies missing and - is not missing
Varlist order: limit_formal
    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
          * |         49       14.54       14.54
          - |        288       85.46      100.00
------------+-----------------------------------
      Total |        337      100.00
333
WARNING: When the <command> option is not selected
then no analysis is performed on the imputed datasets
-------------------------------------------------------
Next I tried to run a regression that uses other variables, but it failed:
. hotdeck limit_formal, by(ifration iformal) store command(reg
limit_formal ifration iformal lginc_total popu) impute(2)
parms(ifration iformal lginc_total popu)
Stata returned:
----------------------------------------------------------
DELETING all matrices....
Table of the Missing data patterns
* signifies missing and - is not missing
Varlist order: limit_formal
    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
          * |         49       14.54       14.54
          - |        288       85.46      100.00
------------+-----------------------------------
      Total |        337      100.00
333
variable lginc_total not found
-------------------------------------------------------------
However, the variable lginc_total does exist in my dataset. When I
switched to a different command, it worked:
. hotdeck limit_formal, by(ifration iformal) store command(logit
ifration limit_formal) impute(2) parms(limit_formal)
-------------------------------------------------------------
Table of the Missing data patterns
* signifies missing and - is not missing
Varlist order: limit_formal
    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
          * |         49       14.54       14.54
          - |        288       85.46      100.00
------------+-----------------------------------
      Total |        337      100.00
333
WARNING: t less than 4 invalid global test increase
parameters OR imputations
Number of Obs. = 333
No. of Imputations = 2
% Lines of Missing Data = 13.513514 %
F( 4.000 ,1) = 2.7409
Prob > F = 0.1732
-------------------------------------------------------------------------------
    Variable |  Average  Between   Within   Total          df        t  p-value
             |    Coef.  Imp. SE  Imp. SE      SE
-------------+-----------------------------------------------------------------
limit_formal |  -0.0000    0.000    0.000   0.000   8196554.1   -1.655    0.098
-------------+-----------------------------------------------------------------
    Variable |  [95% Conf. Interval]
-------------+-----------------------------------------------------------------
limit_formal |  -0.0001   0.0000
-------------------------------------------------------------------------------
---------------------------------------------------------------------
This runs, but the generated dataset cannot be merged into my main
dataset because it contains no index or key variable that identifies
each observation. Also, what is wrong with the regression I passed to
-command()-? Did I make a mistake? And are there other approaches I
could use to deal with the missing values?
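To make the key-variable issue concrete, one possible approach is to create a row identifier in the main dataset before imputing. This is a sketch under assumptions: obs_id, main_with_id and imputed_file are hypothetical names, and whether the identifier is carried into the datasets written by -hotdeck- depends on that command's options, which are not verified here.

```stata
* Sketch only: obs_id and main_with_id are hypothetical names.
* Run this on the main dataset BEFORE calling -hotdeck-.
generate long obs_id = _n
label variable obs_id "Row identifier created before imputation"
sort obs_id
save main_with_id, replace

* ASSUMPTION: obs_id is only useful if it reaches the stored imputed
* datasets. Check -help hotdeck- for an option that keeps extra
* variables in those files; none is asserted here.
*
* If an imputed dataset does contain obs_id, a match-merge would look
* like this (imputed_file is a hypothetical file name):
*   use imputed_file, clear
*   sort obs_id
*   merge obs_id using main_with_id
*   tabulate _merge
```

Both files must be sorted by obs_id before a match-merge in this version of Stata, and tabulating _merge shows how many observations matched.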
Those are my problems. Frankly, I am a novice when it comes to missing
values, so any suggestions and comments are highly appreciated.
--
Xiangping JIA