st: RE: RE: RE: Fillin missing values, probability for new value
From: "Maarten Buis" <[email protected]>
To: <[email protected]>
Date: Thu, 19 Jan 2006 12:42:43 +0100
Pamela Mueller wrote:
> the data was not given if the number of
> startups (st) is less than 3 (hence one or two).
> Therefore, I know if the missing is 1 or 2 and
> I know the probability for either one.
Maarten Buis wrote:
> You could do:
> replace st = 1 if uniform()<prob1 & st==.
> replace st = 2 if st==.
>
> This will ensure that you get each code with the right probability.
>
> Even better would be to make multiple completed datasets this way.
Maarten Buis wrote:
> for each observation with a missing you add two
> observations: one with value 1 and one with value 2 and
> you attach to the first observation a weight equal to prob1
> and to the second a weight equal to prob2. All complete
> observations receive a weight of 1. Again, it sounds nice to
> me, but if anyone else on statalist warns you not to use it,
> then I will bow to superior wisdom.
Below I have put an example do-file that implements all my suggestions.
I recommend that you use the multiple imputation method. If you
use the single imputation method, you assume that you are as sure
about your imputed values as you are about your observed values.
Consequently, your standard errors will be too small. Multiple
imputation corrects for that. This example requires you to
install the "Tools for analyzing multiple imputed datasets" by
John B. Carlin, Ning Li, Philip Greenwood, and Carolyn Coffey.
To do so, type -net install st0042-.
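To see why single imputation understates the uncertainty, it may help to look at how the m completed-data results are usually combined (Rubin's rules; I assume -mifit- does something equivalent, but check its help file). Here \hat{Q}_i is the coefficient estimated in completed dataset i and \hat{U}_i is its squared standard error; in the example below m = 9.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}\hat{U}_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i-\bar{Q}\right)^2,
\qquad
T = \bar{U} + \left(1+\frac{1}{m}\right)B, \qquad
\mathrm{se}(\bar{Q}) = \sqrt{T}
```

A single imputation only gives you the within-imputation variance \hat{U}_1 and leaves out the between-imputation part B, which is exactly why its standard errors are too small.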
I have also added an example of how to implement the "weighted
solution" to your problem. It seems to work, but I am curious to
hear the views of other members of statalist on this method.
HTH,
Maarten
*--------------------------begin example-----------------------------
set seed 1234
set more off
/*folder in which mi1.dta-mi9.dta will be saved*/
cd c:\temp\
sysuse auto, clear
/*drop cars whose rep78 is genuinely unknown, so only 2s and 3s get imputed*/
drop if missing(rep78)
tab rep78
reg rep78 foreign mpg
/*compute probability that a "future missing value" is 2*/
tab rep78 if rep78==2 | rep78==3
/*Single Imputation*/
sysuse auto, clear
drop if missing(rep78)
/*pretend that values 2 and 3 were not reported*/
recode rep78 2=. 3=.
replace rep78 = 2 if uniform() < .2105 & rep78==.
replace rep78 = 3 if rep78==.
reg rep78 foreign mpg
/*Multiple Imputation*/
sysuse auto, clear
drop if missing(rep78)
recode rep78 2=. 3=.
forvalues i=1/9 {
/*impute missing values and save imputed file*/
replace rep78 = 2 if uniform() < .2105 & rep78==.
replace rep78 = 3 if rep78==.
save mi`i'.dta, replace
/*make imputed values missing again for next iteration*/
recode rep78 2=. 3=.
}
drop _all
/*-miset- and -mifit- come with st0042*/
miset using mi
mifit: regress rep78 foreign mpg
/*weighting*/
sysuse auto, clear
drop if missing(rep78)
recode rep78 2=. 3=.
gen id=_n
expand 2 if rep78==. /*duplicate observations with missing value*/
sort id
gen w = 1 /*complete observations get weight 1*/
by id: replace rep78=2 if _n==1 & _N==2
by id: replace w=.2105 if _n==1 & _N==2
by id: replace rep78=3 if _n==2
by id: replace w=.7895 if _n==2
regress rep78 foreign mpg [iw=w]
*--------------end example-----------------------
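The example hard-codes .2105 and .7895, which were read off the output of -tab rep78 if rep78==2 | rep78==3-. A minimal sketch, assuming you would rather let Stata compute these probabilities from the data; the scalar names n2, n3, p2 and p3 are my own and purely illustrative:

```stata
* Compute the imputation probabilities from the data instead of
* typing them in (scalar names n2, n3, p2, p3 are illustrative only)
sysuse auto, clear
drop if missing(rep78)          // keep only cars with a known rep78

quietly count if rep78 == 2
scalar n2 = r(N)                // number of cars coded 2
quietly count if rep78 == 3
scalar n3 = r(N)                // number of cars coded 3

* probability that a censored value is 2 (or 3), among cars coded 2 or 3
scalar p2 = n2 / (n2 + n3)
scalar p3 = 1 - p2
display "P(rep78 = 2) = " scalar(p2) "    P(rep78 = 3) = " scalar(p3)

* the scalar can then replace the hard-coded number in the imputation step
recode rep78 2=. 3=.
replace rep78 = 2 if uniform() < scalar(p2) & rep78 == .
replace rep78 = 3 if rep78 == .
```

The same scalars could be used as the weights (w = p2 and w = p3) in the weighting part of the example.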
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------