Re: st: mean, mode or median for missing values
--- On Wed, 3/2/10, Richard Williams wrote:
> If you are going to use it as an
> independent variable, do you plan on treating it as
> continuous? If so, you wouldn't be the first person in
> the world to do so and plugging in a mean or an imputed
> regression estimate becomes more reasonable.
I'll have to disagree with that. Imputing missing values of the
dependent variable with its mean will seriously distort the
associations you are going to find, as the example below
shows:
*---------- begin example ------------
// create some data
drop _all
set obs 200
gen x = rnormal()
gen y = x + .5*rnormal()
// create missing values
gen y2 = y in 20/l
// impute with mean values
sum y2
replace y2 = r(mean) if y2==.
// display the consequences
scatter y2 x in 20/l || ///
scatter y2 x in 1/19, ///
legend(order(1 "observed" 2 "imputed"))
*------------ end example ------------
( For more on how to use examples I sent to statalist see:
http://www.maartenbuis.nl/stata/exampleFAQ.html )
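The graph shows the distortion visually. A rough sketch of how to put a number on it is to regress on both versions of the variable with the same simulated data. The `set seed` line and the scalar name `b_full` are additions for illustration, not part of the example above:

```stata
*---------- begin example ------------
// same simulated data as above; the seed is only for reproducibility
drop _all
set seed 12345
set obs 200
gen x = rnormal()
gen y = x + .5*rnormal()
// first 19 observations set to missing, then mean-imputed
gen y2 = y in 20/l
sum y2
replace y2 = r(mean) if y2==.
// slope of x using the complete data
regress y x
scalar b_full = _b[x]
// slope of x after mean imputation
regress y2 x
display "complete data: " b_full "   mean-imputed: " _b[x]
*------------ end example ------------
```

The imputed points all lie on a flat line at the mean of y2, whatever their value of x, so they pull the fitted slope toward zero. How large the gap is depends on the draw and on the share of imputed values, which here is 19 out of 200.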
For a solution, Jet should consider the answers to Jet's question of
yesterday:
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1002/date/article-71.html
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.1002/Date/article-70.html
Hope this helps,
Maarten
--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany
http://www.maartenbuis.nl
--------------------------
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/