st: Re: simple way to create missing data that is "missing at random" from a small dataset
From: Suzy <[email protected]>
To: [email protected]
Subject: st: Re: simple way to create missing data that is "missing at random" from a small dataset
Date: Fri, 24 Feb 2006 18:40:44 -0500

Thanks, Maarten, for giving me more detail on your command. I adjusted
the constant and now get the correct proportion of missingness,
although I'm not sure what the implications are of the standard
deviation and the maximum value of p (.549). Now that I better
understand what the command is doing, I will continue to work with the
values and look at the outcomes. I really appreciate your help!

. gen p = invlogit( -8 +.1*age )

. sum p

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           p |       332    .0999268      .113432  .0054863     .549834

. replace bmi = . if uniform() < p
(27 real changes made, 27 to missing)

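The session above overwrites bmi and gives a different set of missing values on every run. A minimal sketch of a reproducible, non-destructive variant follows. The seed value and the names bmi_mar and miss are illustrative assumptions, not part of the original exchange. runiform() is the name used for uniform() in later Stata releases; on older versions, substitute uniform() as in the thread.

```stata
* Fix the random-number seed so the same observations go missing on each run
* (12345 is an arbitrary, hypothetical choice).
set seed 12345

* Probability of missingness rises with age; coefficients as in the session above.
generate double p = invlogit(-8 + .1*age)

* Work on a copy so the complete bmi values remain available for comparison.
generate double bmi_mar = bmi
replace bmi_mar = . if runiform() < p

* Indicator for the induced missingness, then two quick checks:
* the share missing, and whether missingness is associated with age.
generate byte miss = missing(bmi_mar)
tabulate miss
logit miss age
```

Keeping the original bmi alongside bmi_mar makes it possible to compare later analyses of the incomplete data with the complete-data results.
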
Maarten Buis wrote:
Suzy:
No problem, but if you find my reply puzzling then chances are that someone else on Statalist
might find it puzzling too, so I have also sent my reply (and your full question underneath) to
Statalist.
The variable p is the probability of missingness, so the mean of p should be .1 if you want
approximately 10% missingness. Your mean is .99, so nearly everyone will be made missing. -invlogit-
transforms a linear function of "explanatory variables" (in your case .1*age) to lie between zero
and one according to 1/(1+exp(-xb)), so the values you plug in (in your case .1 for age and 0 for
the constant) are "logistic regression coefficients". I would play around with values of the
constant until you get a mean p of about .1 (the more negative the constant, the lower the
probability). For instance, look at the mean of p if you do -gen p = invlogit(-10 + .1*age)-
Afterwards I would check whether there is enough variation in the values of p. If p is
approximately constant, then the influence of age on the probability of missingness is probably
not strong enough to show up in your simulations, and you should increase the coefficient of age.
This might then shift the mean probability of missingness a bit, so it would be good to check
again that the mean probability of missingness is still close to .1
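
The trial-and-error search described above can be sketched as a small loop that reports the mean and standard deviation of p for several candidate constants. This assumes a variable named age is in memory; p_try is a hypothetical scratch variable, and the list of candidate constants is arbitrary.

```stata
* Scan candidate constants; look for a mean of p near .1 with a
* standard deviation that is clearly above zero.
foreach c in -10 -9 -8 -7 -6 {
    quietly generate double p_try = invlogit(`c' + .1*age)
    quietly summarize p_try
    display "constant = `c'   mean p = " %6.4f r(mean) "   sd p = " %6.4f r(sd)
    drop p_try
}
```

If the coefficient of age is changed from .1, the same loop can be rerun with the new value to find the constant that restores a mean of about .1.
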
HTH,
Maarten
--- Suzy <[email protected]> wrote:
Dear Maarten:
Hope you don't mind the direct e-mail. I tried your code based on my
dataset and what I thought I should do and all of my BMI observations
went missing rather than say 5-10%. I have obviously done something
wrong with it. I'm hoping you can help. I would like about 10% of the
BMI variable to be missing. I want the missingness to be associated with
older age, but not dependent on the value of BMI - thus hopefully
satisfying the MAR assumption.
I've included the summary stats of the variables, the code you provided
(I modified it somewhat) and the result.
Can you see what I did wrong?

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |       332    .4849398     .5005275         0          1
        race |       332    .3253012     .4691944         0          1
         age |       332    52.06024      12.6857        28         82
        fhdm |       332    .3373494     .4735189         0          1
         bmi |       332    30.98795      6.18837        18         48
-------------+--------------------------------------------------------
       dmcat |       332    .2771084     .4482461         0          1

. gen p = invlogit(.1*age)

. sum p

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           p |       332    .9894261     .0121324  .9426758   .9997254

. replace bmi = . if uniform() < p
(332 real changes made, 332 to missing)

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |       332    .4849398     .5005275         0          1
        race |       332    .3253012     .4691944         0          1
         age |       332    52.06024      12.6857        28         82
        fhdm |       332    .3373494     .4735189         0          1
         bmi |         0
-------------+--------------------------------------------------------
       dmcat |       332    .2771084     .4482461         0          1
           p |       332    .9894261     .0121324  .9426758   .9997254

-----------------------------------------
between 1/2/2006 and 31/3/2006 I will be
visiting the UCLA, during this time the
best way to reach me is by email
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*