I will ignore "dropped" and focus on "recoded as missing value".
In every solution, first set the seed
. set seed 280352
(or any other number), so that the results are reproducible.
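This will do it approximately (each value of x is set to missing with
probability 0.5, so the number recoded will be close to, but rarely
exactly, 2500):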
. replace x = . if uniform() < 0.5
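The question asks for a new variable xmiss rather than a change to x itself. A minimal sketch of the same approximate approach that leaves x and y untouched follows. The toy data are hypothetical stand-ins for the real dataset, and the sketch assumes x has no missing values to begin with.

```stata
* Toy data standing in for the real dataset (hypothetical values)
clear
set obs 5000
set seed 280352
gen x = uniform()
gen y = uniform()

* Copy x, then set each copied value to missing with probability 0.5
gen xmiss = x
replace xmiss = . if uniform() < 0.5

* The realized number of missings is random, so check it
count if missing(xmiss)
```

The count will usually land near 2500 but will differ from run to run unless the seed is fixed.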
This will do it exactly (2500 of the 5000 values, chosen at random,
are set to missing):
. gen long id = _n
. gen random = uniform()
. sort random
. replace x = . in 1/2500
. sort id
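For the exact version, the same steps can write to xmiss instead of overwriting x. This is only a sketch under stated assumptions: the toy data are hypothetical, x is assumed to start with no missing values, and the random key is stored as a double purely to make ties less likely.

```stata
* Toy data standing in for the real dataset (hypothetical values)
clear
set obs 5000
set seed 280352
gen x = uniform()
gen y = uniform()

* Remember the original order, then shuffle on a random key
gen long id = _n
gen double random = uniform()
sort random

* The first 2500 observations in random order get a missing xmiss
gen xmiss = x
replace xmiss = . in 1/2500

* Restore the original order and tidy up
sort id
drop random

count if missing(xmiss)
```

With 5000 observations and no missing x to start with, the final count is 2500, and x and y keep all of their original values.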
Ahmed Arif
I have a dataset with variables x and y. The dataset has 5000
observations. I want to generate another variable, xmiss, such that 50%
of the x values are dropped (recoded as missing values) randomly from the
dataset without affecting the values of y. Is there an easy way to do this?