Peter Reece asked about generating a set of variables that deviate slightly and
randomly from a corresponding set of two-level categorical variables:
"Yn is a variable containing a pattern. Xn is egen generated data which must
contain the pattern but with slightly different proportions. Hence if Y1
contains 70 1's and 30 2's, then X1 might contain 68 1's and 32 2's. The
amount difference between X1 and Y1 would vary slightly each time X1 is
generated, and be randomly distributed. . .
"So for over 100 Y variables I need some way to have egen look at the
proportions of 1's and 2's in each one of those variables using 'fill' or
whatever, generate similar data for X, where Xn varies slightly in the
proportions of 1's and 2's in each Yn."
If I understand his request correctly, the do-file below does what Peter asks. For
illustration, the *y* variables in the do-file are all 70% zeroes and 30% ones (coded
0/1 rather than Peter's 1/2), and I've limited the number of *y* variables to 10 instead
of over 100. Each *x* variable is drawn so that its proportion of zeroes and ones is
similar to, but randomly slightly different from, that of the corresponding *y* variable.
If Peter wants a formal distribution for the *x* variables based upon the mean of the
corresponding *y* variables, Joseph Hilbe and Walter Linde-Zwirble's
random-number-generating suite (-findit rnd-) can be used in place of the
-generate byte x`i'=uniform()<=r(mean)- command below.
Joseph Coveney
---------------------------begin reece.do------------------------------
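clear
set more off
set obs 1000
set seed 20030127
forvalues i = 1/10 {
    * each y is 700 zeroes followed by 300 ones
    generate byte y`i'=_n>700
    * leave the proportion of ones in y in r(mean)
    summarize y`i', meanonly
    * each observation of x is 1 with probability r(mean)
    generate byte x`i'=uniform()<=r(mean)
    * compare the proportions of ones in y and x
    summarize y`i' x`i'
}
exit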
----------------------------end reece.do-------------------------------
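
Peter's own variables are coded 1 and 2 rather than 0 and 1. The following is only a
sketch of how the approach in reece.do might be adapted to that coding. The file name
reece12.do, the three demonstration *y* variables, and the local macro p2 are
illustrative assumptions and not part of the original do-file. It also assumes that the
*y* variables contain no missing values and no codes other than 1 and 2, so that the
mean minus 1 equals the proportion of 2's.
--------------------------begin reece12.do-----------------------------
clear
set more off
set obs 100
set seed 20030127
forvalues i = 1/3 {
    * demonstration data only: 70 1's followed by 30 2's
    generate byte y`i' = 1 + (_n > 70)
    * with codes 1 and 2, the mean minus 1 is the proportion of 2's
    summarize y`i', meanonly
    local p2 = r(mean) - 1
    * each observation of x is 2 with probability p2, otherwise 1
    generate byte x`i' = 1 + (uniform() < `p2')
    * compare the counts of 1's and 2's in y and x
    tabulate y`i'
    tabulate x`i'
}
exit
---------------------------end reece12.do------------------------------
With real data, the line that generates the demonstration *y* variables would be
dropped and the loop would run over the existing variables instead.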