Dear Statalisters,
I would like to simulate data for a linear probability model with
response variable y, regressor(s) x, and known coefficients a and b.
If I wanted to simulate data from a logistic model, I could follow the
procedure suggested helpfully by Al Feiveson on this listserv (Nov
11'02):
generate z = a + b*x
generate p = exp(z)/(1+exp(z))
generate y = uniform()<=p
But I'm stumped as to how to go about simulating data from an LPM. I
can't simply draw a random error term to generate the response variable
y, because the LPM error is heteroskedastic: its variance, p(1-p) with
p = a + b*x, depends on x. For efficient estimation of the LPM,
Goldberger suggests a weighted least squares procedure that involves
(1) estimating by OLS, (2) computing yhat(1-yhat), (3) forming the
weights w = sqrt(yhat(1-yhat)), and (4) regressing y/w on 1/w and x/w.
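As I understand it, the estimation steps above could be sketched in
Stata roughly as follows. This is only a sketch: it assumes y and x are
already in memory, the names yhat, v, w, yw, xw and cw are my own, and
dropping observations whose fitted value falls outside (0,1) is an ad
hoc choice of mine, not part of Goldberger's procedure.

```stata
* Sketch only: assumes variables y and x are already in memory.
regress y x                            // (1) OLS
predict double yhat, xb                // fitted values from step (1)
generate double v = yhat*(1 - yhat)    // (2) estimated error variance
* v <= 0 whenever yhat lies outside (0,1); w is left missing for those
* observations, so regress drops them (an ad hoc assumption).
generate double w = sqrt(v) if v > 0   // (3) weights
generate double yw = y/w               // transformed response
generate double xw = x/w               // transformed regressor
generate double cw = 1/w               // transformed constant
regress yw cw xw, noconstant           // (4) WLS as OLS on transformed data
```

On the observations with v > 0, "regress y x [aweight = 1/v]" should
give the same coefficient estimates without building the transformed
variables by hand.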
Have any other Statalisters encountered this problem before?
Many thanks,
Alex