Tim Hale <[email protected]> writes,
> I am trying to figure out exactly what the -impute- command in Stata
> does to estimate missing values.
-impute y x1 x2 ..., gen(yhat)- creates
    yhat_j = y_j            in observations j for which y_j<. (y_j is not missing)
           = prediction_j   otherwise
Each prediction_j is the predicted value from a linear regression of y on
the subset of the variables x1, x2, ..., that are not missing in
observation j.
Consider the following dataset:
. list y x1 x2
     +-------------+
     | y   x1   x2 |
     |-------------|
  1. | 1    2    3 |
  2. | 4    5    6 |
  3. | 5    5    8 |
  4. | 5    6    6 |
  5. | .    5    6 |
     |-------------|
  6. | .    .    3 |
     +-------------+
and the command
. impute y x1 x2, gen(yhat)
Then yhat in observation 5 would be based on a regression of y on x1 and x2,
because neither x1 nor x2 is missing in observation 5. This amounts to
. regress y x1 x2
. predict prediction
. replace yhat = prediction in 5
The yhat in observation 6 would be based on a regression of y on x2 alone,
because x1 is missing in observation 6. This amounts to
. regress y x2
. predict prediction
. replace yhat = prediction in 6
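The two cases above can be put together in a short do-file. This is only a sketch that reproduces the steps by hand with -regress- and -predict-; it does not call -impute- itself, and the names yhat_manual, p12, and p2 are made up for illustration. It assumes the six-observation dataset listed above.

```stata
* Sketch only: rebuild the example data and mimic the two cases by hand.
* yhat_manual, p12, and p2 are hypothetical names, not created by -impute-.
clear
input y x1 x2
1 2 3
4 5 6
5 5 8
5 6 6
. 5 6
. . 3
end

* Start from the observed values of y; observations 5 and 6 stay missing
generate yhat_manual = y

* Observation 5: x1 and x2 are both observed, so use both regressors
regress y x1 x2
predict p12
replace yhat_manual = p12 in 5

* Observation 6: x1 is missing, so regress y on x2 alone
regress y x2
predict p2
replace yhat_manual = p2 in 6

list y x1 x2 yhat_manual
```

Each -predict- writes to its own variable, so the second call does not collide with the first. Both regressions are estimated on observations 1 through 4 only, since those are the observations in which y is not missing.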
-- Bill
[email protected]