Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Making sense of -mi passive-
From: Henrik Stovring <[email protected]>
To: statalist <[email protected]>
Subject: st: Making sense of -mi passive-
Date: Fri, 26 Nov 2010 14:25:21 +0100
Dear all,
The Stata 11 manual on missing data (page 193) gives an example of
using -egen- with the -mean()- function under -mi passive-. However, I
find it very difficult to understand what Stata actually computes in
this situation, and in any case it is not what I would have expected.
This may be a feature, but others may be equally surprised, so perhaps
the manual should at least carry a fair warning about what this command
actually returns?
In short, the problem is this: consider the following dataset, in which
IQ has been imputed twice (m=1 and m=2) for the three subjects where it
was missing:
. list id iq _mi*
     +-------------------------------------------+
     | id         iq   _mi_m   _mi_id   _mi_miss |
     |-------------------------------------------|
  1. |  1          .       0        8          1 |
  2. |  2          .       0        9          1 |
  3. |  3          .       0       10          1 |
  4. |  4        108       0        1          0 |
  5. |  5        117       0        2          0 |
     |-------------------------------------------|
  6. |  6         87       0        3          0 |
  7. |  7         88       0        4          0 |
  8. |  8         78       0        5          0 |
  9. |  9         96       0        6          0 |
 10. | 10        128       0        7          0 |
     |-------------------------------------------|
 11. |  1   123.1627       1        8          . |
 12. |  2   100.7916       1        9          . |
 13. |  3   103.7483       1       10          . |
 14. |  4        108       1        1          . |
 15. |  5        117       1        2          . |
     |-------------------------------------------|
 16. |  6         87       1        3          . |
 17. |  7         88       1        4          . |
 18. |  8         78       1        5          . |
 19. |  9         96       1        6          . |
 20. | 10        128       1        7          . |
     |-------------------------------------------|
 21. |  1   106.0351       2        8          . |
 22. |  2   91.59879       2        9          . |
 23. |  3   88.79485       2       10          . |
 24. |  4        108       2        1          . |
 25. |  5        117       2        2          . |
     |-------------------------------------------|
 26. |  6         87       2        3          . |
 27. |  7         88       2        4          . |
 28. |  8         78       2        5          . |
 29. |  9         96       2        6          . |
 30. | 10        128       2        7          . |
     +-------------------------------------------+
Suppose we now want the mean IQ in each imputed dataset. This is not
exactly relevant here, but we might, for example, want to restandardize
IQ to a specific mean while taking the imputed values into account. In
short, we do to each imputed dataset what we would have done had the
dataset been complete. Following the manual, we run:
. mi passive: egen meanlongpas=mean(iq)
(passive variable meanlongpas unregistered because not in m=0)
m=0:
m=1:
m=2:
(14 values of passive variable meanlongpas in m>0 updated to match
values in m=0)
.
. list id iq meanlongpas _mi*
     +------------------------------------------------------+
     | id         iq   meanlo~s   _mi_m   _mi_id   _mi_miss |
     |------------------------------------------------------|
  1. |  4        108   100.2857       0        1          0 |
  2. |  5        117   100.2857       0        2          0 |
  3. |  6         87   100.2857       0        3          0 |
  4. |  7         88   100.2857       0        4          0 |
  5. |  8         78   100.2857       0        5          0 |
     |------------------------------------------------------|
  6. |  9         96   100.2857       0        6          0 |
  7. | 10        128   100.2857       0        7          0 |
  8. |  1          .   100.2857       0        8          1 |
  9. |  2          .   100.2857       0        9          1 |
 10. |  3          .   100.2857       0       10          1 |
     |------------------------------------------------------|
 11. |  4        108   100.2857       1        1          . |
 12. |  5        117   100.2857       1        2          . |
 13. |  6         87   100.2857       1        3          . |
 14. |  7         88   100.2857       1        4          . |
 15. |  8         78   100.2857       1        5          . |
     |------------------------------------------------------|
 16. |  9         96   100.2857       1        6          . |
 17. | 10        128   100.2857       1        7          . |
 18. |  1   123.1627   102.9703       1        8          . |
 19. |  2   100.7916   102.9703       1        9          . |
 20. |  3   103.7483   102.9703       1       10          . |
     |------------------------------------------------------|
 21. |  4        108   100.2857       2        1          . |
 22. |  5        117   100.2857       2        2          . |
 23. |  6         87   100.2857       2        3          . |
 24. |  7         88   100.2857       2        4          . |
 25. |  8         78   100.2857       2        5          . |
     |------------------------------------------------------|
 26. |  9         96   100.2857       2        6          . |
 27. | 10        128   100.2857       2        7          . |
 28. |  1   106.0351   98.84287       2        8          . |
 29. |  2   91.59879   98.84287       2        9          . |
 30. |  3   88.79485   98.84287       2       10          . |
     +------------------------------------------------------+
What is peculiar is that the mean computed by -egen- is not constant
within the imputed datasets. It seems that for the records where IQ was
actually observed, -egen- returns the mean computed in the original
dataset (m=0), whereas for the records where IQ was missing before
imputation it returns the mean computed on all IQ values (observed AND
imputed) in that imputed dataset. Is this really meaningful? If so, I
think the manual should not use this as an introductory example without
any warning.
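This reading of the output can be checked by hand. What follows is a
minimal sketch in plain Python (outside Stata), using only the IQ values
from the listing above; the variable names are made up for illustration.
It computes the mean of the seven observed values and, for each imputed
dataset, the mean of all ten values:

```python
# IQ values copied from the listing; names here are illustrative only.
observed = [108, 117, 87, 88, 78, 96, 128]   # ids 4-10, observed in m=0
imputed = {
    1: [123.1627, 100.7916, 103.7483],       # ids 1-3 in m=1
    2: [106.0351, 91.59879, 88.79485],       # ids 1-3 in m=2
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Mean over the observed values only (what m=0 can see).
mean_observed = mean(observed)

# Mean over observed plus imputed values, separately for each m > 0.
mean_by_m = {m: mean(observed + values) for m, values in imputed.items()}

print(f"m=0, observed only:      {mean_observed:.4f}")
for m, value in mean_by_m.items():
    print(f"m={m}, observed + imputed: {value:.4f}")
```

Up to rounding, the three results agree with the three distinct values
of meanlongpas in the listing (100.2857, 102.9703 and 98.84287). That is
consistent with the complete records carrying the m=0 mean and the
imputed records carrying the mean over all ten values in their own
dataset.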
What do you think? What am I missing here :-)?
Best,
Henrik
--
Henrik Støvring Department of Biostatistics
Associate professor University of Aarhus
[email protected] Bartholins Allé 2, Bldg 1261, 217
Phone +45 8942 6131 8000 Aarhus
Fax +45 8942 6140 Denmark