Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
st: Making sense of -mi passive-

From	Henrik Stovring <[email protected]>
To	statalist <[email protected]>
Subject	st: Making sense of -mi passive-
Date	Fri, 26 Nov 2010 14:25:21 +0100

Dear all,

In the manual on missing data in Stata 11, page 193, an example is given
of how one can use -egen- with the -mean()- function under -mi passive-.
However, I find it very difficult to understand what Stata actually
computes in this situation, and in any case it is not what I would have
expected. This may be a feature, but others may be equally surprised,
so perhaps the manual should at least warn readers about what this
command actually returns?

In short, the problem is this: imagine the following dataset, in which
IQ has been imputed twice (m=1 and m=2) for the three subjects where it
was missing:

. list id iq _mi*

     +-------------------------------------------+
     | id         iq   _mi_m   _mi_id   _mi_miss |
     |-------------------------------------------|
  1. |  1          .       0        8          1 |
  2. |  2          .       0        9          1 |
  3. |  3          .       0       10          1 |
  4. |  4        108       0        1          0 |
  5. |  5        117       0        2          0 |
     |-------------------------------------------|
  6. |  6         87       0        3          0 |
  7. |  7         88       0        4          0 |
  8. |  8         78       0        5          0 |
  9. |  9         96       0        6          0 |
 10. | 10        128       0        7          0 |
     |-------------------------------------------|
 11. |  1   123.1627       1        8          . |
 12. |  2   100.7916       1        9          . |
 13. |  3   103.7483       1       10          . |
 14. |  4        108       1        1          . |
 15. |  5        117       1        2          . |
     |-------------------------------------------|
 16. |  6         87       1        3          . |
 17. |  7         88       1        4          . |
 18. |  8         78       1        5          . |
 19. |  9         96       1        6          . |
 20. | 10        128       1        7          . |
     |-------------------------------------------|
 21. |  1   106.0351       2        8          . |
 22. |  2   91.59879       2        9          . |
 23. |  3   88.79485       2       10          . |
 24. |  4        108       2        1          . |
 25. |  5        117       2        2          . |
     |-------------------------------------------|
 26. |  6         87       2        3          . |
 27. |  7         88       2        4          . |
 28. |  8         78       2        5          . |
 29. |  9         96       2        6          . |
 30. | 10        128       2        7          . |
     +-------------------------------------------+

Imagine that we now want the mean IQ within each imputed dataset. This
is not exactly relevant here, but we may, for example, want to
restandardize IQ to have a specific mean while taking the imputed values
into account. In short, we do to each imputed dataset what we would have
done had the dataset been complete. So, following the manual, we run:

. mi passive: egen meanlongpas=mean(iq)
(passive variable meanlongpas unregistered because not in m=0)
m=0:
m=1:
m=2:
(14 values of passive variable meanlongpas in m>0 updated to match
values in m=0)

.
. list id iq meanlongpas _mi*

     +------------------------------------------------------+
     | id         iq   meanlo~s   _mi_m   _mi_id   _mi_miss |
     |------------------------------------------------------|
  1. |  4        108   100.2857       0        1          0 |
  2. |  5        117   100.2857       0        2          0 |
  3. |  6         87   100.2857       0        3          0 |
  4. |  7         88   100.2857       0        4          0 |
  5. |  8         78   100.2857       0        5          0 |
     |------------------------------------------------------|
  6. |  9         96   100.2857       0        6          0 |
  7. | 10        128   100.2857       0        7          0 |
  8. |  1          .   100.2857       0        8          1 |
  9. |  2          .   100.2857       0        9          1 |
 10. |  3          .   100.2857       0       10          1 |
     |------------------------------------------------------|
 11. |  4        108   100.2857       1        1          . |
 12. |  5        117   100.2857       1        2          . |
 13. |  6         87   100.2857       1        3          . |
 14. |  7         88   100.2857       1        4          . |
 15. |  8         78   100.2857       1        5          . |
     |------------------------------------------------------|
 16. |  9         96   100.2857       1        6          . |
 17. | 10        128   100.2857       1        7          . |
 18. |  1   123.1627   102.9703       1        8          . |
 19. |  2   100.7916   102.9703       1        9          . |
 20. |  3   103.7483   102.9703       1       10          . |
     |------------------------------------------------------|
 21. |  4        108   100.2857       2        1          . |
 22. |  5        117   100.2857       2        2          . |
 23. |  6         87   100.2857       2        3          . |
 24. |  7         88   100.2857       2        4          . |
 25. |  8         78   100.2857       2        5          . |
     |------------------------------------------------------|
 26. |  9         96   100.2857       2        6          . |
 27. | 10        128   100.2857       2        7          . |
 28. |  1   106.0351   98.84287       2        8          . |
 29. |  2   91.59879   98.84287       2        9          . |
 30. |  3   88.79485   98.84287       2       10          . |
     +------------------------------------------------------+

What is peculiar is that the mean computed by -egen- is not constant
within each imputed dataset. It seems that for the records where IQ was
actually observed, -egen- returns the mean computed in the original
dataset (100.2857, from the 7 observed values), whereas for the records
where IQ was missing before imputation it returns the mean computed on
all IQ values in that imputed dataset (observed AND imputed). Is this
really meaningful? If so, I think the manual should not use this as an
introductory example without any warning.
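
The reading above can be checked by hand from the listed values. The following is only a small arithmetic sketch in Python, not Stata code and not a description of what -mi passive- does internally; the variable names are made up for illustration, and the numbers are copied from the listings above.

```python
# IQ values copied from the listing above.
observed = [108, 117, 87, 88, 78, 96, 128]   # ids 4-10, observed in m=0
imputed = {                                   # ids 1-3, imputed values
    1: [123.1627, 100.7916, 103.7483],        # m=1
    2: [106.0351, 91.59879, 88.79485],        # m=2
}


def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


# Mean over the 7 observed values only (what the observed records show).
mean_observed = mean(observed)

# Mean over all 10 values within each imputed dataset
# (what the records with imputed IQ show).
mean_by_m = {m: mean(observed + extra) for m, extra in imputed.items()}

# Observed records across all imputed datasets; compare with the
# "14 values ... updated" message in the output.
n_observed_records_in_m_gt_0 = len(observed) * len(imputed)

print(f"observed only: {mean_observed:.4f}")
for m, value in mean_by_m.items():
    print(f"m={m}, all 10 values: {value:.5f}")
print(f"observed records in m>0: {n_observed_records_in_m_gt_0}")
```

To the displayed precision, the three means agree with the 100.2857, 102.9703 and 98.84287 shown in the listing, and 7 observed records times 2 imputations gives the 14 values that the output reports as updated to match m=0.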

What do you think? What am I missing here :-)?

Best,

Henrik


-- 
Henrik Støvring			Department of Biostatistics
Associate professor            	University of Aarhus
[email protected]     	Bartholins Allé 2, Bldg 1261, 217
Phone +45 8942 6131            	8000 Aarhus
Fax +45 8942 6140              	Denmark