Hi Jeff,
How's it going? I happened to read this post. It seems the
method -impute- uses is out of date and may give
invalid inferences from the imputed data. Does Stata
have a plan to implement multiple imputation?
Weihua
--- "Jeff Pitblado, StataCorp LP" <[email protected]> wrote:
>
> Renzo Comolli <[email protected]> asks about the limit on the
> number of variables allowed by -impute-:
>
> > I know this behavior is strictly "at my own risk". Anyway, I (copied
> > with a different name and) removed the limitation to 31 variables in
> > impute.ado. It works with no waiting time at all, even with 52
> > variables. I wonder whether StataCorp has been too risk averse when it
> > updated the ado from version 3.1 to version 8.
> >
> > Has anybody had a similar experience removing the limitation?
> > From the manual's explanation of what -impute- does, it is possible
> > that I could get away with so many variables because almost all of
> > them were dummies and therefore easy to order (counting the categorical
> > variables before the dummy expansion, I am way below 15).
>
> The -impute- command fits best-subset regressions, chosen according to
> the pattern of missing values in the predictors. It is conceivable that
> -impute- must run a regression for each combination of the predictor
> variables, depending upon the pattern of missingness.
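To make "a regression for each pattern of missingness" concrete, here is a minimal Python sketch. It is an illustration, not the code inside -impute-: it groups observations by which predictors are present, and each distinct group would need its own regression. The data and the function name `missing_patterns` are hypothetical. With k predictors there can be up to 2^k distinct patterns.

```python
# Illustrative sketch only, not -impute-'s implementation: group observations
# by the set of predictors that are non-missing. Each distinct pattern would
# call for a separate regression on the predictors that are present.
from math import nan, isnan


def missing_patterns(rows, names):
    """Map each pattern (tuple of available predictor names) to a row count."""
    patterns = {}
    for row in rows:
        present = tuple(n for n, v in zip(names, row) if not isnan(v))
        patterns[present] = patterns.get(present, 0) + 1
    return patterns


names = ["x1", "x2", "x3"]
# Hypothetical data; nan marks a missing value.
rows = [
    (1.0, 2.0, 3.0),
    (1.5, nan, 2.5),
    (nan, nan, 4.0),
    (2.0, nan, 1.0),
]
for present, count in missing_patterns(rows, names).items():
    print(present, count)
```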
>
> To enumerate all best-subset combinations, -impute- looks at the 0's and
> 1's in the binary representation of a long integer. In Stata, a long
> integer contains 32 bits, one of which is used for the sign. Each of the
> remaining 31 bits identifies whether to include a predictor variable in
> a given regression, so increasing this limit beyond 31 will not have a
> desirable result (even though the modified -impute- will not exit with
> an error).
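A small Python sketch of the bit budget described above (not code from -impute-): it checks whether a mask with one bit set per predictor fits in a signed 32-bit integer. It assumes an ordinary two's-complement 32-bit integer and does not model every detail of Stata's -long- storage type; `subset_mask_fits` is a made-up name.

```python
# Assumes a plain signed 32-bit integer; Stata's -long- type is not
# modelled exactly here.
INT32_MAX = 2**31 - 1  # largest value with the sign bit clear


def subset_mask_fits(k):
    """True if the mask with all k predictor bits set fits in 31 value bits."""
    all_included = (1 << k) - 1
    return all_included <= INT32_MAX


for k in (3, 31, 32, 52):
    print(k, subset_mask_fits(k))
```

With 31 predictors the "include everything" mask is exactly 2^31 - 1; a 32nd predictor would need the sign bit.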
>
> To illustrate how -impute- determines which variables to include in a
> regression, suppose there are 3 predictors and that the pattern of
> missing values among them requires a regression for each combination.
> In this (admittedly worst-case) scenario, there are 2^3 = 8 regressions
> to run. We can determine which predictors to include in a regression by
> looking at the binary representation of the regression index (starting
> from 0):
>
>   integer (base 10)   integer (binary)
>   -----------------   ----------------
>           0                 000
>           1                 001
>           2                 010
>           3                 011
>           4                 100
>           5                 101
>           6                 110
>           7                 111
>
> If the predictor variables are named x1, x2, and x3, we can interpret
> the binary number like this:
>
>      x3        x2        x1
>   ---------------------------
>   <digit>   <digit>   <digit>
>
> Thus 001 means include x1, 011 means include x1 and x2, ...
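The decoding step above can be sketched in a few lines of Python (an illustration, not the ado-file itself; `predictors_for` is a hypothetical name). Bit j of the index, counting from the right starting at 0, stands for predictor x(j+1).

```python
def predictors_for(index, names):
    """Return the predictors whose bit is set in the regression index."""
    # Bit j (rightmost is j = 0) selects names[j], i.e. x1 is the lowest bit.
    return [name for j, name in enumerate(names) if (index >> j) & 1]


names = ["x1", "x2", "x3"]
for i in range(2 ** len(names)):
    print(i, format(i, "03b"), predictors_for(i, names))
```

For example, index 6 (binary 110) selects x2 and x3, and index 7 (binary 111) selects all three.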
>
> Given this implementation, there has to be a limit on how many
> predictors the -impute- command allows before the generated -long-
> integer variable is automatically -recast- to a -float- or -double-,
> which breaks the implementation.
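One way such a recast can break the scheme is sketched below in Python (an illustration, not -impute-'s code; `as_float32` is a made-up helper). It assumes the index ends up in a 4-byte IEEE single-precision -float-, whose 24-bit significand represents integers exactly only up to 2^24. Beyond that, low-order bits are rounded away, and in this encoding the low-order bits are the ones that select the first predictors. Only the -float- case is shown.

```python
import struct


def as_float32(n):
    """Round-trip an integer through IEEE single precision (4-byte float)."""
    return int(struct.unpack("f", struct.pack("f", float(n)))[0])


# A subset index that includes x1 (bit 0) and x25 (bit 24).
mask = (1 << 24) | 1
stored = as_float32(mask)
print(mask, stored)
print("x1 still selected:", bool(stored & 1))
```

After the round trip the bit for x1 is gone, so that predictor silently drops out of the regression.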
>
> If the limit is increased, variables beyond the first 31 (possibly
> fewer) will not be used in any of the regressions.
>
> One way to get around this limit would be to add an option to -impute-,
> say -nomissings()-, that takes a varlist. These variables would be
> assumed missing-value-free so that they could be included in all
> regressions.
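A hypothetical Python sketch of the proposed -nomissings()- idea follows. The option is only a proposal, and the function and variable names are invented for illustration. Variables declared complete enter every regression, so only the remaining predictors consume bits in the subset index. With 5 predictors of which 2 are declared complete, 2^3 = 8 subsets need enumerating instead of 2^5 = 32.

```python
def regressions(predictors, nomissings):
    """Yield predictor lists: always-included vars plus each bitmask subset."""
    always = [p for p in predictors if p in nomissings]
    varying = [p for p in predictors if p not in nomissings]
    # Only the variables that may contain missing values get a bit.
    for index in range(2 ** len(varying)):
        chosen = [v for j, v in enumerate(varying) if (index >> j) & 1]
        yield always + chosen


preds = ["x1", "x2", "x3", "d1", "d2"]
subsets = list(regressions(preds, nomissings={"d1", "d2"}))
print(len(subsets))
print(subsets[5])
```

Index 5 (binary 101) selects x1 and x3 on top of the always-included d1 and d2.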
>
> We will look into adding this as a future update.
>
> --Jeff
> [email protected]
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/