From: Richard Williams <richardwilliams.ndu@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Factor variable notation vs. hand made dummy vars
Date: Mon, 06 Feb 2012 11:22:38 -0500

At 10:41 AM 2/6/2012, Brendan Halpin wrote:
To put the "why" back one step, the immediate reason is evident from the output

| . logit for mpg d2-d5
|
| note: d2 != 0 predicts failure perfectly
|       d2 dropped and 8 obs not used
|
| [...]
|
| . logit for mpg ib1.rep78
|
| note: 1.rep78 != 0 predicts failure perfectly
|       1.rep78 dropped and 2 obs not used
|
| note: 2.rep78 != 0 predicts failure perfectly
|       2.rep78 dropped and 8 obs not used
|
| note: 5.rep78 omitted because of collinearity
|
| [...]

You end up fitting different models on different data. The question is now why do the formulations behave differently, and which is the better default?

To clarify my last answer, my guess is that in the vast majority of cases it won't matter which approach you use. But, this particular example is problematic because of the very small category Ns and the perfect prediction issues. If you are in a situation where it matters, you may want to recode the problematic variable (e.g. combine categories to dichotomize it) or consider an alternative technique, such as -exlogistic-, which, as the manual says, "produces more-accurate inference in small samples because it does not depend on asymptotic results and exlogistic can better deal with one-way causation, such as the case where all females are observed to have a positive outcome."
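
For anyone who wants to try this, here is a rough sketch of the comparison and of the two workarounds mentioned above. It assumes the example uses Stata's shipped auto dataset (where -for- abbreviates foreign) and that d1-d5 were made with -tabulate, generate()-; neither is stated in the thread. The variable rep_hi and its cut point at 4 are purely illustrative, not a recommendation.

```stata
* Assumption: the example is based on the auto data shipped with Stata
sysuse auto, clear

* Hand-made dummies d1-d5 for rep78 (assumed to match the original d2-d5)
tabulate rep78, generate(d)

* The two formulations compared in this thread
logit foreign mpg d2-d5
logit foreign mpg ib1.rep78

* Workaround 1: combine the sparse categories into a dichotomy
* (rep_hi and the cut point at 4 are hypothetical choices)
generate byte rep_hi = rep78 >= 4 if !missing(rep78)
logit foreign mpg i.rep_hi

* Workaround 2: exact logistic regression, kept to the one binary
* predictor here so that it runs quickly
exlogistic foreign rep_hi
```

Comparing the notes and the number of observations reported by the first two -logit- commands shows whether the two formulations ended up using the same estimation sample.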

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
WWW:    http://www.nd.edu/~rwilliam

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/