Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Richard Williams <richardwilliams.ndu@gmail.com> |
To | statalist@hsphsun2.harvard.edu, Statalist Statalist <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Including components of a summative score in regression |
Date | Mon, 30 Jul 2012 16:12:35 -0500 |
At 02:46 PM 7/30/2012, Donald Spady wrote:
Dear Statalisters,

I am doing some logistic regression analysis in which some of the variables are made up of the values of other variables, e.g. N = A + B + C/D. Is it reasonable, or appropriate, to include A, B, C, or D in the equation if N is already in it? That is,

logistic X F G H N A B C D

where F, G, H are some variables and N is made up of A, B, C, D, but for some reason or other A, B, C, D are desired in the equation.

My impression is that statistical theory would say this is a no-no, largely because of collinearity; however, if I do it, I sometimes get a better 'fit' to the equation (using estat gof, group(10)).

Thanks,
Donald Spady
I believe the improvements in fit stem from two things:

* N constrains the coefficients of A, B, and C/D to be equal; adding A and B relaxes two of those constraints.
* The linear effects of C and D are not captured by N.

So sure, adding those variables can improve fit. I could be wrong, but I believe that if you ran

logistic X F G H A B C D C/D

(with C/D first created as a new variable using generate, since a Stata varlist cannot contain an expression) you would get the same fit you are getting now. That might seem a little less convoluted to me. But I don't think what you are doing is inherently evil; you just have to understand what the parameters mean and why you get a better fit.
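Writing out the log odds makes the equivalence concrete. This is a sketch assuming N is defined exactly as A + B + C/D with D never zero; the coefficient symbols (beta, gamma, delta, theta) are illustrative labels, not names from the thread or from Stata's output. Substituting N into the model Donald ran gives

$$
\begin{aligned}
\eta_1 &= \beta_0 + \beta_F F + \beta_G G + \beta_H H + \gamma N + \delta_A A + \delta_B B + \delta_C C + \delta_D D \\
       &= \beta_0 + \beta_F F + \beta_G G + \beta_H H + (\gamma + \delta_A) A + (\gamma + \delta_B) B + \delta_C C + \delta_D D + \gamma \frac{C}{D}
\end{aligned}
$$

while the model with C/D entered as its own variable is

$$
\eta_2 = \beta_0 + \beta_F F + \beta_G G + \beta_H H + \theta_A A + \theta_B B + \theta_C C + \theta_D D + \theta_R \frac{C}{D}
$$

The mapping $\theta_A = \gamma + \delta_A$, $\theta_B = \gamma + \delta_B$, $\theta_C = \delta_C$, $\theta_D = \delta_D$, $\theta_R = \gamma$ is one-to-one (its inverse is $\gamma = \theta_R$, $\delta_A = \theta_A - \theta_R$, $\delta_B = \theta_B - \theta_R$), so the two models describe the same set of log odds, produce the same predicted probabilities at the maximum, and should give the same likelihood and the same estat gof results. A model with N alone is the special case $\theta_A = \theta_B = \theta_R$ and $\theta_C = \theta_D = 0$: equal coefficients on A, B, and C/D, and no linear effects of C or D.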
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW: http://www.nd.edu/~rwilliam